CSC/ECE 517 Fall 2010/ch6 6g ss
Document Object Model
Introduction
Browser Scripting Languages are widely used today, one example of which is JavaScript.. It allows the user to interact with the web page by accessing the component parts of a web page. The technique used to access these components is called Document Object Model (DOM). It creates objects corresponding to the components of the web page which scripting languages interact with to manipulate web pages.
What is DOM
Information is presented in XML and HTML documents is in a structured format. The technique used to access these document si called Document Object Model (DOM). Dom is platform language independent interface that allows to create and modify the content,structure and style of documents. These changes can be incorporated in the document again. DOM is based on the O-O concepts :
Objects are the main part while representing these objects. We can say that Objects are encapsulation of a set of methods and interfaces. Methods are used to access or change the object’s state and interfaces refer to declaration of a set of methods
What DOM isn't
1) It does not describe how to persist objects into XML or HTML, but it represents documents as a collection of objects.
2) DOM does not describt any internal representation of object or data. It is not a specific set of data structures. For example, even though we have parent-child relationship between the nodes they are logocal relationships of objects.
3) In addition, DOM does not describe what information in a document is relevant or how that information in structured. The structuring of the document is stored by the XML Infoset. DOM is just an API for to access this information.
History
- Before DOM became an official W3C specification, the web community the web community had started to develop ways of creating page-based script programs. This early work led to development of Dynamic HTML <refer this>, which was primarily used for richer user interfaces than plain HTML.
- LiveScript: At the time, Netscape Navigator was a very popular browser, and also was the first to bring out a programming language that would allow web pages to become interactive. The language, called LiveScript <refer> was designed at Netscape Communications and came integrated into the Netscape Navigator browser.
- JavaScript: In December 1995, LiveScript was renamed JavaScript and released as part of Netscape Navigator 2.0.
- Jscript & vbscript : Internet Explorer was updated to support two integrated languages, vbscript and Jscript. Jscript was very similar to JavaScript. In July 1996, Microsoft released Internet Explorer 3.0 supporting these languages.
- ECMAScript: Netscape delivered JavaScript to Ecma International for standardization and the work on the specification, ECMA-262, began in November 1996.
- In June 1997, ECMA adopted a hybrid version of the scripting languages called ECMAScript. However, ECMAScript arrived late for the 4.0 releases of Netscape Navigator and Internet Explorer. Each introduced their own document object model, DHTML and dHTML, that came to be called Dynamic HTML.
- DOM: At the beginning of 1997, the companies involved in this consortium (including Netscape Communications and Microsoft) decided to find a consensus around their object models to access and manipulate documents. While trying to stay as backward compatible as possible with the original browser object models, the W3C's Document Object Model (DOM) provided a better object representation of HTML documents.
- DOM on Client: <textbook> The Internet Explorer 6 and Netscape Navigator 6 browsers have achieved complete support for DOM Level 1. Other browser vendors followed them.
- DOM on Server: Soon did the companies realized that the benefits of DOM can be applied on server side as well as client side. DOM implementations began appearing for server-side products as well. One well-known is the Xerces <http://xerces.apache.org/xerces-c/>parser from Apache Foundation.
- HTML and XML: In 1996, a new markup language, the Extensible Markup Language (XML), was developed in the W3C as well. Meant to remove the HTML language's extensibility restrictions, the idea of developing an object model for XML quickly became another goal of the DOM effort.
- In June 2004, Ecma International published ECMA-357 standard, defining an extension to ECMAScript, known as E4X (ECMAScript for XML).
Levels of DOM
Dom has 3 levels each with its own specifications. Generally, developers prefer to use DOM Level2 because of its wide support by Internet Explorer (IE) and Firefox. Here is a brief description of each of these levels.
Level 1
This was the first version of standards published in 1998 and later updated in 2000
Consists of two major parts .
DOM Core : Deals with the internal structure of the nodes of the tree. Also provides, properties and methods to create, edit and manipulate the tree.
DOM HTML : Objects, properties and methods for specific elements associated with HTML documents and tags.
This Level allowed basic document operations, such as creating
and deleting nodes, working with their contents, and traversing the document tree.
Rest of the operations such as loading and saving documents, etc were left to the developer
community. So a lot of this code was implementation dependent.
Level 2
This level of DOM provides additional interfaces to the existing set of functionality of DOM1. It was first proposed in 2000 and the HTML specification was updated in 2003. It has six different specifications
DOM2Core : Adds name-space specific methods to DOM Core and gives control of structure of DOM elements
DOM2 HTML : Similar to DOM HTML
DOM2 Events : Control of mouse related events such as targeting, capturing,canceling. This does not handle keyboard-related events
DOM2 STYLE : Gives access to CSS related styling and rules. Also known as DOM2 CSS
DOM2 Traversal and Range : Iterative access to DOM so that navigating and manipulating the document is easy.
DOM2 VIEWS : Ability to access and update a representation of a document.
This level added support for more types of document operations and content, such as events and styles. As result of this, developers had standard ways to handle user events and also work with Cascading Style Sheets (CSS) information in the documents.
Level 3
This level was drafted in 2003 and 2004. It consists of five different specifications
DOM3 Core : Adds more methods and changes to existing core
DOM3 Load and Save : Two major functionalities provided of this include 1) content of an XML document can be loaded into your DOM document 2) serialize your DOM document into an XML document
DOM3 Validation : check id dynamic document is valid and conforms to the DOCTYPE.
DOM3 Events : handles keyboard and mouse related events.
DOM3 XPATH : Using XPath features to traverse the document
This level focused on the functionality left out in Level 2 such as Load and Save, etc. Also, support for XPath provides easier navigation through the document. DOM Level 3 introduced two new data types 1) DOMObject : Represents an arbitrary memory object 2) DOMUserData : Represents an interface to an arbitrary block of data in a DOM application.
It includes abstract schema modules which support Document Type Definition (DTD) files, XML Schemas, and other nonspecific schema types. Another feature of Level3 DOM is error handling interfaces.
DOM represents a document using a tree structure. This tree can be thought as a collection of individual sub trees. Following example shows a simple HTML page and corresponding DOM.
The <BODY> branch can be thought as its own tree within the larger document tree. Each box in Figure 1 is called a node in DOM terminology. The DOM API provides several interfaces in the Core module for manipulating nodes and inserting/extracting information to and from nodes. There are also methods present for determining and changing the relationships between nodes and discovering the type of a particular mode. Lets first have a look at different object types supported by DOM.
Core Node Object
A node is an object representation of a particular element in the document's context. Nodes have special names, depending on where they are located in the document tree and what their position is relative to other nodes.
All documents have a root node.
Each element in Figure 1 extends from Node object. There are few properties of Node object which are available to all DOM objects are extended from Node object.
Types of Node Objects
Node Type | Description | nodeName | nodeValue | attributes |
---|---|---|---|---|
Element | Represents an element in an HTML or XML document, which is represented by tags. In Figure 1, <BODY> node is an element type | Tag name | null | NamedNodeMap |
Attribute | Represents an element's attribute. A tag with the syntax <img src = "picture.jpg"> has an Attribute node src. | Attribute name | Attribute value | null |
Text | Represents the textual content of an element. In Figure 1, the node has text child node which represents the text "Bold Text" | #text | Comment text | null |
CDATA Selection | Represents a Character Data selection in an XML document | #cdata-section | CDATASection content | null |
Comment | Represents a comment. Comments are of the form | #comment | Comment text | null |
Document | Represents the root node of a document | #document | null | null |
DocumentType | Each Document node has a DocumentType node that provides a list of entities defined in the document | Name of document type | null | null |
DocumentFragment | They are lightweight or minimal Document nodes. They help when we want to extract just a portion of a document for processing | #document-fragment | null | null |
ProcessingInstruction | Represents instructions to be used by the document processor | Target name | Content excluding the target | null |
Entity | Represents an entity in XML document | Entity name | null | null |
EntityReference | Represents an entity reference in the document | Name of referenced entity | null | null |
Notation | Represents a notation in an XML document. Notations have no parent nodes | Notation name | null | null |
- Node name, values, types - Node parents, children, siblings To easily navigate the document tree, each Node object has number of predefined properties that reference various parts of the tree. Each of these properties references an actual DOM object, with exception of childNOdes which references a NodeList of DOM objects (like an array). parentNode: references a single direct parent of the specific node. childNodes: references all child elements of a node. firstChild: references first child in child nodes. lastChild: references last child in child nodes. previousSiblings: Represents sibling node immediately before the selected node. nextSiblings: Represents sibling node immediately after the selected node.
- Node Attributes <https://developer.mozilla.org/En/DOM/Node.attributes> Just like rest of DOM document, attributes are also based on the Node object, but they are not part of the general parent/child relationship tree. Attributes are instances of Attr pbject are contained in a NamedNodeMap in node's attributes property. Accessing Node Attributes:
x=element.attributeName if attributeName is a W3C defined attribute and an Attribute Node for the element, (eg id) x gets assigned the value of that Attribute Node if attributeName isn't a W3C defined attribute or an attribute node for the element, x gets assigned the value of the attributeName property for element (JavaScript object)
x=element.attributes.attributeName.value or x= element.attributes['attributeName'].value if attributeName is an Attribute Node for the element, x gets assigned that node's value. If attributeName isn't an Attribute Node, produce EXCEPTION
x = element.attributes[indexNumber].value if attributeName is an Attribute Node for the element, x gets assigned that node's value. If attributeName isn't an Attribute Node, produce EXCEPTION
x= element.getAttribute('attributeName') if attributeName is an Attribute Node for the element, x gets assigned its value. If AttributeName isn't an Attribute Node for the element, x gets assigned the null value
- Node OwnerDocument Property This property can be used to reference the root document to which a node belongs.
Core Element Object
Elements inherit from Node object. All element objects have properties and methods of the Node object as well as a few others that facilitate manipulating the attributes of node and locating child element objects.
Methods for manipulating 'attributes' property of base node: getAttribute(name) allows to retrieve an attribute based on the name of the attribute as a string. setAttribute(name, value) allows you to set the value of an attribute based on the name of the attribute as a string. removeAttribute(name) allows to remove the value of an attribute based on the name of the attribute as a string.
Methods for manipulating attributes bsed on actual DOM Attr node object: getAttributeNode(name): Allows to retrieve the Attr node of the specified attribute. setAttributeNode(newAttr): Allows to set attributes based on new instances of the Attr object removeAttributeNode(oldAttr): Allows to remove the attribute node the same way one can remove a child node using removeChild() method.
Locating Element objects within Element objects: The getElementByTagName() method returns the NodeList object referencing all ancestors with the given tag name.The resulting list is the list of all the elements in the order they appear in DOM Document if read left to right and top to bottom.
Core Document Object
The Document interface represents the entire document. IT serves as the primary gateway to accessing the document's data. It represents root of the document tree. It also contains methods necessary for creating new document objects such as elements, sattributes, Text nodes, comments.
The document.documentElement property It is a shortcut to the root element fo the document.
Creating nodes with document methods createAttribute(name): Creates Attr nodes of type Node.ATTRIBUTE_NODE createComment(data): Crates Comment nodes of type Node.COMMENT_NODE crateElement(tagName): Creates element nodes of type Node.ELEMENT_NODE createTextNode(date): Creates Text nodes of type Node.TEXT_NODE
Locating Elements with Document methods: The getElementById('ID') method returns one element corresponding to the ID. This method is singular and only returns one element.
The getElementsByTagName('TAG') method returns a NodeList containing all the elements in the document with the same tag name as TAG. The NodeList is ordered by the order in which the Elements were encountered in a preorder traversal of the document tree. If TAG is *, all tags are matched.
The importNode() method imports a node into this document from another document. The returned node has no parent node. A copy of source node is created, thus the source document is not affected in any way. The Documents and DocumentTypes nodes cannot be imported.
Example DOM
Algorithms
There are certain types of document processing algorithms that you can use to traverse DOM nodes. These algorithms help to
- Determine whether a node is contained within another node
- Determine whether a node has a sibling of a certain type
- Finding a node based on the value of an attribute
- Processing the children of a node
There are two main types of DOM algorithms: position based and content based.
Position based
Position based DOM algorithms are best when you are more interested in position of nodes within a document than concerned with contents of nodes. For example, to determine whether a node occupies particular place in DOM tree is an application o f position based algorithm.
These algorithms work by examining the locations of nodes within a given DOM document. The contents of the node are secondary (if considered at all). Two common position-based algorithms are
- Determining whether a node has an ancestor node of a particular type
- Determining whether a node has a sibling of particular type.
Content-Based Algorithms
Content-based algorithms focus on the actual content of the nodes and their attributes. Common content-based algorithms are determining whether a given node contains another particular node, retrieving nodes based upon their type and finding a node with a particular attribute value.
Browser Support
DOM plays an important role in dispalying the same chunk ofinfpormation across multiple browsers. AS a result of this the user is able to see the same information in Internet Explorer, Firefox, Opera, etc. In addition to the standard W3C impementation of DOM, each browser has some proprietary methods and properties. This is the primary reason why the browsers differ in their functionality and implementation. Internet Explorer supports ActiveXObject for handling XML Http requests while other browsers support the MxlHttpRequest object. Another example of the differences might be that some versions prior ti IE6 do not support createAttribute(), getAttributeNode(), or setAttributeNode() methods. An alternative to these is to use the getAttribute() and setAttribute() methods. NetScape Navigator stores newline charachters as individual text nodes. The developer has to take into account these changes while target the web page across multiple browser platforms.
Applications
Future DOM
XSLT Support has been provided in the latest version of DOM drafted. the initial draft of DOM level 4 was finalized in 2009. DOM4 build on DOM Core Level 3 and also incorporates features from DOM Leve2 HTML.
Conclusion
DOM is changing the way we browse web pages everyday. As newer version of DOM are drafted to support the modern web standards, it provides better interactivity with web pages. DOM4 promises for developing richer, standardized applications. Also with the advent of new DOM features, the browsers will also see a change in the implementation as well as operation giving the end users a better browsing experience.
Suggested Reading
1) W3C Document Object - http://www.w3.org/DOM/