CSC/ECE 517 Fall 2010/ch6 6g SL

From Expertiza_Wiki

Revision as of 22:38, 17 November 2010

Document Object Model

The Document Object Model (DOM) is a language-independent and platform-independent interface that allows programs to interact with documents. Typically, this interaction is with the objects of HTML, XHTML and XML documents. Through the DOM, a program can change the content, style and structure of such documents. DOM APIs are provided in many languages for dynamically changing XML and HTML documents.[1]
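As one concrete instance of a DOM API, Java ships the W3C DOM interfaces (`org.w3c.dom`) together with a parser factory (`javax.xml.parsers`). The following is a minimal sketch, assuming only the JDK, that parses a small document into a DOM tree, changes the content of one node, and writes the modified tree back out as XML. The class name `DomEditDemo`, the method `retitle`, and the sample `<book>` document are illustrative and not part of any standard.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DomEditDemo {

    // Hypothetical helper: parses the XML, replaces the text of the first
    // <title> element in the in-memory tree, and serializes the tree again.
    static String retitle(String xml, String newTitle) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            Element title = (Element) doc.getElementsByTagName("title").item(0);
            title.setTextContent(newTitle); // change node content in place

            Transformer out = TransformerFactory.newInstance().newTransformer();
            out.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter text = new StringWriter();
            out.transform(new DOMSource(doc), new StreamResult(text));
            return text.toString();
        } catch (Exception e) {
            // Wrap the checked parser/transformer exceptions for this sketch.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(retitle("<book><title>Old</title></book>", "New"));
    }
}
```

The whole document is loaded before any change is made, which is the same property the comparison tables later in this page attribute to DOM.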

DOM structure

DOM is a programming API for documents. A DOM parser builds a tree for each document, and the application then traverses that tree to read its nodes. The tree mirrors the HTML/XML structure: each node represents an element, an attribute, text content or some other object. Every node implements the Node interface, which declares the methods used to traverse the tree and to read or modify node contents. Below is a sample node tree structure.[2]


Different nodes in the above tree structure can be accessed using properties provided by the Node interface, for example:

  • NodeA.firstChild = NodeA1
  • NodeA.childNodes[1] = NodeA2
  • NodeA.firstChild.firstChild = NodeA1a
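The same navigation can be written against Java's `org.w3c.dom` binding, where the `firstChild` property becomes `getFirstChild()` and `childNodes[1]` becomes `getChildNodes().item(1)`. This sketch assumes a hypothetical XML string shaped like the sample tree (element names borrowed from the figure) and written with no whitespace between tags, because indentation would add whitespace-only text nodes to `childNodes` and shift the indices.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DomNavigation {

    // Parses the XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical document mirroring the sample tree; no whitespace
        // between tags, so every child node is an element.
        Element nodeA = parseRoot("<NodeA><NodeA1><NodeA1a/></NodeA1><NodeA2/></NodeA>");

        System.out.println(nodeA.getFirstChild().getNodeName());                 // NodeA1
        System.out.println(nodeA.getChildNodes().item(1).getNodeName());         // NodeA2 (0-based index)
        System.out.println(nodeA.getFirstChild().getFirstChild().getNodeName()); // NodeA1a
    }
}
```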

The Node interface provides methods to dynamically change the tree structure or node content. For example, the insertBefore() method inserts a new node before a specified child node. In the DOM, the document object is the root of the tree, and it also implements the Node interface. Using methods such as getElementById() and getElementsByName(), one can directly access elements anywhere in the tree under the document root.[3]
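A minimal Java sketch of these operations follows, using only the JDK's `org.w3c.dom` API. Note that `getElementsByName()` belongs to the HTML DOM; for plain XML, the core DOM offers `getElementsByTagName()` instead, which is used here. Also, in Java `getElementById()` only finds attributes that are typed as IDs (normally through a DTD or schema), so the sketch marks them explicitly with `setIdAttribute`. The class name and the `<list>`/`<item>` document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomInsertDemo {

    static Document build() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(
                            "<list><item id=\"b\">B</item><item id=\"c\">C</item></list>")));
            Element list = doc.getDocumentElement();

            // insertBefore(newChild, refChild): place a new <item> ahead of the first one.
            Element a = doc.createElement("item");
            a.setAttribute("id", "a");
            a.setTextContent("A");
            list.insertBefore(a, list.getFirstChild());

            // Without a DTD or schema, "id" is not typed as an ID attribute,
            // so declare it explicitly before relying on getElementById().
            NodeList items = doc.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                ((Element) items.item(i)).setIdAttribute("id", true);
            }
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = build();
        NodeList items = doc.getElementsByTagName("item"); // searched from the document root
        for (int i = 0; i < items.getLength(); i++) {
            System.out.println(items.item(i).getTextContent()); // A, B, C
        }
        System.out.println(doc.getElementById("c").getTextContent()); // C
    }
}
```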

DOM in HTML, XML, Javascript

Models of programming language objects

The term object model is used in two different senses. In the first, it refers to the collection of concepts used to describe the generic characteristics of objects in object-oriented languages or their specifications; in this sense it corresponds closely to the term data model. The Java object model is an example. This contrasts with the second sense, in which an object model is a collection of object classes used to model a particular system. A common example of this second kind of object model is the Document Object Model.

Any object model has three key concepts:

  • data structures that can be used to represent the object state
  • ways to associate behaviour with the object state
  • ways for the object methods to access and operate on that state
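In the Java object model, the three concepts map directly onto ordinary class features. The following is a minimal sketch; the `Playlist` class is hypothetical and exists only to label each concept in a comment.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class used only to illustrate the three object-model concepts.
public class Playlist {

    // 1. A data structure representing the object's state.
    private final List<String> songs = new ArrayList<>();

    // 2. Behaviour is associated with that state by declaring methods in the
    //    same class, and 3. those methods reach the state through `this`.
    public void add(String song) {
        this.songs.add(song);
    }

    public int size() {
        return this.songs.size();
    }

    // Callers get a read-only view, so the state changes only through add().
    public List<String> songs() {
        return Collections.unmodifiableList(this.songs);
    }

    public static void main(String[] args) {
        Playlist p = new Playlist();
        p.add("Intro");
        p.add("Outro");
        System.out.println(p.size() + " " + p.songs()); // 2 [Intro, Outro]
    }
}
```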

The name "Document Object Model" was chosen because it is an "object model" in the traditional object-oriented design sense: documents are modeled using objects, and the model encompasses not only the structure of a document but also the behavior of the document and of the objects of which it is composed. In other words, the nodes in the above diagram do not represent a data structure; they represent objects, which have functions and identity. As an object model, the DOM identifies:

  • the interfaces and objects used to represent and manipulate a document
  • the semantics of these interfaces and objects - including both behavior and attributes
  • the relationships and collaborations among the interfaces and objects
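The claim that DOM nodes are objects with identity, not just data, can be checked with the DOM Level 3 methods `isEqualNode()` (same structure and content) and `isSameNode()` (same object), both available on `org.w3c.dom.Node` in the JDK. This sketch assumes a hypothetical document containing two `<x>` elements with identical content.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomIdentity {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<r><x>1</x><x>1</x></r>");
        NodeList xs = doc.getElementsByTagName("x");
        Node first = xs.item(0);
        Node second = xs.item(1);

        // Same name and same content, so the two nodes compare as equal...
        System.out.println(first.isEqualNode(second)); // true
        // ...but they are two distinct objects, each with its own identity.
        System.out.println(first.isSameNode(second));  // false
        // Reaching the first <x> by another route yields that same object.
        System.out.println(first.isSameNode(doc.getDocumentElement().getFirstChild())); // true
    }
}
```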

DOM vs Models of programming language objects

Other Competing Solutions

The Simple API for XML (SAX) is an event-driven, serial-access mechanism that processes a document element by element. SAX allows you to process a document as it is being read, which avoids the need to wait for all of it to be stored before taking action. Because SAX is an event-based interface, the parser reports an event whenever it encounters a start or end tag, text, an unresolved external entity or another construct; attributes are delivered together with the start tag of their element. SAX is a streaming interface: applications receive information from XML documents in a continuous stream, with no backtracking or navigation allowed. This approach makes SAX extremely efficient, handling XML documents of nearly any size in linear time and near-constant memory, but it also places greater demands on the software developer's skills. To use it, the programmer attaches “event handlers” that respond to these events.
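A minimal sketch of attaching an event handler, assuming the JDK's built-in SAX parser (`javax.xml.parsers.SAXParserFactory`) and its `org.xml.sax.helpers.DefaultHandler` base class. The handler here just records each event it receives, which makes the order of the event stream visible; the class names and sample input are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventLog {

    // Event handler: the parser calls these methods in document order while it
    // reads the input; nothing is kept unless the handler chooses to keep it.
    static class LoggingHandler extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Any attributes of the element arrive here, in `atts`.
            events.add("start " + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // A single run of text may be split across several calls.
            events.add("text " + new String(ch, start, length));
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("end " + qName);
        }
    }

    static List<String> parse(String xml) {
        try {
            LoggingHandler handler = new LoggingHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        parse("<a><b>hi</b></a>").forEach(System.out::println);
    }
}
```

Unlike the DOM examples, no tree is ever built: each callback sees only the current event, so there is nothing to navigate back to.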

SAX is memory-efficient since the application can discard the parts of the document it does not need and keep only the small portion that is of interest to it. A SAX parser can therefore run in near-constant memory and easily handle very large documents. On the other hand, because the application sees the document only once, as a stream, SAX is best suited to processing local information coming from nodes that are close to each other. Since SAX delivers the document information as a series of events, it is difficult for the application to perform global operations across the document; for such operations, the application has to build its own data structure to store the document information.
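The following sketch, again assuming only the JDK's SAX classes, shows both halves of this trade-off: the handler ignores every event except the text of `<title>` elements, and the list it fills is the application's own data structure, which a global operation (such as comparing all titles) would have to work from. The element names and class names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxTitleCollector {

    // Keeps only the text of <title> elements. Every other event is ignored,
    // so memory grows with the number of titles, not with the document size.
    static class TitleHandler extends DefaultHandler {
        final List<String> titles = new ArrayList<>(); // the application's own data structure
        private StringBuilder current;                  // non-null only inside a <title>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("title".equals(qName)) {
                current = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (current != null) {
                current.append(ch, start, length); // text may arrive in several chunks
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("title".equals(qName)) {
                titles.add(current.toString());
                current = null;
            }
        }
    }

    static List<String> titles(String xml) {
        try {
            TitleHandler handler = new TitleHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.titles;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<lib><book><title>A</title><pages>10</pages></book>"
                + "<book><title>B</title></book></lib>";
        System.out.println(titles(xml)); // [A, B]
    }
}
```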

Comparison between DOM and SAX

{| class="wikitable" style="width: 75%;" border="1"
|-
! Document Object Model (DOM)
! Simple API for XML (SAX)
|-
| Tree-based interface
| Event-driven interface
|-
| Takes a significant amount of memory
| More memory-efficient
|-
| Convenient for random access
| Appropriate for processing local information
|-
| Can both read and modify the document
| Can only read the XML document
|-
| Must wait until the entire document tree has been loaded into memory before doing any operation
| Good for streaming applications, since processing can start from the beginning of the document
|}


Comparison between DOM and JAXB

{| class="wikitable" style="width: 75%;" border="1"
|-
! Document Object Model (DOM)
! Java Architecture for XML Binding (JAXB)
|-
| Transformation of a DOM tree to XML
| Marshalling of Java objects to XML
|-
| Transformation of an XML document to a DOM tree
| Unmarshalling of XML data to Java objects
|-
| Not driven by a schema; conversion to XML is done through a generic XML serialization process
| Typically driven by a schema, since it uses data binding to map between XML documents and Java classes
|-
| To process XML data, the application has to deal with the details of XML processing
| The application can focus on the semantics of the data rather than the details of XML
|-
| Data is accessed by navigating through the tree
| Allows access to data in non-sequential order without navigating through a tree
|-
| Memory-intensive, since the entire document tree must be held in memory; this makes it poorly suited to very large documents
| Uses memory efficiently: the tree of content objects produced through JAXB tends to be more memory-efficient than a DOM-based tree
|-
| An approach for parsing an XML document
| Defines a binding between an XML schema and the corresponding object hierarchy
|}
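To make the "details of XML processing" row concrete, the sketch below shows the hand-written mapping an application needs when it reads data into Java objects through DOM alone; this per-field work is what JAXB's unmarshalling automates. JAXB itself is not used here because the `javax.xml.bind` API was removed from the JDK in Java 11 and would need an extra dependency, so this is a DOM-only illustration under that assumption. The `Book` class, its fields and the sample XML are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DomManualBinding {

    // Hypothetical domain class. With JAXB, an instance would be produced by an
    // Unmarshaller from a schema-derived or annotated class instead.
    static class Book {
        String title;
        int pages;
    }

    // Hand-written "unmarshalling": the application must know the XML layout,
    // locate each element, read its text, and convert it to the field's type.
    static Book readBook(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            Book book = new Book();
            book.title = root.getElementsByTagName("title").item(0).getTextContent();
            book.pages = Integer.parseInt(
                    root.getElementsByTagName("pages").item(0).getTextContent().trim());
            return book;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Book b = readBook("<book><title>XML</title><pages>42</pages></book>");
        System.out.println(b.title + " " + b.pages); // XML 42
    }
}
```

Every new field in the XML means another lookup and conversion in `readBook`; with a data-binding framework that mapping is derived from the schema or class definition instead of being written by hand.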

DOM vs SAX, JAXB, JDOM

DOM: Document Object Model. An object-based interface: the parser generates an in-memory tree corresponding to the document, and the DOM interface defines methods for accessing and modifying the tree.

Advantages

  • Very useful for dynamic modification of, access to the tree
  • Useful for querying (i.e. looking for data) that depends on the tree structure, e.g. element.childNodes[2].getAttribute("john")
  • Same interface for many programming languages (C++, Java, ...)

Disadvantages

  • Can be slow (needs to produce the tree), and may need lots of memory
  • DOM programming interface is a bit awkward, not terribly object oriented

JDOM: Java Document Object Model. A Java-specific object-oriented interface: the parser generates an in-memory tree corresponding to the document, and the JDOM interface has methods for accessing and modifying the tree.

Advantages

  • Very useful for dynamic modification of the tree
  • Useful for querying (i.e. looking for data) that depends on the tree structure
  • A much nicer object-oriented programming interface than DOM

Disadvantages

  • Can be slow (it has to build the tree), and can take up lots of memory
  • New, and not entirely cooked (but close)
  • Only works with Java, and not (yet) part of Core Java standard

SAX: Simple API for XML. An event-based interface: the parser reports events whenever it sees a tag/attribute/text node/unresolved external entity/other, and the programmer attaches “event handlers” to handle the events.

Advantages

  • Simple to use
  • Very fast (not doing very much before you get the tags and data)
  • Low memory footprint (doesn’t read an XML document entirely into memory)

Disadvantages

  • Not doing very much for you; you have to do everything yourself
  • Not useful if you have to dynamically modify the document once it’s in memory (since you’ll have to do all the work to put it in memory yourself!)

Conclusion

References

  1. Document Object Model Introduction
  2. DOM Structure
  3. Document root in DOM tree