XML DOM and SAX Parsers
By Omar RABI
Introduction to parsers
The word "parser" comes from compilers. In a compiler, a parser is the module that reads and interprets the programming language.
In XML, a parser is a software component that sits between the application and the XML files.
It reads a text-formatted XML file or stream and converts it to a document to be manipulated by the application.
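As a minimal sketch of that conversion, the following uses Python's standard xml.dom.minidom module as the parser. The sample document and the variable names are assumptions made for illustration.

```python
from xml.dom import minidom

# Hypothetical input: a tiny XML document held in a string.
XML_TEXT = "<greeting lang='en'>Hello</greeting>"

# The parser turns the text into a Document object the application can use.
document = minidom.parseString(XML_TEXT)
root = document.documentElement

print(root.tagName)               # element name
print(root.getAttribute("lang"))  # attribute value
print(root.firstChild.data)       # text content
```

The application never touches the angle brackets itself; it only works with the objects the parser produced.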
Well-formedness and validity
Well-formed documents respect the syntactic rules.
Valid documents not only respect the syntactic rules but also conform to a structure as described in a DTD.
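A well-formedness check can be sketched by simply asking a parser to read the text. The helper name is_well_formed is hypothetical, and Python's xml.dom.minidom stands in for the parser.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        minidom.parseString(text)
        return True
    except ExpatError:
        return False


print(is_well_formed("<doc><para>Hello</para></doc>"))  # properly nested tags
print(is_well_formed("<doc><para>Hello</doc></para>"))  # overlapping tags
```

This only tests the syntactic rules; it says nothing about whether the document conforms to a DTD.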
Validating vs. non-validating parsers
Both kinds of parser enforce the syntactic rules.
Only validating parsers know how to validate documents against their DTDs.
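The difference can be seen with a non-validating parser such as the Expat-based one behind Python's xml.dom.minidom. In this sketch the sample document is an assumption: it is well-formed, but it breaks its own DTD.

```python
from xml.dom import minidom

# Hypothetical document: the DTD allows only <para> inside <doc>,
# but the body uses an undeclared <note> element instead.
INVALID_BUT_WELL_FORMED = """<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ELEMENT doc (para)>
  <!ELEMENT para (#PCDATA)>
]>
<doc><note>not declared in the DTD</note></doc>"""

# A non-validating parser reads the DTD but does not enforce it,
# so this call succeeds even though the document is not valid.
document = minidom.parseString(INVALID_BUT_WELL_FORMED)
first_child = document.documentElement.firstChild
print(first_child.tagName)
```

A validating parser would be expected to report the undeclared element instead of accepting the document.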
Tree-based parsers
These map an XML document into an internal tree structure, and then allow an application to navigate that tree.
Ideal for browsers, editors, XSL processors.
Event-based parsers
An event-based API reports parsing events (such as the start and end of elements) directly to the application through callbacks.
The application implements handlers to deal with the different events.
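A handler of this kind might look as follows with Python's xml.sax module. The class name ElementCounter and the sample document are assumptions for illustration.

```python
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Handler whose callback tallies how often each element name starts."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called by the parser each time an opening tag is read.
        self.counts[name] = self.counts.get(name, 0) + 1


# Hypothetical sample document.
SAMPLE = b"<list><item>a</item><item>b</item><item>c</item></list>"

counter = ElementCounter()
xml.sax.parseString(SAMPLE, counter)
print(counter.counts)
```

The parser drives the control flow here: the application only supplies the callbacks and keeps whatever state it needs.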
Event-based vs. tree-based parsers
Tree-based parsers are generally used for small documents.
Event-based parsers are generally used for large documents.
Applications built on tree-based parsers are generally easier to implement.
Event-based parsers are more complex and give the programmer a harder time.
What is DOM?
The Document Object Model (DOM) is an application programming interface (API) for HTML and XML documents.
It defines the logical structure of documents and the way a document is accessed and manipulated.
Properties of DOM
Programmers can build documents, navigate their structure, and add, modify, or delete elements and content.
Provides a standard programming interface that can be used in a wide variety of environments and applications.
Structural isomorphism: any two DOM implementations that build a representation of the same document produce the same structure model, with the same objects and relationships.
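Building and navigating a document through the DOM interfaces can be sketched with Python's xml.dom.minidom. The element names are illustrative only.

```python
from xml.dom import minidom

# Build a new document from scratch; element names are illustrative only.
document = minidom.Document()
products = document.createElement("products")
document.appendChild(products)

product = document.createElement("product")
products.appendChild(product)

name = document.createElement("name")
name.appendChild(document.createTextNode("XML Editor"))
product.appendChild(name)

# Navigate back down from the root to the text that was just added.
text = document.documentElement.firstChild.firstChild.firstChild.data
print(text)
print(products.toxml())
```

The same createElement, createTextNode and appendChild calls exist in other DOM bindings, which is the point of a standard interface.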
DOM identifies
The interfaces and objects used to represent and manipulate a document.
The semantics of these interfaces and objects, including both behavior and attributes.
The relationships and collaborations among these interfaces and objects.
What DOM is not!!
The Document Object Model is not a binary specification.
The Document Object Model is not a way of persisting objects to XML or HTML.
The Document Object Model does not define "the true inner semantics" of XML or HTML.
The Document Object Model is not a set of data structures; it is an object model that specifies interfaces.
The Document Object Model is not a competitor to the Component Object Model (COM).
DOM at work
<?xml version="1.0"?>
<products>
  <product>
    <name>XML Editor</name>
    <price>499.00</price>
  </product>
  <product>
    <name>DTD Editor</name>
    <price>199.00</price>
  </product>
  <product>
    <name>XML Book</name>
    <price>19.99</price>
  </product>
  <product>
    <name>XML Training</name>
    <price>699.00</price>
  </product>
</products>
DOM levels: Level 0
DOM Level 0 is a mix of Netscape Navigator 3.0 and MS Internet Explorer 3.0 document functionalities.
DOM levels: DOM 1
It contains functionality for document navigation and manipulation,
i.e., functions for creating, deleting and changing elements and their attributes.
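The three kinds of operation can be sketched with Python's xml.dom.minidom. The sample document, the currency attribute and the stock element are assumptions for illustration.

```python
from xml.dom import minidom

# Hypothetical starting document.
document = minidom.parseString(
    "<products><product>"
    "<name>XML Book</name><price>19.99</price>"
    "</product></products>"
)
product = document.getElementsByTagName("product")[0]
price = document.getElementsByTagName("price")[0]

# Change: update an element's text and set an attribute.
price.firstChild.data = "24.99"
price.setAttribute("currency", "USD")

# Create: add a new element.
stock = document.createElement("stock")
stock.appendChild(document.createTextNode("12"))
product.appendChild(stock)

# Delete: remove an existing element.
product.removeChild(document.getElementsByTagName("name")[0])

print(product.toxml())
```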
DOM Level 1 limitations
DOM Level 1 does not provide:
A structure model for the internal subset and the external subset.
Validation against a schema.
Control for rendering documents via style sheets.
Access control.
Thread-safety.
Events.
DOM levels: DOM 2
Adds a style sheet object model and defines functionality for manipulating the style information attached to a document.
Enables traversal of the document.
Defines an event model.
Provides support for XML namespaces.
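Namespace support shows up in the API as namespace-aware variants of the familiar calls. A small sketch with Python's xml.dom.minidom follows; the namespace URIs and element names are made up for the example.

```python
from xml.dom import minidom

# Hypothetical namespace URIs, used only for this illustration.
SHOP_NS = "http://example.com/shop"
NOTE_NS = "http://example.com/notes"

XML_TEXT = (
    '<s:products xmlns:s="http://example.com/shop" '
    'xmlns:n="http://example.com/notes">'
    "<s:name>XML Editor</s:name>"
    "<n:name>internal note</n:name>"
    "</s:products>"
)

document = minidom.parseString(XML_TEXT)

# Namespace-aware lookup: same local name, different namespace.
shop_names = document.getElementsByTagNameNS(SHOP_NS, "name")
note_names = document.getElementsByTagNameNS(NOTE_NS, "name")

print(shop_names[0].namespaceURI, shop_names[0].localName)
print(shop_names[0].firstChild.data)
print(note_names[0].firstChild.data)
```

Without the namespace argument, both name elements would be indistinguishable by local name alone.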
DOM levels: DOM 3
Document loading and saving, as well as content models (such as DTDs and schemas) with document validation support.
Document views and formatting, key events and event groups.
An Application of DOM
<HTML>
<HEAD>
<TITLE>Currency Conversion</TITLE>
<SCRIPT LANGUAGE="JavaScript" SRC="conversion.js"></SCRIPT>
</HEAD>
<BODY>
<CENTER>
<FORM ID="controls">
File: <INPUT TYPE="TEXT" NAME="fname" VALUE="prices.xml">
Rate: <INPUT TYPE="TEXT" NAME="rate" VALUE="0.95274" SIZE="4"><BR>
<INPUT TYPE="BUTTON" VALUE="Convert" ONCLICK="convert(controls,xml)">
<INPUT TYPE="BUTTON" VALUE="Clear" ONCLICK="output.value=''"><BR>
<TEXTAREA NAME="output" ROWS="10" COLS="50" READONLY> </TEXTAREA>
</FORM>
<xml id="xml"></xml>
</CENTER>
</BODY>
</HTML>
<xml id="xml"></xml> defines an XML island.
XML islands are mechanisms used to insert XML in HTML documents.
In this case, XML islands are used to access Internet Explorer’s XML parser. The price list is loaded into the island.
The “Convert” button in the HTML file calls the JavaScript function convert(), which is the conversion routine.
convert() accepts two parameters, the form and the XML island.
conversion.js, the script loaded by <SCRIPT LANGUAGE="JavaScript" SRC="conversion.js"></SCRIPT>:

// Entry point called by the "Convert" button: read the form fields,
// load the price list, and write the converted prices to the output area.
function convert(form, xmldocument)
{
  var fname = form.fname.value,
      output = form.output,
      rate = form.rate.value;
  output.value = "";
  var document = parse(fname, xmldocument),
      topLevel = document.documentElement;
  searchPrice(topLevel, output, rate);
}

// Load the file synchronously into the XML island and report parse errors.
function parse(uri, xmldocument)
{
  xmldocument.async = false;
  xmldocument.load(uri);
  if (xmldocument.parseError.errorCode != 0)
    alert(xmldocument.parseError.reason);
  return xmldocument;
}

// Walk the tree; nodeType 1 marks an element node.
function searchPrice(node, output, rate)
{
  if (node.nodeType == 1)
  {
    if (node.nodeName == "price")
      output.value += (getText(node) * rate) + "\r";
    var children, i;
    children = node.childNodes;
    for (i = 0; i < children.length; i++)
      searchPrice(children.item(i), output, rate);
  }
}

// Return the text held by the node's first child.
function getText(node)
{
  return node.firstChild.data;
}
Node properties used by the script:
nodeType is a code representing the type of the object.
parentNode is the parent (if any) of the current Node object.
childNodes is the list of children for the current Node object.
firstChild is the Node’s first child.
lastChild is the Node’s last child.
previousSibling is the Node immediately preceding the current one.
nextSibling is the Node immediately following the current one.
attributes is the list of attributes, if the current Node has any.
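These properties carry the same names in other DOM bindings. A quick sketch with Python's xml.dom.minidom, using a made-up one-product document:

```python
from xml.dom import minidom

document = minidom.parseString(
    '<product id="p1"><name>XML Book</name><price>19.99</price></product>'
)
product = document.documentElement
name, price = product.childNodes  # exactly two children, no whitespace nodes

print(product.nodeType == product.ELEMENT_NODE)  # element nodes have code 1
print(name.parentNode is product)
print(product.firstChild is name, product.lastChild is price)
print(price.previousSibling is name, name.nextSibling is price)
print(product.attributes["id"].value)
```

If the document contained line breaks or indentation between the tags, childNodes would also include whitespace text nodes.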
The parse() function loads the price list in the XML island and returns its Document object.
The function searchPrice() tests whether the current node is an element.
The function searchPrice() visits each node by recursively calling itself for all children of the current node.
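Outside Internet Explorer, the same recursive walk can be sketched with Python's xml.dom.minidom. This is a rough counterpart to searchPrice(), not a port of the original script: the price list is shortened to two products, results go into a list instead of a text area, and the rounding to two decimals is an addition.

```python
from xml.dom import minidom

# The price list from the earlier slide, shortened to two products.
PRICES = """<?xml version="1.0"?>
<products>
  <product><name>XML Editor</name><price>499.00</price></product>
  <product><name>XML Book</name><price>19.99</price></product>
</products>"""


def search_price(node, rate, output):
    """Append the converted value of every <price> below node to output."""
    if node.nodeType == node.ELEMENT_NODE:
        if node.nodeName == "price":
            output.append(round(float(node.firstChild.data) * rate, 2))
        # Recurse into all children; text nodes fail the element test above.
        for child in node.childNodes:
            search_price(child, rate, output)


converted = []
search_price(minidom.parseString(PRICES).documentElement, 0.95274, converted)
print(converted)
```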
What is SAX?
SAX (the Simple API for XML) is an event-based API for parsing XML documents.
The parser tells the application what is in the document by notifying the application of a stream of parsing events.
The application then processes those events to act on the data.
SAX History
SAX 1.0 was released on May 11, 1998.
SAX is a common, event-based API for parsing XML documents, developed as a collaborative project of the members of the XML-DEV mailing list under the leadership of David Megginson.
Why SAX?
For applications that are not so XML-centric, an object-based interface is less appealing.
Efficiency: lower level than object-based interfaces.
An event-based interface consumes fewer resources than an object-based one.
With an event-based interface, the application can start processing the document as the parser is reading it.
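Processing while reading can be sketched with the incremental interface of Python's xml.sax parser, which accepts the document in pieces through feed(). The handler name NameLogger and the two chunks are assumptions standing in for data arriving from a file or network.

```python
import xml.sax


class NameLogger(xml.sax.ContentHandler):
    """Record element names in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        self.seen.append(name)


handler = NameLogger()
parser = xml.sax.make_parser()
parser.setContentHandler(handler)

# Hypothetical chunks: the document is not complete after the first one.
parser.feed(b"<doc><para>Hello,")
seen_after_first_chunk = list(handler.seen)  # events delivered so far
parser.feed(b" world!</para><note/></doc>")
parser.close()

print(seen_after_first_chunk)
print(handler.seen)
```

The handler has already been called for the opening tags before the rest of the document exists, which a tree-based parser cannot offer.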
Limitations of SAX
With SAX, it is not possible to navigate through the document as you can with a DOM.
The application must explicitly buffer those events it is interested in.
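Buffering usually means keeping a little state between callbacks. In this sketch with Python's xml.sax, the handler name PriceCollector and the sample data are assumptions; the handler remembers whether it is inside a price element and collects the text until the closing tag arrives.

```python
import xml.sax


class PriceCollector(xml.sax.ContentHandler):
    """Buffer the text of each <price> element and keep it as a float."""

    def __init__(self):
        super().__init__()
        self.prices = []
        self._buffer = None  # None means "not inside a <price> element"

    def startElement(self, name, attrs):
        if name == "price":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several pieces, so collect it until the end tag.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "price" and self._buffer is not None:
            self.prices.append(float("".join(self._buffer)))
            self._buffer = None


# Hypothetical sample modelled on the price list.
SAMPLE = (
    b"<products>"
    b"<product><name>XML Book</name><price>19.99</price></product>"
    b"<product><name>DTD Editor</name><price>199.00</price></product>"
    b"</products>"
)

collector = PriceCollector()
xml.sax.parseString(SAMPLE, collector)
print(collector.prices)
```

Everything outside the price elements is seen once and discarded, so memory use does not grow with the size of the document.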
SAX API
Parser events are similar to user-interface events such as ONCLICK (in a browser) or AWT events (in Java).
Events alert the application that something happened and the application might want to react.
The parser reports events for:
Element opening tags
Element closing tags
Content of elements
Entities
Parsing errors
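Parsing errors can be sketched with Python's xml.sax, where the default error handler raises SAXParseException on a fatal error. The helper name first_error and the malformed input are assumptions for illustration.

```python
import xml.sax

# Hypothetical malformed input: <para> is never closed.
BROKEN = b"<doc>\n<para>Hello, world!\n</doc>"


def first_error(data):
    """Parse data and return (line, message) for a fatal error, else None."""
    try:
        xml.sax.parseString(data, xml.sax.ContentHandler())
    except xml.sax.SAXParseException as error:
        return error.getLineNumber(), error.getMessage()
    return None


print(first_error(BROKEN))
print(first_error(b"<doc><para>Hello, world!</para></doc>"))
```

An application that wants to react differently can pass its own error handler object instead of relying on the exception.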
SAX Example
<?xml version="1.0"?>
<doc>
  <para>Hello, world!</para>
</doc>
The parser reports this document as the following events:
start document
start element: doc
start element: para
characters: Hello, world!
end element: para
end element: doc
end document
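The event listing can be reproduced with a small handler written against Python's xml.sax. The class name TraceHandler is hypothetical, and the handler skips whitespace-only text between tags, which a real parser also reports as characters events.

```python
import xml.sax

DOCUMENT = b"""<?xml version="1.0"?>
<doc>
<para>Hello, world!</para>
</doc>"""


class TraceHandler(xml.sax.ContentHandler):
    """Collect one line of text per parsing event."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("start document")

    def endDocument(self):
        self.events.append("end document")

    def startElement(self, name, attrs):
        self.events.append("start element: " + name)

    def endElement(self, name):
        self.events.append("end element: " + name)

    def characters(self, content):
        # Skip the whitespace-only text between tags to match the slide.
        if content.strip():
            self.events.append("characters: " + content)


tracer = TraceHandler()
xml.sax.parseString(DOCUMENT, tracer)
print("\n".join(tracer.events))
```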
Conclusion