XML DOM and SAX Parsers

By Omar RABI

Upload: amelie

Post on 10-Feb-2016


Page 1: XML DOM and SAX Parsers

XML DOM and SAX Parsers

By Omar RABI

Page 2: XML DOM and SAX Parsers

Introduction to parsers

The word parser comes from compilers.

In a compiler, a parser is the module that reads and interprets the programming language.

Page 3: XML DOM and SAX Parsers

Introduction to parsers

In XML, a parser is a software component that sits between the application and the XML files.

Page 4: XML DOM and SAX Parsers

Introduction to parsers

It reads a text-formatted XML file or stream and converts it to a document to be manipulated by the application.

Page 5: XML DOM and SAX Parsers

Well-formedness and validity

Well-formed documents respect the syntactic rules.

Valid documents not only respect the syntactic rules but also conform to a structure as described in a DTD.
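To make "respecting the syntactic rules" concrete, here is a minimal sketch that checks just one well-formedness rule: start and end tags must nest properly. The function name isProperlyNested is made up for this illustration, and a real parser checks far more (attributes, entities, a single root element, legal characters), so treat it as a toy rather than a validator.

```javascript
// Toy check for ONE well-formedness rule: every start tag must be closed
// by a matching end tag, in the right order. Real parsers check much more.
function isProperlyNested(xml) {
  // Matches start tags, end tags and empty-element tags such as <br/>.
  // Declarations like <?xml ...?> do not match and are simply skipped.
  var tagPattern = /<(\/?)([A-Za-z_][\w.\-]*)[^>]*?(\/?)>/g;
  var open = [];          // stack of currently open element names
  var match;
  while ((match = tagPattern.exec(xml)) !== null) {
    var isEndTag = match[1] === "/";
    var name = match[2];
    var isEmptyElement = match[3] === "/";
    if (isEndTag) {
      // A stray or mismatched end tag breaks the nesting rule.
      if (open.pop() !== name) return false;
    } else if (!isEmptyElement) {
      open.push(name);
    }
  }
  return open.length === 0;   // nothing may be left unclosed
}

console.log(isProperlyNested("<doc><para>Hello, world!</para></doc>"));
console.log(isProperlyNested("<doc><para>Hello, world!</doc></para>"));
```

Validity is a separate, stricter question: a document can pass this kind of check and still violate the structure its DTD describes.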

Page 6: XML DOM and SAX Parsers

Validating vs. non-validating parsers

Both kinds of parser enforce syntactic rules.

Only validating parsers know how to validate documents against their DTDs.

Page 7: XML DOM and SAX Parsers

Tree-based parsers

These map an XML document into an internal tree structure, and then allow an application to navigate that tree.

Ideal for browsers, editors, XSL processors.

Page 8: XML DOM and SAX Parsers

Event-based parsers

An event-based API reports parsing events (such as the start and end of elements) directly to the application through callbacks.

The application implements handlers to deal with the different events.

Page 9: XML DOM and SAX Parsers

Event-based vs. tree-based parsers

Tree-based parsers are generally used for small documents.

Event-based parsers are generally used for large documents.

Page 10: XML DOM and SAX Parsers

Event-based vs. tree-based parsers

Tree-based parsers are generally easier to program with.

Event-based parsers are more complex and give the programmer a harder time.

Page 11: XML DOM and SAX Parsers

What is DOM?

The Document Object Model (DOM) is an application programming interface (API) for HTML and XML documents.

It defines the logical structure of documents and the way a document is accessed and manipulated.

Page 12: XML DOM and SAX Parsers

Properties of DOM

Programmers can build documents, navigate their structure, and add, modify, or delete elements and content.

Provides a standard programming interface that can be used in a wide variety of environments and applications.

Structural isomorphism: any two DOM implementations that represent the same document create the same structure model.

Page 13: XML DOM and SAX Parsers

DOM identifies

The interfaces and objects used to represent and manipulate a document.

The semantics of these interfaces and objects, including both behavior and attributes.

The relationships and collaborations among these interfaces and objects.

Page 14: XML DOM and SAX Parsers

What DOM is not

The Document Object Model is not a binary specification.

The Document Object Model is not a way of persisting objects to XML or HTML.

The Document Object Model does not define "the true inner semantics" of XML or HTML.

Page 15: XML DOM and SAX Parsers

What DOM is not

The Document Object Model is not a set of data structures; it is an object model that specifies interfaces.

The Document Object Model is not a competitor to the Component Object Model (COM).

Page 16: XML DOM and SAX Parsers

DOM at work

<?xml version="1.0"?>
<products>
  <product>
    <name>XML Editor</name>
    <price>499.00</price>
  </product>
  <product>
    <name>DTD Editor</name>
    <price>199.00</price>
  </product>
  <product>
    <name>XML Book</name>
    <price>19.99</price>
  </product>
  <product>
    <name>XML Training</name>
    <price>699.00</price>
  </product>
</products>

Page 17: XML DOM and SAX Parsers

DOM at work

Page 18: XML DOM and SAX Parsers

DOM levels: Level 0

DOM Level 0 is a mix of Netscape Navigator 3.0 and MS Internet Explorer 3.0 document functionalities.

Page 19: XML DOM and SAX Parsers

DOM levels: DOM 1

It contains functionality for document navigation and manipulation.

That is, functions for creating, deleting and changing elements and their attributes.

Page 20: XML DOM and SAX Parsers

DOM Level 1 limitations

DOM Level 1 does not cover:

A structure model for the internal subset and the external subset.

Validation against a schema.

Control for rendering documents via style sheets.

Access control.

Thread-safety.

Events.

Page 21: XML DOM and SAX Parsers

DOM levels: DOM 2

Adds a style sheet object model and defines functionality for manipulating the style information attached to a document.

Enables traversal of the document.

Defines an event model.

Provides support for XML namespaces.

Page 22: XML DOM and SAX Parsers

DOM levels: DOM 3

Document loading and saving, as well as content models (such as DTDs and schemas) with document validation support.

Document views and formatting, key events and event groups.

Page 23: XML DOM and SAX Parsers

An Application of DOM

<HTML>
<HEAD>
<TITLE>Currency Conversion</TITLE>
<SCRIPT LANGUAGE="JavaScript" SRC="conversion.js"></SCRIPT>
</HEAD>
<BODY>
<CENTER>
<FORM ID="controls">
File: <INPUT TYPE="TEXT" NAME="fname" VALUE="prices.xml">
Rate: <INPUT TYPE="TEXT" NAME="rate" VALUE="0.95274" SIZE="4"><BR>
<INPUT TYPE="BUTTON" VALUE="Convert" ONCLICK="convert(controls,xml)">
<INPUT TYPE="BUTTON" VALUE="Clear" ONCLICK="output.value=''"><BR>
<TEXTAREA NAME="output" ROWS="10" COLS="50" READONLY> </TEXTAREA>
</FORM>
<xml id="xml"></xml>
</CENTER>
</BODY>
</HTML>

Page 24: XML DOM and SAX Parsers

An Application of DOM

<xml id="xml"></xml> defines an XML island.

XML islands are mechanisms used to insert XML in HTML documents.

In this case, the XML island is used to access Internet Explorer’s XML parser. The price list is loaded into the island.

Page 25: XML DOM and SAX Parsers

An Application of DOM

The “Convert” button in the HTML file calls the JavaScript function convert(), which is the conversion routine.

convert() accepts two parameters, the form and the XML island.

Page 26: XML DOM and SAX Parsers

An Application of DOM

<SCRIPT LANGUAGE="JavaScript" SRC="conversion.js"></SCRIPT>

// Entry point: read the form fields, load the price list, convert each price.
function convert(form,xmldocument)
{
  var fname = form.fname.value,
      output = form.output,
      rate = form.rate.value;
  output.value = "";
  var document = parse(fname,xmldocument),
      topLevel = document.documentElement;
  searchPrice(topLevel,output,rate);
}

// Load the file synchronously into the XML island (Internet Explorer only).
function parse(uri,xmldocument)
{
  xmldocument.async = false;
  xmldocument.load(uri);
  if(xmldocument.parseError.errorCode != 0)
    alert(xmldocument.parseError.reason);
  return xmldocument;
}

// Walk the tree recursively; nodeType 1 means the node is an element.
function searchPrice(node,output,rate)
{
  if(node.nodeType == 1)
  {
    if(node.nodeName == "price")
      output.value += (getText(node) * rate) + "\r";
    var children,
        i;
    children = node.childNodes;
    for(i = 0;i < children.length;i++)
      searchPrice(children.item(i),output,rate);
  }
}

// Return the text held by the element's first child node.
function getText(node)
{
  return node.firstChild.data;
}

Page 27: XML DOM and SAX Parsers

An Application of DOM

nodeType is a code representing the type of the object.

parentNode is the parent (if any) of the current Node object.

childNodes is the list of children for the current Node object.

firstChild is the Node’s first child.

lastChild is the Node’s last child.

previousSibling is the Node immediately preceding the current one.

nextSibling is the Node immediately following the current one.

attributes is the list of attributes, if the current Node has any.

Page 28: XML DOM and SAX Parsers

An Application of DOM

The parse() function loads the price list in the XML island and returns its Document object.

The function searchPrice() tests whether the current node is an element.

Page 29: XML DOM and SAX Parsers

An Application of DOM

The function searchPrice() visits each node by recursively calling itself for all children of the current node.
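The slide code relies on Internet Explorer's XML island, so it cannot be run elsewhere. As a rough sketch of the same recursion, the version below replaces real DOM nodes with plain objects that carry the same property names (nodeType, nodeName, childNodes, data). The helpers element(), text() and collectPrices() are hypothetical names introduced here, childNodes is an ordinary array rather than a NodeList, and the rate of 2 is chosen only to keep the arithmetic exact.

```javascript
// Hypothetical helpers that build plain objects shaped like DOM nodes.
// nodeType 1 = element, 3 = text, matching the DOM's numeric codes.
function element(name, children) {
  return { nodeType: 1, nodeName: name, childNodes: children };
}
function text(data) {
  return { nodeType: 3, nodeName: "#text", childNodes: [], data: data };
}

// The first two products from the price list, written out by hand.
var products = element("products", [
  element("product", [
    element("name", [text("XML Editor")]),
    element("price", [text("499.00")])
  ]),
  element("product", [
    element("name", [text("DTD Editor")]),
    element("price", [text("199.00")])
  ])
]);

// Same recursion as searchPrice(): handle the node, then visit each child.
function collectPrices(node, rate, results) {
  if (node.nodeType === 1) {
    if (node.nodeName === "price") {
      // The price text lives in the element's first (text) child.
      results.push(Number(node.childNodes[0].data) * rate);
    }
    for (var i = 0; i < node.childNodes.length; i++) {
      collectPrices(node.childNodes[i], rate, results);
    }
  }
  return results;
}

var converted = collectPrices(products, 2, []);
console.log(converted);
```

Text nodes fail the nodeType test, so the recursion stops at them, which is the same termination condition the slide code relies on.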

Page 30: XML DOM and SAX Parsers

An Application of DOM

Page 31: XML DOM and SAX Parsers

What is SAX?

SAX (the Simple API for XML) is an event-based API for parsing XML documents.

The parser tells the application what is in the document by notifying the application of a stream of parsing events.

The application then processes those events to act on the data.

Page 32: XML DOM and SAX Parsers

SAX history

SAX 1.0 was released on May 11, 1998.

SAX is a common, event-based API for parsing XML documents, developed as a collaborative project of the members of the XML-DEV mailing list under the leadership of David Megginson.

Page 33: XML DOM and SAX Parsers

Why SAX?

For applications that are not so XML-centric, an object-based interface is less appealing.

Efficiency: lower level than object-based interfaces.

Page 34: XML DOM and SAX Parsers

Why SAX?

An event-based interface consumes fewer resources than an object-based one.

With an event-based interface, the application can start processing the document as the parser is reading it.

Page 35: XML DOM and SAX Parsers

Limitations of SAX

With SAX, it is not possible to navigate through the document as you can with a DOM.

The application must explicitly buffer those events it is interested in.
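The buffering point is easier to see in code. The sketch below replays a hand-written list of events in place of a real parser, and the application keeps the state it needs to pair a product's name with its price. The event object shape ({ type, name, text }) and the function collectProducts() are assumptions made for this illustration; they are not part of the SAX API.

```javascript
// A hand-written event stream standing in for what a parser would report.
// The { type, name, text } shape is an assumption made for this sketch.
var events = [
  { type: "start", name: "product" },
  { type: "start", name: "name" },
  { type: "chars", text: "XML Book" },
  { type: "end", name: "name" },
  { type: "start", name: "price" },
  { type: "chars", text: "19.99" },
  { type: "end", name: "price" },
  { type: "end", name: "product" }
];

// The handler sees one event at a time, so it must remember what it needs.
function collectProducts(events) {
  var products = [];
  var current = null;    // product being assembled, or null outside <product>
  var field = null;      // "name" or "price" while inside that element
  events.forEach(function (event) {
    if (event.type === "start" && event.name === "product") {
      current = {};
    } else if (event.type === "start" && current !== null) {
      field = event.name;
    } else if (event.type === "chars" && field !== null) {
      // Text may arrive in several pieces, so append rather than assign.
      current[field] = (current[field] || "") + event.text;
    } else if (event.type === "end" && event.name === "product") {
      products.push(current);
      current = null;
    } else if (event.type === "end") {
      field = null;
    }
  });
  return products;
}

var collected = collectProducts(events);
console.log(collected);
```

With a tree-based parser the same pairing is a matter of reading two sibling nodes; here the variables current and field are the buffer the application has to manage itself.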

Page 36: XML DOM and SAX Parsers

SAX API

Parser events are similar to user-interface events such as ONCLICK (in a browser) or AWT events (in Java).

Events alert the application that something happened and the application might want to react.

Page 37: XML DOM and SAX Parsers

SAX API

The parser reports events for:

Element opening tags

Element closing tags

Content of elements

Entities

Parsing errors

Page 38: XML DOM and SAX Parsers

SAX API

Page 39: XML DOM and SAX Parsers

SAX example

<?xml version="1.0"?>
<doc>
  <para>Hello, world!</para>
</doc>

Page 40: XML DOM and SAX Parsers

SAX example

start document
start element: doc
start element: para
characters: Hello, world!
end element: para
end element: doc
end document
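The event sequence above can be reproduced with a toy scanner. This is a sketch only, not a SAX implementation: the function scan() is hypothetical, the handler method names are borrowed from SAX for readability, and attributes, entities, empty-element tags and error reporting are all ignored. Whitespace-only text between tags is also dropped here, which a real parser would not necessarily do.

```javascript
// Toy event-based scanner: handles only start tags, end tags and text.
// It pushes events to the handler as it reads; no tree is ever built.
function scan(xml, handler) {
  // Alternatives: declaration | end tag | start tag | text.
  var tokenPattern = /<\?[^>]*\?>|<\/([^>\s]+)\s*>|<([^>\s\/]+)[^>]*>|([^<]+)/g;
  var match;
  handler.startDocument();
  while ((match = tokenPattern.exec(xml)) !== null) {
    if (match[1] !== undefined) {
      handler.endElement(match[1]);
    } else if (match[2] !== undefined) {
      handler.startElement(match[2]);
    } else if (match[3] !== undefined && match[3].trim() !== "") {
      handler.characters(match[3]);
    }
    // Declarations such as <?xml ...?> match the first alternative
    // and fall through without producing an event.
  }
  handler.endDocument();
}

// The application side: handlers that record each event as a line of text.
var trace = [];
var handler = {
  startDocument: function () { trace.push("start document"); },
  endDocument: function () { trace.push("end document"); },
  startElement: function (name) { trace.push("start element: " + name); },
  endElement: function (name) { trace.push("end element: " + name); },
  characters: function (text) { trace.push("characters: " + text); }
};

scan('<?xml version="1.0"?><doc><para>Hello, world!</para></doc>', handler);
console.log(trace.join("\n"));
```

The scanner never holds more than the current token, which is the property that lets event-based parsers work through large documents with little memory.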

Page 41: XML DOM and SAX Parsers

Conclusion