SAX Parsing

Presented by Clifford Lemoine

CSC 436 Compiler Design

SAX Parsing: Introduction

Review of XML
What is SAX parsing?
Simple example program
Compiler design issues
  Demonstrated by a more complex example
Wrap-up
References

Quick XML Review

XML – Wave of the future
Method of representing data
Differs from HTML by storing and representing data instead of displaying or formatting data
Tags similar to HTML tags, only they are user-defined
Follows a small set of basic rules
Stored as a plain text file (UTF-8 by default; plain ASCII is a subset of it), so portability is insanely easy

Quick XML Review

Syntax
  Every XML document should begin with a preamble (the XML declaration)
    <?xml version="1.0" ?>
  An XML document may or may not have a DTD (Document Type Definition) or Schema
    <!DOCTYPE catalog>

Quick XML Review

Syntax cont.
  Every element has a start and end tag, with optional attributes
    <catalog version="1.0"> … </catalog>
  If an element does not contain any data (or elements) nested within, the closing tag can be merged with the start tag like so:
    <catalog version="1.0"/>

Quick XML Review

Syntax cont.
  Elements must be properly nested
  The outermost element is called the root element
  An XML document that follows the basic syntax rules is called well-formed
  An XML document that is well-formed and conforms to a DTD or Schema is called valid
  Once again, XML documents do not always require a DTD or Schema, but they must be well-formed
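
The difference between well-formed and valid can be tried out with the JDK's built-in SAX parser. The following is a minimal sketch, not part of the original slides: the class name WellFormedCheck and its method are made up for illustration, and it assumes the default (non-validating) parser from SAXParserFactory, so only well-formedness is checked, not validity against a DTD or Schema.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a piece of text is well-formed XML.
class WellFormedCheck {
    static boolean isWellFormed(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // DefaultHandler ignores every event; only parse errors matter here.
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // the parser reported a fatal (well-formedness) error
        } catch (Exception e) {
            // Parser configuration or I/O trouble, unrelated to the document itself.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<catalog version=\"1.0\"><book/></catalog>")); // true
        System.out.println(isWellFormed("<catalog><book></catalog>"));                  // false
    }
}
```

The second document fails because <book> is never closed, so the elements are not properly nested.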

Quick XML ReviewQuick XML Review

Sample XML filesSample XML files Catalog.xmlCatalog.xml authorSimple.xmlauthorSimple.xml authorSimpleError.xmlauthorSimpleError.xml

What is SAX Parsing?

Simple API for XML = SAX
SAX is an event-based parsing method
  We are all familiar with event-driven software, whether we know it or not
    Pop-up windows, pull-down menus, etc.
    If a certain “event” (or action) happens, do something
A SAX parser reads an XML document, firing (or calling) callback methods when certain events are found (e.g. start/end tags, character data, start/end of the document); attributes are delivered along with the start-tag event

What is SAX Parsing?

Benefits of SAX parsing
  Unlike DOM (Document Object Model), SAX does not store information in an internal tree structure
  Because of this, SAX is able to parse huge documents (think gigabytes) without having to allocate large amounts of system resources
  Really great if the amount of data you’re looking to store is relatively small (no waste of memory on tree)
  If processing is built as a pipeline, you don’t have to wait for the data to be converted to an object; you can go to the next process once it clears the preceding callback method
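
To make the "no tree" benefit concrete, here is a small sketch (not from the slides) of a handler that keeps a single counter while the document streams past. The class name ElementCounter is hypothetical, and the code assumes the default JAXP parser, which is not namespace-aware, so the element name arrives in qname.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: counts elements with a given name, storing nothing else.
class ElementCounter extends DefaultHandler {
    private final String target;
    private int count;

    ElementCounter(String target) {
        this.target = target;
    }

    @Override
    public void startElement(String uri, String lname, String qname, Attributes atts) {
        if (qname.equals(target)) {
            count++; // the only state kept, no matter how large the document is
        }
    }

    static int count(String xml, String elementName) {
        ElementCounter handler = new ElementCounter(elementName);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.count;
    }

    public static void main(String[] args) {
        System.out.println(count("<catalog><book/><book/><cd/></catalog>", "book")); // 2
    }
}
```

For a real multi-gigabyte file the same handler could be passed to a parse call that reads from a file or stream instead of a String; memory use stays at one int.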

What is SAX Parsing?

Downside
  Most limitations are the programmer’s problem, not the API’s
  SAX does not allow random access to the file; it proceeds in a single pass, firing events as it goes
  Makes it hard to implement cross-referencing in XML (ID and IDREF) as well as complex searching routines
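
The cross-referencing problem can be illustrated with a sketch in which the programmer does the bookkeeping that the single pass does not. Everything here is an assumption made for illustration: the class name RefChecker and the attribute names id and ref are invented, and without a DTD the parser treats them as ordinary attributes rather than true ID/IDREF types.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: remembers ids and references so they can be matched after the pass.
class RefChecker extends DefaultHandler {
    private final Set<String> ids = new HashSet<>();
    private final List<String> refs = new ArrayList<>();

    @Override
    public void startElement(String uri, String lname, String qname, Attributes atts) {
        String id = atts.getValue("id");   // null when the attribute is absent
        if (id != null) {
            ids.add(id);
        }
        String ref = atts.getValue("ref");
        if (ref != null) {
            refs.add(ref); // cannot be resolved yet: the target may appear later
        }
    }

    // Returns the references that point at no id anywhere in the document.
    static List<String> danglingRefs(String xml) {
        RefChecker handler = new RefChecker();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        List<String> dangling = new ArrayList<>(handler.refs);
        dangling.removeAll(handler.ids);
        return dangling;
    }

    public static void main(String[] args) {
        String xml = "<lib><cite ref=\"b1\"/><cite ref=\"b9\"/><book id=\"b1\"/></lib>";
        System.out.println(danglingRefs(xml)); // [b9]
    }
}
```

Note that the reference to b1 comes before the element that defines it, so it can only be resolved once the whole document has been seen.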

What is SAX Parsing?

Callback Methods
  The SAX API has a default handler class built in so you don’t have to re-implement the interfaces every time (org.xml.sax.helpers.DefaultHandler)
  The five most common methods to override are:
    startElement(String uri, String lname, String qname, Attributes atts)
    endElement(String uri, String lname, String qname)
    characters(char text[], int start, int length)
    startDocument()
    endDocument()
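
A short sketch of a handler that overrides all five callbacks and simply records the order in which they fire. The class name EventTrace is hypothetical; the method signatures are those of org.xml.sax.helpers.DefaultHandler, and the default non-namespace-aware JAXP parser is assumed, so qname carries the element name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: logs every event it receives, in order.
class EventTrace extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    @Override
    public void startElement(String uri, String lname, String qname, Attributes atts) {
        events.add("startElement " + qname);
    }

    @Override
    public void endElement(String uri, String lname, String qname) {
        events.add("endElement " + qname);
    }

    @Override
    public void characters(char[] text, int start, int length) {
        events.add("characters " + new String(text, start, length));
    }

    static List<String> trace(String xml) {
        EventTrace handler = new EventTrace();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        for (String event : trace("<author><name>Ann</name></author>")) {
            System.out.println(event);
        }
    }
}
```

The trace always begins with startDocument and ends with endDocument, and every startElement is matched by a later endElement for the same name.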

Simple Example Program

Sax.java
  Instantiates a SAX parser and creates a default handler for the parser
  Reads in an XML document and echoes the structure to the standard out
  Two sample XML documents:
    authorSimple.xml
    authorSimpleError.xml
Demonstration here
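
Sax.java itself is not reproduced in this transcript. As a stand-in, the following hypothetical sketch shows one way a handler might echo a document's structure with indentation; the class name EchoHandler, the output layout, and the inline sample document are all assumptions, not the presenter's code.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for an "echo" program: rebuilds an indented outline of the document.
class EchoHandler extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private int depth; // current nesting level

    @Override
    public void startElement(String uri, String lname, String qname, Attributes atts) {
        indent();
        out.append('<').append(qname);
        for (int i = 0; i < atts.getLength(); i++) {
            out.append(' ').append(atts.getQName(i))
               .append("=\"").append(atts.getValue(i)).append('"');
        }
        out.append(">\n");
        depth++;
    }

    @Override
    public void endElement(String uri, String lname, String qname) {
        depth--;
        indent();
        out.append("</").append(qname).append(">\n");
    }

    @Override
    public void characters(char[] text, int start, int length) {
        String s = new String(text, start, length).trim();
        if (!s.isEmpty()) { // skip whitespace-only runs between tags
            indent();
            out.append(s).append('\n');
        }
    }

    private void indent() {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
    }

    static String echo(String xml) {
        EchoHandler handler = new EchoHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.out.toString();
    }

    public static void main(String[] args) {
        System.out.print(echo("<author id=\"7\"><name>Ann</name></author>"));
    }
}
```

Feeding it a document that is not well-formed makes the parse call fail partway through, which is presumably what the authorSimpleError.xml sample is meant to demonstrate.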

Compiler Design Issues

What is actually happening when a SAX parser parses an XML document?
What type of internal data structures does it use?
How do the callback methods fit in?
Can it solve problems of world peace, hunger, and death? (Or at least can it help me pass Compiler Design?)
Demonstrated with SaxCatalogUnmarshaller example

Compiler Design Issues

Heart of the Beast
  Underneath it all, the SAX parser uses a stack
  Whenever an element is started, a new data object is pushed onto the stack
  Later, when the element is closed, the topmost object on the stack is finished and can be popped
  Unless it is the root element, the popped element will have been a child element of the object that now occupies the top of the stack (board)

Compiler Design Issues

Heart of the Beast cont.
  This process corresponds to the shift-reduce cycle of bottom-up parsers
  It is crucial that XML elements be well-formed and properly nested for this to work
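
The push/pop cycle can be watched in isolation with a stack of element names. This is a sketch under stated assumptions: the class name PathTracker is made up, and the stack is kept in the handler, since the slides do not show where the real example keeps it. Pushing on a start tag plays the role of a shift, and popping on an end tag plays the role of a reduce.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: tracks the currently open elements on a stack.
class PathTracker extends DefaultHandler {
    private final Deque<String> open = new ArrayDeque<>();
    final List<String> paths = new ArrayList<>();

    @Override
    public void startElement(String uri, String lname, String qname, Attributes atts) {
        open.addLast(qname);                 // "shift": a new element is opened
        paths.add(String.join("/", open));   // root-to-current path at this moment
    }

    @Override
    public void endElement(String uri, String lname, String qname) {
        open.removeLast();                   // "reduce": the innermost element is complete
    }

    static List<String> pathsOf(String xml) {
        PathTracker handler = new PathTracker();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.paths;
    }

    public static void main(String[] args) {
        System.out.println(pathsOf("<catalog><book><title>x</title></book><book/></catalog>"));
        // [catalog, catalog/book, catalog/book/title, catalog/book]
    }
}
```

Because the parser rejects improperly nested documents, the element popped in endElement is always the one most recently pushed, which is why proper nesting is crucial.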

Compiler Design Issues

startElement()
  Four parameters:
    String uri = the namespace URI (Uniform Resource Identifier)
    String lname = the local name of the element
    String qname = the qualified name of the element
    Attributes atts = list of attributes for this element
  If the current element is a complex element, an object of the appropriate type is created and pushed on to the stack
  If the element is simple, a StringBuffer is pushed on to the stack, ready to accept character data

Compiler Design Issues

endElement()
  Three parameters:
    String uri = the namespace URI (Uniform Resource Identifier)
    String lname = the local name of the element
    String qname = the qualified name of the element
  The topmost element on the stack is popped, converted to the proper type, and inserted into its parent, which now occupies the top of the stack (unless this is the root element – special handling required)
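
The startElement()/endElement() behaviour described above can be sketched as a tiny unmarshaller. This is not the SaxCatalogUnmarshaller from the demonstration, whose source is not in the transcript: the class name, the Catalog and Book types, and the element names catalog, book, title and year are all hypothetical, chosen only to show a complex element being pushed as an object and a simple element being pushed as a StringBuffer.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: builds Catalog/Book objects using a stack, as the slides describe.
class CatalogUnmarshallerSketch extends DefaultHandler {
    static class Book { String title; int year; }
    static class Catalog { final List<Book> books = new ArrayList<>(); }

    private final Deque<Object> stack = new ArrayDeque<>();
    private Catalog result;

    @Override
    public void startElement(String uri, String lname, String qname, Attributes atts) {
        switch (qname) {
            case "catalog": stack.push(new Catalog()); break;   // complex element
            case "book":    stack.push(new Book()); break;      // complex element
            default:        stack.push(new StringBuffer());     // simple element: text buffer
        }
    }

    @Override
    public void characters(char[] text, int start, int length) {
        // Only simple elements collect text; whitespace inside complex elements is dropped.
        if (stack.peek() instanceof StringBuffer) {
            ((StringBuffer) stack.peek()).append(text, start, length);
        }
    }

    @Override
    public void endElement(String uri, String lname, String qname) {
        Object finished = stack.pop();
        if (stack.isEmpty()) {               // root element: there is no parent to insert into
            result = (Catalog) finished;
            return;
        }
        Object parent = stack.peek();
        if (finished instanceof Book) {
            ((Catalog) parent).books.add((Book) finished);
        } else if (parent instanceof Book) {
            String value = finished.toString().trim();
            if (qname.equals("title")) {
                ((Book) parent).title = value;
            } else if (qname.equals("year")) {
                ((Book) parent).year = Integer.parseInt(value); // converted to the proper type
            }
        }
    }

    static Catalog unmarshal(String xml) {
        CatalogUnmarshallerSketch handler = new CatalogUnmarshallerSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.result;
    }
}
```

Calling unmarshal on a string such as <catalog><book><title>Dune</title><year>1965</year></book></catalog> yields a Catalog holding one Book. The sketch assumes the root element is catalog and does no error handling for unexpected structure.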

Compiler Design Issues

characters()
  Three parameters:
    char text[] = character array holding the current chunk of character data (it may also hold other parts of the document, so read only the range given)
    int start = starting index of current data in text[]
    int length = number of characters of current data in text[], counted from start
  When the parser encounters raw text, it passes a char array containing the actual data, the starting position, and the length of data to be read from the array
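
A minimal sketch of how the three parameters fit together. The class name TextSlice is hypothetical, and the callback is invoked by hand with a made-up buffer so the indices are easy to follow; a real parser chooses the buffer and indices itself.

```java
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: copies exactly the range the parser points at.
class TextSlice extends DefaultHandler {
    final StringBuilder seen = new StringBuilder();

    @Override
    public void characters(char[] text, int start, int length) {
        // Valid data is text[start] .. text[start + length - 1]; the rest is off limits.
        seen.append(new String(text, start, length));
    }

    public static void main(String[] args) {
        TextSlice handler = new TextSlice();
        char[] buffer = "<name>Ann</name>".toCharArray();
        handler.characters(buffer, 6, 3); // 'A' sits at index 6; the text is 3 characters long
        System.out.println(handler.seen); // prints "Ann"
    }
}
```

Treating length as an end index would be a bug here: reading up to index 3 would return part of the tag, not the text.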

Compiler Design Issues

characters() cont.
  The implementation of the callback method inserts the data into the StringBuffer located on the top of the stack
  Can lead to confusion because of:
    No guarantee that a single stretch of characters results in one call to characters()
    It stores all characters, including whitespace, encountered by the parser
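
Both sources of confusion can be handled by appending in characters() and only reading the buffer in endElement(). The sketch below is illustrative only: the class name TextCollector is hypothetical, it uses one shared buffer instead of a stack for brevity (so it suits leaf elements, not mixed content), and the callbacks are called by hand to simulate a parser that splits one text run into two calls.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: accumulates text per element and trims surrounding whitespace.
class TextCollector extends DefaultHandler {
    final Map<String, String> values = new LinkedHashMap<>();
    private final StringBuilder current = new StringBuilder();

    @Override
    public void startElement(String uri, String lname, String qname, Attributes atts) {
        current.setLength(0); // new element: start with an empty buffer
    }

    @Override
    public void characters(char[] text, int start, int length) {
        current.append(text, start, length); // may be called several times per text run
    }

    @Override
    public void endElement(String uri, String lname, String qname) {
        String value = current.toString().trim(); // drop indentation and line breaks
        if (!value.isEmpty()) {
            values.put(qname, value);
        }
        current.setLength(0);
    }

    public static void main(String[] args) {
        TextCollector handler = new TextCollector();
        handler.startElement("", "", "title", new AttributesImpl());
        handler.characters("Du".toCharArray(), 0, 2);     // first part of the text
        handler.characters("ne \n".toCharArray(), 0, 4);  // second part, with trailing whitespace
        handler.endElement("", "", "title");
        System.out.println(handler.values); // {title=Dune}
    }
}
```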

Wrap-up

SAX is an event-based parser, using callback methods to handle events found by the parser
Applications are written by extending the DefaultHandler class and overriding the event handler methods
The SAX parser usually uses a stack to perform operations
And No, SAX will not save the world…

References

Gittleman, Art. Advanced Java: Internet Applications (Second Edition). Scott Jones Publishers. El Granada, California. 2002. pp. 504-511.
Janert, Phillip K. “Simple XML Parsing with SAX and DOM.” http://www.onjava.com/pub/a/onjava/2002/06/26/xml.html Published June 26, 2002. Accessed February 10, 2003.
Wati, Anjini. “E-Catalog for a Small to Medium Enterprise.” http://ispg.csu.edu.au/subjects/itc594/reports/Tr-005.doc Accessed February 10, 2003.