sax. what is sax sax 1.0 was released on may 11, 1998. sax is a common, event-based api for parsing...

SAX

Upload: elaine-wood

Post on 30-Dec-2015


SAX

What is SAX?

SAX 1.0 was released on May 11, 1998. SAX is a common, event-based API for parsing XML documents.

SAX is primarily a Java API, but there are implementations in most languages. The current version is SAX 2.0.2, and there are versions for several programming-language environments other than Java.

How does SAX work?

An XML document is seen as a series of “events”. Unlike DOM, SAX does not store information in an internal tree structure.

SAX can parse huge documents (think gigabytes) without having to allocate large amounts of system resources.

If processing is built as a pipeline, a stage does not have to wait for the data to be converted to an object; it can move on to the next step as soon as the preceding callback method returns.

SAX does not allow random access to the file; it proceeds in a single pass, firing events as it goes.
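To make the single-pass, no-tree idea concrete, here is a minimal sketch using the JDK's built-in JAXP SAX parser. The class name `ElementCounter` and its `countElements` helper are hypothetical. The only state kept is one integer, so memory use does not grow with document size. A string input is used for brevity; a real gigabyte-scale file would be passed as a `File` or `InputStream` instead.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: counts elements in one forward pass without building a tree.
public class ElementCounter extends DefaultHandler {
    private int count = 0; // the only state kept, regardless of document size

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        count++; // the parser fires exactly one startElement event per element
    }

    public static int countElements(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ElementCounter handler = new ElementCounter();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.count;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<library><book/><book/><book/></library>")); // prints 4
    }
}
```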

SAX Structure (1/4)

SAX Structure (2/4)

SAXParserFactory: A SAXParserFactory object creates an instance of the parser determined by the system property javax.xml.parsers.SAXParserFactory.

SAXParser: The SAXParser class defines several overloaded parse() methods. In general, you pass an XML data source and a DefaultHandler object to the parser, which processes the XML and invokes the appropriate methods in the handler object.

XMLReader (labeled SAXReader in some tutorials): The SAXParser wraps an org.xml.sax.XMLReader. Typically you don't need to care about that, but every once in a while you need to get hold of it using SAXParser's getXMLReader() method so that you can configure it. It is the XMLReader that carries on the conversation with the SAX event handlers you define.
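The chain factory → parser → reader can be sketched as follows. This is a minimal example under stated assumptions: the class names `FactoryDemo` and `NsCollector` are hypothetical, and namespace awareness is switched on only to show that a factory setting carries through to the reader obtained with `getXMLReader()`. Handlers are then registered directly on that reader.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class FactoryDemo {
    // Hypothetical handler: records "namespaceURI|localName" for every element.
    static class NsCollector extends DefaultHandler {
        final List<String> names = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            names.add(uri + "|" + localName);
        }
    }

    public static List<String> parse(String xml) throws Exception {
        // 1. The factory chooses the parser implementation.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // report namespace URIs and local names
        // 2. The factory creates a SAXParser.
        SAXParser parser = factory.newSAXParser();
        // 3. The SAXParser wraps an XMLReader; fetch it to configure it directly.
        XMLReader reader = parser.getXMLReader();
        NsCollector handler = new NsCollector();
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("<r xmlns='urn:x'><c/></r>"));
    }
}
```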

SAX Structure (3/4)

DefaultHandler: Not shown in the diagram, DefaultHandler implements the ContentHandler, ErrorHandler, DTDHandler, and EntityResolver interfaces with default (mostly empty) methods, so you can override only the ones you are interested in.

ContentHandler: startDocument and endDocument are invoked at the beginning and end of the document, and startElement and endElement are invoked when a start tag or end tag is recognized. This interface also defines the methods characters and processingInstruction, which are invoked when the parser encounters text in an XML element or an inline processing instruction, respectively.

EntityResolver: The resolveEntity method is invoked when the parser needs to resolve an external entity (such as an external DTD) identified by a URI.
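One common use of `resolveEntity` is to stop the parser from fetching an external DTD over the network. The sketch below assumes that an empty replacement is acceptable, which holds only when the DTD declares nothing the document relies on (such as entities or default attributes). The class `OfflineResolver` and the URI `http://example.com/note.dtd` are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: resolves every external entity to empty content,
// so the parser never opens the URI itself.
public class OfflineResolver extends DefaultHandler {
    private String requestedSystemId;                  // last URI the parser asked about
    private final StringBuilder text = new StringBuilder();

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        requestedSystemId = systemId;
        return new InputSource(new StringReader(""));  // empty replacement content
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    public String getRequestedSystemId() { return requestedSystemId; }
    public String getText() { return text.toString(); }

    public static OfflineResolver parse(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        OfflineResolver handler = new OfflineResolver();
        reader.setEntityResolver(handler);
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler;
    }

    public static void main(String[] args) throws Exception {
        OfflineResolver h = parse("<!DOCTYPE note SYSTEM \"http://example.com/note.dtd\"><note>hi</note>");
        System.out.println(h.getRequestedSystemId() + " -> " + h.getText());
    }
}
```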

SAX Structure (4/4)

ErrorHandler: The methods error, fatalError, and warning are invoked in response to various parsing errors. The default error handler throws an exception for fatal errors and ignores other errors (including validation errors). That's one reason you need to know something about the SAX parser, even if you are using the DOM. Sometimes the application may be able to recover from a validation error; other times it may need to generate an exception. To ensure correct handling, you need to supply your own error handler to the parser.

DTDHandler: Defines methods you will rarely need. They are used when processing a DTD to recognize and act on declarations of notations and unparsed entities.
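Because the default handler silently ignores validation errors, a custom ErrorHandler is the usual fix. Below is a minimal sketch, assuming DTD validation via `setValidating(true)`. The class `CollectingErrorHandler` is hypothetical. It records warnings and recoverable errors and keeps parsing. Fatal (well-formedness) errors are recorded and then rethrown, because the parser cannot continue after them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Hypothetical example: collects problems instead of ignoring validation errors.
public class CollectingErrorHandler implements ErrorHandler {
    final List<String> problems = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) { problems.add("warning: " + e.getMessage()); }

    // Validation errors arrive here; recording them lets parsing continue.
    @Override
    public void error(SAXParseException e) { problems.add("error: " + e.getMessage()); }

    // Well-formedness errors: record, then rethrow to stop the parse.
    @Override
    public void fatalError(SAXParseException e) throws SAXParseException {
        problems.add("fatal: " + e.getMessage());
        throw e;
    }

    public static List<String> validate(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // validate against the document's DTD
        XMLReader reader = factory.newSAXParser().getXMLReader();
        CollectingErrorHandler handler = new CollectingErrorHandler();
        reader.setErrorHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // already recorded by fatalError
        }
        return handler.problems;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate("<!DOCTYPE a [<!ELEMENT a EMPTY>]><a><b/></a>"));
    }
}
```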

SAX Events

startDocument, endDocument, startElement, endElement, characters
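A handler that just logs each callback shows the order in which these events fire. This is a sketch; the class `EventTrace` is hypothetical. Note that the SAX contract allows a parser to split one run of text across several `characters` calls, so code should not assume one call per text node.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: records the sequence of SAX events for a document.
public class EventTrace extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("startDocument"); }
    @Override public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("startElement " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        events.add("characters \"" + new String(ch, start, length) + "\"");
    }

    public static List<String> trace(String xml) throws Exception {
        EventTrace handler = new EventTrace();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        trace("<greeting><to>Ann</to></greeting>").forEach(System.out::println);
    }
}
```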

Pull Parsing Versus Push Parsing

Streaming pull parsing refers to a programming model in which a client application calls methods on an XML parsing library when it needs to interact with an XML infoset; that is, the client only gets (pulls) XML data when it explicitly asks for it.

Streaming push parsing refers to a programming model in which an XML parser sends (pushes) XML data to the client as the parser encounters elements in an XML infoset; that is, the parser sends the data whether or not the client is ready to use it at that time.
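For contrast with SAX's push callbacks, the sketch below uses StAX (javax.xml.stream), the JDK's pull API. The client owns the loop and requests each event with `next()`, so it can simply stop pulling once it has what it needs. The class `PullDemo` and its `limit` parameter are hypothetical, chosen to show that early stop.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class PullDemo {
    // Returns up to `limit` element names; the client decides when to stop pulling.
    public static List<String> elementNames(String xml, int limit) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        try {
            while (names.size() < limit && r.hasNext()) {
                // Pull the next event only when we are ready for it.
                if (r.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(r.getLocalName());
                }
            }
        } finally {
            r.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(elementNames("<a><b/><c/></a>", 2));
    }
}
```

In a push parser the equivalent early stop usually means throwing an exception from a callback, which is one practical reason pull APIs are often described as easier to use.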

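XML Parser API Feature Summary

| Feature                      | StAX            | SAX             | DOM            |
|------------------------------|-----------------|-----------------|----------------|
| API type                     | Pull, streaming | Push, streaming | In-memory tree |
| Ease of use                  | High            | Medium          | High           |
| XPath capability             | No              | No              | Yes            |
| CPU and memory efficiency    | Good            | Good            | Varies         |
| Forward only                 | Yes             | Yes             | No             |
| Read XML                     | Yes             | Yes             | Yes            |
| Write XML                    | Yes             | No              | Yes            |
| Create, read, update, delete | No              | No              | Yes            |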

XML Parsers and APIs Supporting SAX

Xerces: A family of software packages for parsing and manipulating XML, part of the Apache XML project.

MSXML: Microsoft XML Core Services (MSXML) is a set of services that allow applications written in JScript, VBScript, and Microsoft Visual Studio 6.0 to build XML-based applications.

Crimson XML

JAXP (Java API for XML Processing): One of the Java XML programming APIs. It provides the capability of validating and parsing XML documents.

SAX Example

import java.io.FileReader;
import java.io.IOException;
import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

public class MySAXApp extends DefaultHandler {
    public static void main(String[] args) throws SAXException, IOException {
        // XMLReaderFactory is deprecated since Java 9 (SAXParserFactory is the JAXP alternative) but still works.
        XMLReader xr = XMLReaderFactory.createXMLReader();
        MySAXApp handler = new MySAXApp();
        xr.setContentHandler(handler);
        xr.setErrorHandler(handler);
        for (String file : args) { // parse each file named on the command line
            try (FileReader r = new FileReader(file)) {
                xr.parse(new InputSource(r));
            }
        }
    }

    // Event handlers
    @Override public void startDocument() {
        // TODO: add customized code here
    }
    @Override public void endDocument() {
        // TODO: add customized code here
    }
    @Override public void startElement(String uri, String name, String qName, Attributes atts) {
        // TODO: add customized code here
    }
    @Override public void endElement(String uri, String name, String qName) {
        // TODO: add customized code here
    }
}
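The skeleton above leaves the handler bodies empty. A typical way to fill them in is to buffer text in `characters()` and act on it in `endElement()`. Buffering matters because a parser may deliver one text node in several `characters` calls (entity references such as `&amp;` commonly cause a split). The class `TitleCollector` and the `<title>` element are hypothetical. The sketch matches on `qName` because the default factory is not namespace-aware, in which case `localName` may be empty.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: collects the text of every <title> element.
public class TitleCollector extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();
    private final StringBuilder buffer = new StringBuilder();
    private boolean inTitle = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("title")) {
            inTitle = true;
            buffer.setLength(0); // start a fresh buffer for this title
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inTitle) {
            buffer.append(ch, start, length); // may be called several times per text node
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("title")) {
            titles.add(buffer.toString()); // the text is complete only at the end tag
            inTitle = false;
        }
    }

    public static List<String> collect(String xml) throws Exception {
        TitleCollector handler = new TitleCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect("<books><book><title>SAX &amp; DOM</title></book></books>"));
    }
}
```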

Applications of XML Stream Processing

- content-based XML routing
- selective dissemination of information
- continuous queries
- processing of scientific data stored in large XML files
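As an illustration of content-based routing, the sketch below checks in a single SAX pass whether a document contains a given path, and could be used to decide whether to forward the document to a subscriber. It is far simpler than dedicated filtering engines: the assumption is that a profile is an absolute element path with no wildcards or predicates (for example `/news/sports`). The class `PathProfileFilter` and the sample element names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: does the document contain the absolute path `profile`?
public class PathProfileFilter extends DefaultHandler {
    private final String profile;                        // e.g. "/news/sports" (no wildcards)
    private final Deque<String> path = new ArrayDeque<>(); // element names from root to current
    private boolean matched = false;

    PathProfileFilter(String profile) { this.profile = profile; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        path.addLast(qName);
        if (currentPath().equals(profile)) {
            matched = true;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        path.removeLast(); // leaving the element, so pop it off the path
    }

    private String currentPath() {
        StringBuilder sb = new StringBuilder();
        for (String step : path) {          // iterates from the root down to the current element
            sb.append('/').append(step);
        }
        return sb.toString();
    }

    public static boolean matches(String xml, String profile) throws Exception {
        PathProfileFilter filter = new PathProfileFilter(profile);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), filter);
        return filter.matched;
    }

    public static void main(String[] args) throws Exception {
        String doc = "<news><sports><item/></sports></news>";
        System.out.println("route to sports subscriber: " + matches(doc, "/news/sports"));
    }
}
```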

Selective Dissemination of Information

Selective dissemination of information (SDI) uses selective approaches to dissemination in order to avoid overwhelming users with unnecessary information.

Applications: stock and sports tickers, traffic information systems, personalized electronic newspapers, entertainment delivery.

Typical SDI Systems

Representation of user profiles: simple keyword matching, or “bag of words” Information Retrieval (IR) techniques.

Drawbacks: limited expressiveness and inefficient filtering.

Selective Dissemination of Information

References

M. Altinel and M. J. Franklin. Efficient Filtering of XML Documents for Selective Dissemination of Information. In Proceedings of the VLDB Conference, September 2000.

Y. Diao, P. Fischer, M. Franklin, and R. To. YFilter: Efficient and Scalable Filtering of XML Documents. In Proceedings of the International Conference on Data Engineering (ICDE), San Jose, California, February 2002.