Apache DOM Parser
©zwz, October 24, 2002
Wenzhong Zhao
Department of Computer Science
The University of Kentucky
Overview
org.w3c.dom
– Node
• Document
• Element
• Attr
• Text
– NodeList
– NamedNodeMap
org.apache.xerces.parsers
– DOMParser
Sample Code Segment
Xerces Parser Info
Overview
Sample XML document:
<?xml version="1.0"?>
<root> <a id="1">This</a> is <b/> <c> a test</c> </root>
Corresponding DOM tree, with the interface of each node in parentheses (whitespace-only text nodes omitted):
(Document)
  <root> (Element)
    <a> (Element)
      id="1" (Attr)
      "This" (Text)
    "is" (Text)
    <b> (Element)
    <c> (Element)
      "a test" (Text)
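The sample document on this slide can be parsed and walked to see which interface each node reports. The sketch below is only an illustration: it assumes the JDK's built-in JAXP DocumentBuilder in place of the Xerces DOMParser class (so it runs without extra jars), and the class and method names are made up for this example. The org.w3c.dom interfaces are the same either way.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original slides.
final class TreeDump {
    static final String SAMPLE =
        "<?xml version=\"1.0\"?><root> <a id=\"1\">This</a> is <b/> <c> a test</c> </root>";

    // Map the numeric node type to the interface names used on the slide.
    static String typeName(Node n) {
        switch (n.getNodeType()) {
            case Node.DOCUMENT_NODE:  return "Document";
            case Node.ELEMENT_NODE:   return "Element";
            case Node.ATTRIBUTE_NODE: return "Attr";
            case Node.TEXT_NODE:      return "Text";
            default:                  return "Other";
        }
    }

    static String describe(Node n) {
        String value = n.getNodeValue();  // null for Document and Element
        return typeName(n) + " " + n.getNodeName()
                + (value == null ? "" : " = \"" + value + "\"");
    }

    // Append one line per node: the node, then its attributes, then its children.
    static void dump(Node n, String indent, StringBuilder out) {
        out.append(indent).append(describe(n)).append('\n');
        NamedNodeMap attrs = n.getAttributes();  // null unless n is an Element
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                out.append(indent).append("  ").append(describe(attrs.item(i))).append('\n');
            }
        }
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            dump(c, indent + "  ", out);
        }
    }

    static String dumpSample() throws Exception {
        // Assumption: the JDK's JAXP parser stands in for Xerces DOMParser.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)));
        StringBuilder out = new StringBuilder();
        dump(doc, "", out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(dumpSample());
    }
}
```

Unlike the simplified diagram, the dump also lists the whitespace-only Text nodes between elements, because the parser keeps them when no DTD says otherwise.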
DOM Interface
Represents XML documents as objects
– Extract information
– Manipulate XML documents
A set of Interfaces
– Defined by the W3C
– Platform- and language-neutral
– Must be implemented by classes that contain the actual data
– The DOM parser is responsible for providing those implementation classes
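A quick way to see the split between interfaces and implementation classes is to print the runtime class behind a Document. This is a minimal sketch assuming the JDK's JAXP DocumentBuilder rather than Xerces DOMParser; the class name ImplementationPeek is hypothetical, and the implementation class names printed depend on whichever parser is installed.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class, not part of the original slides.
final class ImplementationPeek {
    // Assumption: the JDK's JAXP parser supplies the implementation classes here.
    static Document newDocument() throws ParserConfigurationException {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    }

    public static void main(String[] args) throws Exception {
        Document doc = newDocument();
        Element root = doc.createElement("root");
        // org.w3c.dom.Document is only an interface ...
        System.out.println("Document is an interface: " + Document.class.isInterface());
        // ... while the objects handed back are parser-specific classes.
        System.out.println("Document implementation:  " + doc.getClass().getName());
        System.out.println("Element implementation:   " + root.getClass().getName());
    }
}
```

Client code that sticks to the org.w3c.dom types keeps working when the parser behind them is swapped.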
DOM Interface Hierarchy
Node
– Attr
– CharacterData
• Comment
• Text
  – CDATASection
– Document
– DocumentFragment
– DocumentType
– Element
– Entity
– EntityReference
– Notation
– ProcessingInstruction
DOMImplementation
NamedNodeMap
NodeList
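The inheritance relationships in this hierarchy can be checked at run time. The sketch below assumes the JDK's JAXP DocumentBuilder as the source of nodes, and HierarchyCheck and isA are hypothetical names used only for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.CDATASection;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.Text;

// Hypothetical demo class, not part of the original slides.
final class HierarchyCheck {
    // Assumption: the JDK's JAXP parser stands in for Xerces DOMParser.
    static Document newDocument() throws ParserConfigurationException {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    }

    // True if the node implements every listed interface.
    static boolean isA(Node n, Class<?>... types) {
        for (Class<?> t : types) {
            if (!t.isInstance(n)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        Document doc = newDocument();
        Node cdata = doc.createCDATASection("x");
        Node comment = doc.createComment("note");
        // CDATASection sits below Text, which sits below CharacterData and Node.
        System.out.println(isA(cdata, CDATASection.class, Text.class, CharacterData.class, Node.class));
        // Comment is CharacterData, but it is a sibling of Text, not a subtype.
        System.out.println(isA(comment, CharacterData.class, Node.class));
        System.out.println(comment instanceof Text);
    }
}
```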
Interface Node
Represents a single node
The primary datatype for the entire DOM
Methods defined here are available for all sub-interfaces
Methods
– public NodeList getChildNodes()
– public Node getFirstChild()
– public Node getNextSibling()
– public Node getParentNode()
– public NamedNodeMap getAttributes()
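The navigation methods listed above are enough to walk a tree by hand. A minimal sketch follows, assuming the JDK's JAXP DocumentBuilder instead of Xerces DOMParser; NodeNavigation, parse and childNames are hypothetical names, and the input has no whitespace between tags so that only element children appear.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original slides.
final class NodeNavigation {
    // Assumption: the JDK's JAXP parser stands in for Xerces DOMParser.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Collect child names with the getFirstChild()/getNextSibling() idiom.
    static List<String> childNames(Node parent) {
        List<String> names = new ArrayList<>();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            names.add(c.getNodeName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<root><a id=\"1\">This</a><b/><c>a test</c></root>");
        Node root = doc.getDocumentElement();
        Node a = root.getFirstChild();
        System.out.println(childNames(root));
        System.out.println(root.getChildNodes().getLength());
        System.out.println(a.getParentNode().isSameNode(root));
        System.out.println(a.getAttributes().getLength());
        // getAttributes() returns null for nodes that are not elements.
        System.out.println(a.getFirstChild().getAttributes());
    }
}
```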
Interface Node (cont'd)
– public String getNodeName()
– public String getNodeValue()
– public short getNodeType()
– public Node appendChild(Node newChild)
• throws DOMException
– public Node removeChild(Node oldChild)
• throws DOMException
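The editing methods and their DOMException behavior can be tried on a small tree. This is a sketch under the assumption that the JDK's JAXP DocumentBuilder is used in place of Xerces DOMParser; NodeEditing and tryRemove are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original slides.
final class NodeEditing {
    // Assumption: the JDK's JAXP parser stands in for Xerces DOMParser.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Try to remove a child; return the DOMException code, or -1 on success.
    static short tryRemove(Node parent, Node child) {
        try {
            parent.removeChild(child);
            return -1;
        } catch (DOMException e) {
            return e.code;
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<root><a/><b/></root>");
        Element root = doc.getDocumentElement();
        Node b = root.getFirstChild().getNextSibling();
        System.out.println(b.getNodeName() + " type=" + b.getNodeType()
                + " value=" + b.getNodeValue());
        // appendChild() returns the node that was added.
        Node d = root.appendChild(doc.createElement("d"));
        System.out.println(d.getParentNode().isSameNode(root));
        System.out.println(tryRemove(root, b));  // first removal succeeds
        // A second removal fails because b is no longer a child of root.
        System.out.println(tryRemove(root, b) == DOMException.NOT_FOUND_ERR);
    }
}
```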
Interface Document
Represents an entire XML document
Methods
– public Element getDocumentElement()
• Returns the root element of the document
– public NodeList getElementsByTagName(String tag)
• Returns a list of all elements with the specified tag name
• In the order of a preorder traversal of the document
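The preorder guarantee is easiest to see with nested elements of the same name. A minimal sketch, assuming the JDK's JAXP DocumentBuilder rather than Xerces DOMParser; the class name, the item elements and their n attribute are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original slides.
final class DocumentQueries {
    // Assumption: the JDK's JAXP parser stands in for Xerces DOMParser.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Join the "n" attribute of every <item> in the order the NodeList reports them.
    static String itemOrder(Document doc) {
        NodeList items = doc.getElementsByTagName("item");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < items.getLength(); i++) {
            Element e = (Element) items.item(i);
            if (i > 0) {
                sb.append(',');
            }
            sb.append(e.getAttribute("n"));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Item 2 is nested inside item 1, so preorder visits it before item 3.
        Document doc = parse(
            "<root><item n=\"1\"><item n=\"2\"/></item><item n=\"3\"/></root>");
        System.out.println(doc.getDocumentElement().getTagName());
        System.out.println(itemOrder(doc));
    }
}
```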
Interface Element
Represents an element in a document
Methods
– public NodeList getElementsByTagName(String tag)
• Returns a list of all descendant elements with the specified tag name
• In the order of a preorder traversal of this element's subtree
– public String getTagName()
– public Attr getAttributeNode(String name)
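Calling getElementsByTagName on an Element searches only below that element, which differs from the Document version. The sketch below assumes the JDK's JAXP DocumentBuilder in place of Xerces DOMParser; ElementQueries and countIn are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original slides.
final class ElementQueries {
    // Assumption: the JDK's JAXP parser stands in for Xerces DOMParser.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Number of descendants of scope that carry the given tag name.
    static int countIn(Element scope, String tag) {
        return scope.getElementsByTagName(tag).getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<root><a id=\"1\"><x/></a><b><x/><x/></b></root>");
        Element a = (Element) doc.getElementsByTagName("a").item(0);
        Element b = (Element) doc.getElementsByTagName("b").item(0);
        System.out.println(b.getTagName());
        System.out.println(countIn(doc.getDocumentElement(), "x"));  // all <x> under the root
        System.out.println(countIn(b, "x"));                         // only those under <b>
        Attr id = a.getAttributeNode("id");
        System.out.println(id.getValue());
        // getAttributeNode() returns null when the attribute is absent.
        System.out.println(b.getAttributeNode("id"));
    }
}
```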
Interface Attr
Represents an attribute in an Element object
Methods
– public String getName()
– public String getValue()
– public void setValue(String value)
• throws DOMException
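Changing an attribute through its Attr node is visible on the owning element straight away. A minimal sketch, assuming the JDK's JAXP DocumentBuilder rather than Xerces DOMParser; AttrAccess and replaceValue are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original slides.
final class AttrAccess {
    // Assumption: the JDK's JAXP parser stands in for Xerces DOMParser.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Overwrite an existing attribute via its Attr node; returns the old value.
    static String replaceValue(Element e, String attrName, String newValue) {
        Attr attr = e.getAttributeNode(attrName);
        String old = attr.getValue();
        attr.setValue(newValue);
        return old;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<a id=\"1\">This</a>");
        Element a = doc.getDocumentElement();
        Attr id = a.getAttributeNode("id");
        System.out.println(id.getName() + "=" + id.getValue());
        replaceValue(a, "id", "2");
        // The element sees the new value because the Attr node is part of the tree.
        System.out.println(a.getAttribute("id"));
    }
}
```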
Interface Text
Represents the textual content of an Element or Attr
Methods
– public String getData()
• throws DOMException
– public void setData(String data)
• throws DOMException
– public int getLength()
• Returns the number of characters in the Text node
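Reading and replacing the character data of a Text node looks like this in practice. The sketch assumes the JDK's JAXP DocumentBuilder in place of Xerces DOMParser; TextAccess and firstText are hypothetical names, and the cast relies on the element's first child being a Text node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original slides.
final class TextAccess {
    // Assumption: the JDK's JAXP parser stands in for Xerces DOMParser.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Assumes the element's first child is a Text node, as in <c> a test</c>.
    static Text firstText(Element e) {
        return (Text) e.getFirstChild();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<c> a test</c>");
        Text text = firstText(doc.getDocumentElement());
        // The leading space is part of the data and counts toward the length.
        System.out.println("[" + text.getData() + "] length=" + text.getLength());
        text.setData("new text");
        System.out.println("[" + text.getData() + "] length=" + text.getLength());
    }
}
```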
Interface NodeList
Represents a collection of nodes
– An ordered collection
Methods
– public Node item(int index)
• Returns the node with the specified index
• Note: index starts with 0
– public int getLength()
• Returns the size of the node list
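NodeList has no iterator, so the usual pattern is an index loop over item() and getLength(). A minimal sketch, assuming the JDK's JAXP DocumentBuilder rather than Xerces DOMParser; NodeListLoop and names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original slides.
final class NodeListLoop {
    // Assumption: the JDK's JAXP parser stands in for Xerces DOMParser.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Copy node names out of a NodeList using zero-based indexing.
    static List<String> names(NodeList list) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < list.getLength(); i++) {
            result.add(list.item(i).getNodeName());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<root><a/><b/><c/></root>");
        NodeList kids = doc.getDocumentElement().getChildNodes();
        System.out.println(names(kids));
        // An index equal to getLength() is out of range, so item() returns null.
        System.out.println(kids.item(kids.getLength()));
    }
}
```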
Interface NamedNodeMap
Represents a collection of nodes
– Can be accessed by name
– Not maintained in any particular order
Methods
– public Node getNamedItem(String name)
• Returns the node with the specified name
– public Node item(int index)
• Returns the node with the specified ordinal index
• Note: index starts with 0
– public int getLength()
• Returns the size of the node collection
– public Node setNamedItem(Node newNode)
• throws DOMException
• Returns the node it replaced, or null if no node had that name
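The attributes of an element are the most common NamedNodeMap. The sketch below assumes the JDK's JAXP DocumentBuilder in place of Xerces DOMParser; AttributeMap and sortedNames are hypothetical names, and the names are sorted because the map itself promises no order.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original slides.
final class AttributeMap {
    // Assumption: the JDK's JAXP parser stands in for Xerces DOMParser.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Collect the names by index, then sort to get a stable order.
    static List<String> sortedNames(NamedNodeMap map) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < map.getLength(); i++) {
            names.add(map.item(i).getNodeName());
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<a id=\"1\" lang=\"en\"/>");
        NamedNodeMap attrs = doc.getDocumentElement().getAttributes();
        System.out.println(attrs.getNamedItem("id").getNodeValue());
        System.out.println(attrs.getNamedItem("missing"));  // null when absent
        Attr title = doc.createAttribute("title");
        title.setValue("demo");
        Node replaced = attrs.setNamedItem(title);  // null: no earlier "title"
        System.out.println(replaced);
        System.out.println(sortedNames(attrs));
    }
}
```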
Class DOMParser
A software library (or software package)
– Provides clear APIs for client applications to manipulate XML documents
Is tree-based
– Produces a W3C DOM tree in memory, which is called a Document object
Client applications access or modify the information stored in the original XML document by
– Invoking methods on the Document object
– Invoking methods on other objects it contains
Vendors
– Apache, Oracle, IBM, Microsoft, Sun, Tibco
DOMParser (cont'd)
Constructor
– public DOMParser()
• Uses the DTD/Schema parser configuration
Methods
– public void parse(String systemId)
• throws SAXException and IOException
• Builds a DOM tree if successful
– public void parse(InputSource is)
• throws SAXException and IOException
• Builds a DOM tree if successful
– public Document getDocument()
• Returns the Document object
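For comparison, the JDK's own JAXP API covers the same parse step without the Xerces jar. This is a swapped-in technique, not the DOMParser class described above: DocumentBuilder.parse(InputSource) returns the Document directly instead of requiring a separate getDocument() call, and it reports the same SAXException and IOException. JaxpParse and isWellFormed are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; uses JAXP, not org.apache.xerces.parsers.DOMParser.
final class JaxpParse {
    // Parse XML held in a string; throws SAXException if it is not well-formed.
    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        InputSource source = new InputSource(new StringReader(xml));
        return builder.parse(source);  // the DOM tree comes back as the return value
    }

    static boolean isWellFormed(String xml) {
        try {
            parse(xml);
            return true;
        } catch (SAXException e) {
            return false;  // the parser rejected the input
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<root><a/></root>");
        System.out.println(doc.getDocumentElement().getTagName());
        System.out.println(isWellFormed("<root>"));  // missing end tag
    }
}
```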
DOMParser (cont'd)
– void setFeature(String featureId, boolean state)
• throws SAXNotRecognizedException and SAXNotSupportedException
• Sets the state of the feature in the SAX2 parser
• Example feature ID: http://apache.org/xml/features/validation/schema
– void setProperty(String propertyId, Object value)
• throws SAXNotRecognizedException and SAXNotSupportedException
• Sets the value of the property in the SAX2 parser
• Example property ID: http://apache.org/xml/properties/schema/external-schemaLocation
• Example schema location: http://dblab.csr.uky.edu/~wzhao0/schema/spo/spo.xsd
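The schema feature and the external schema location property tell Xerces to validate against an XML Schema. A rough JAXP counterpart, shown here as an assumption rather than the Xerces mechanism itself, is to attach a compiled Schema to the DocumentBuilderFactory. The schema text, the class name SchemaCheck and the validate method are all made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class; uses JAXP setSchema(), not the Xerces feature IDs.
final class SchemaCheck {
    // Made-up schema for illustration: <root> holds exactly one integer <a>.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"root\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"a\" type=\"xs:int\"/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Parse the document and return the validation error messages, if any.
    static List<String> validate(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);  // schema validation works on namespace-aware input
        factory.setSchema(schema);
        DocumentBuilder builder = factory.newDocumentBuilder();
        final List<String> problems = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { /* ignored in this sketch */ }
            public void error(SAXParseException e) { problems.add(e.getMessage()); }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return problems;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate("<root><a>1</a></root>").size());
        System.out.println(validate("<root><a>x</a></root>"));
    }
}
```

Validation problems arrive through ErrorHandler.error() and do not stop the parse, so the caller decides whether a document with errors is still usable.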
DOMParser (cont'd)
– protected void setValidation(boolean validation)
• throws SAXNotRecognizedException and SAXNotSupportedException
• Sets whether the parser validates (by default, against the DTD)
• Equivalent to setFeature() with feature ID http://xml.org/sax/features/validation
– protected void setValidationSchema(boolean schema)
• throws SAXNotRecognizedException and SAXNotSupportedException
• Turns schema support on or off
• Equivalent to setFeature() with feature ID http://apache.org/xml/features/validation/schema
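DTD validation can be tried with a document that carries its own internal DTD. The sketch assumes JAXP's DocumentBuilderFactory.setValidating(true), which roughly corresponds to switching on the http://xml.org/sax/features/validation feature, instead of the Xerces DOMParser calls above; DtdCheck, countValidationErrors and the DTD itself are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class; uses JAXP, not org.apache.xerces.parsers.DOMParser.
final class DtdCheck {
    // Made-up DTD for illustration: <root> must contain exactly one <a>.
    static final String DTD =
        "<!DOCTYPE root [<!ELEMENT root (a)><!ELEMENT a (#PCDATA)>]>";

    // Parse with DTD validation on and count the reported validity errors.
    static int countValidationErrors(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);  // validate against the document's DTD
        DocumentBuilder builder = factory.newDocumentBuilder();
        final int[] errors = {0};
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { /* ignored in this sketch */ }
            public void error(SAXParseException e) { errors[0]++; }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return errors[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countValidationErrors(DTD + "<root><a>ok</a></root>"));
        // <b> is not declared and <root> lacks its required <a>.
        System.out.println(countValidationErrors(DTD + "<root><b/></root>"));
    }
}
```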
Sample Code Segment
DOMParser parser = new DOMParser();  // Create a Xerces DOM parser
// Turn on validation against the DTD
parser.setFeature("http://xml.org/sax/features/validation", true);
/* Prepare the SystemId for the XML document here … */
parser.parse(SystemId);  // Parse the input XML document
Document doc = parser.getDocument();  // Obtain the Document object
Element root = doc.getDocumentElement();  // Obtain the root element
NodeList nl = doc.getElementsByTagName("person");  // Get the NodeList by tag name
int len = nl.getLength();  // Get the length of the NodeList
for (int i = 0; i < len; i++) {
    Node node = nl.item(i);  // Get the Node by index
    if (node.hasChildNodes()) {
        // Do something for this Node's children
        Node firstChild = node.getFirstChild();
        …
    }
}
Xerces Parser Info
Apache Xerces Parser API: http://xml.apache.org/xerces-j/apiDocs/overview-summary.html
Feature IDs: http://xml.apache.org/xerces2-j/features.html
Property IDs: http://xml.apache.org/xerces2-j/properties.html
Location of the Xerces Parser on the cslab machines: /usr/local/xml-xerces2
Be sure to import the following in your Java source:
– org.apache.xerces.parsers.DOMParser
– org.xml.sax.helpers.*
– org.xml.sax.*
– org.w3c.dom.*
– Others as needed