Chapter 24 XML

Chapter Goals



• Understanding XML elements and attributes

• Understanding the concept of an XML parser

• Being able to read and write XML documents

• Being able to design Document Type Definitions for XML documents


XML

• Stands for Extensible Markup Language

• Lets you encode complex data in a form that the recipient can parse easily

• Is independent of any programming language

XML Encoding of Coin Data

<coin>
   <value>0.5</value>
   <name>half dollar</name>
</coin>


Advantages of XML

• XML files are readable by both computers and humans

• XML-formatted data is resilient to change

o It is easy to add new data elements

o Old programs can process the old information in the new data format

Differences Between XML and HTML

• Both are descendants of SGML (Standard Generalized Markup Language)

• XML is a simplified version of SGML

• XML is very strict but HTML (as used today) is not

• XML tells what the data means; HTML tells how to display data

• XML tags are case-sensitive

o <LI> is different from <li>

• Every XML start tag must have a matching end tag

• If a tag has no end-tag, it must end in />

o <img src="hamster.jpeg"/>

• XML attribute values must be enclosed in quotes

o <img src="hamster.jpeg" width="400" height="300"/>
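Because XML is strict, a DOM parser can check these rules for you: it simply refuses text that is not well-formed. The sketch below is illustrative (the class `WellFormedChecker` and its method are hypothetical names, not part of any API); it feeds each string to a `DocumentBuilder` and reports whether parsing succeeded. A `DefaultHandler` is installed as the error handler so that fatal errors are rethrown without being printed to `System.err`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/**
   Checks XML well-formedness by letting a DOM parser try to read the text.
*/
class WellFormedChecker
{
   /**
      Tests whether a string is a well-formed XML document.
      @param xml the text to check
      @return true if the parser accepts it, false if it reports a syntax error
   */
   public static boolean isWellFormed(String xml) throws Exception
   {
      DocumentBuilder builder
            = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      // DefaultHandler rethrows fatal errors without printing them
      builder.setErrorHandler(new DefaultHandler());
      try
      {
         builder.parse(new ByteArrayInputStream(
               xml.getBytes(StandardCharsets.UTF_8)));
         return true;
      }
      catch (SAXException e)
      {
         return false;
      }
   }

   public static void main(String[] args) throws Exception
   {
      System.out.println(isWellFormed("<ul><li>one</li></ul>"));   // true
      System.out.println(isWellFormed("<ul><LI>one</li></ul>"));   // false: case differs
      System.out.println(isWellFormed("<p>one<br></p>"));          // false: <br> never closed
      System.out.println(isWellFormed("<img src=hamster.jpeg/>")); // false: value not quoted
   }
}
```

An HTML browser would quietly accept all four strings; the XML parser accepts only the first.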

Structure of an XML Document

• An XML data set is called a document

• The document starts with a header

<?xml version="1.0"?>

• The data are contained in a root element

<?xml version="1.0"?>
<purse>
   more data
</purse>

• The document contains elements and text

Structure of an XML Document

• An XML element has one of two forms

<elementTag optional attributes> contents </elementTag>

or

<elementTag optional attributes/>

• The contents can be elements or text or both

• An example of an element with both elements and text (mixed content):

<p>Use XML for <strong>robust</strong> data formats.</p>

• Avoid mixed content for data descriptions
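To see why mixed content is awkward for data, it helps to look at what a DOM parser produces for the `<p>` example: the children of `<p>` alternate between text nodes and element nodes, so a program cannot simply ask for "the" text or "the" child element. This is a small illustrative sketch; `MixedContentDemo` and `describeChildren` are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/**
   Shows how a DOM parser represents mixed content.
*/
class MixedContentDemo
{
   /**
      Describes every child node of the root element.
      @param xml an XML document
      @return one description per child node, in document order
   */
   public static List<String> describeChildren(String xml) throws Exception
   {
      Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
      NodeList children = doc.getDocumentElement().getChildNodes();
      List<String> result = new ArrayList<String>();
      for (int i = 0; i < children.getLength(); i++)
      {
         Node child = children.item(i);
         if (child.getNodeType() == Node.ELEMENT_NODE)
            result.add("element <" + child.getNodeName() + ">");
         else if (child.getNodeType() == Node.TEXT_NODE)
            result.add("text \"" + child.getNodeValue() + "\"");
         else
            result.add("other " + child.getNodeName());
      }
      return result;
   }

   public static void main(String[] args) throws Exception
   {
      String xml = "<p>Use XML for <strong>robust</strong> data formats.</p>";
      for (String description : describeChildren(xml))
         System.out.println(description);
   }
}
```

For the example, the root element has three children: a text node, the `<strong>` element, and another text node.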

Structure of an XML Document

• An element can have attributes

• The a element in HTML has an href attribute

<a href=""> ... </a>

• An attribute has a name (such as href) and a value

• The attribute value is enclosed in either single or double quotes

• An attribute is intended to provide information about the element content

<value currency="USD">0.5</value> or

<value currency="EUR">0.5</value>

• An element can have multiple attributes

Parsing XML Documents

• A parser is a program that

o Reads a document

o Checks whether it is syntactically correct

o Takes some action as it processes the document

• There are two kinds of XML parsers

o SAX (Simple API for XML)

o DOM (Document Object Model)

Parsing XML Documents

• SAX parser

o Event-driven

o It calls a method you provide to process each construct it encounters

o More efficient for handling large XML documents

• DOM parser

o Builds a tree that represents the document

o When the parser is done, you can analyze the tree

o Easier to use for most applications
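The event-driven style of a SAX parser can be sketched with `javax.xml.parsers.SAXParser` and a subclass of `org.xml.sax.helpers.DefaultHandler`: the parser calls `startElement` once for every start tag, and the handler decides what to do. This is a minimal sketch that only counts elements with a given name; `SaxElementCounter` is a hypothetical class name. No tree is built, which is why SAX scales well to large documents.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/**
   Counts elements with a given tag name using an event-driven SAX parser.
*/
class SaxElementCounter extends DefaultHandler
{
   private final String tagName;
   private int count;

   public SaxElementCounter(String tagName)
   {
      this.tagName = tagName;
   }

   /**
      Called by the parser once for every start tag it encounters.
   */
   @Override
   public void startElement(String uri, String localName,
         String qName, Attributes attributes)
   {
      // without namespace processing, qName holds the tag name
      if (qName.equals(tagName)) count++;
   }

   public int getCount()
   {
      return count;
   }

   /**
      Parses an XML string and counts the elements named tagName.
   */
   public static int count(String xml, String tagName) throws Exception
   {
      SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
      SaxElementCounter handler = new SaxElementCounter(tagName);
      parser.parse(new ByteArrayInputStream(
            xml.getBytes(StandardCharsets.UTF_8)), handler);
      return handler.getCount();
   }

   public static void main(String[] args) throws Exception
   {
      String xml = "<items><item><quantity>8</quantity></item>"
            + "<item><quantity>4</quantity></item></items>";
      System.out.println(count(xml, "item"));   // 2
   }
}
```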

JAXP

• Stands for Java API for XML Processing

• Provides a standard mechanism for DOM parsers to read and create documents

• Part of Java 1.4 and above

• Earlier versions require additional libraries to be downloaded

Parsing XML Documents

• The Document interface describes the tree structure of an XML document

• A DocumentBuilder can generate an object of a class that implements the Document interface

• Get a DocumentBuilderFactory by calling the static newInstance method of the DocumentBuilderFactory class

• Call the newDocumentBuilder method of the factory to get a DocumentBuilder

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();

Parsing XML Documents

• To read a document from a file

String fileName = . . . ;
File f = new File(fileName);
Document doc = builder.parse(f);

• To read a document from a URL on the Internet

String urlName = . . . ;
Document doc = builder.parse(urlName); // parse(String) treats its argument as a URI

• To read from an input stream

InputStream in = . . . ;
Document doc = builder.parse(in);
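The input-stream form is handy for experiments, because any `InputStream` will do. This sketch (the class name `StreamReaderDemo` is hypothetical) wraps a string in a `ByteArrayInputStream`, which stands in for a file or network connection, and then inspects the root element of the resulting document.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/**
   Reads a DOM document from an input stream.
*/
class StreamReaderDemo
{
   /**
      Parses the XML document supplied by an input stream.
      @param in the stream to read
      @return the parsed document tree
   */
   public static Document read(InputStream in) throws Exception
   {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      DocumentBuilder builder = factory.newDocumentBuilder();
      return builder.parse(in);
   }

   public static void main(String[] args) throws Exception
   {
      String xml = "<?xml version=\"1.0\"?>"
            + "<purse><coin><value>0.5</value><name>half dollar</name></coin></purse>";
      // an in-memory stream stands in for a file or network connection
      InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
      Document doc = read(in);
      System.out.println(doc.getDocumentElement().getTagName());   // purse
   }
}
```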

Parsing XML Documents

• You can inspect or modify the document

• The document tree consists of nodes

• Two node types are Element and Text

• Element and Text are subinterfaces of the Node interface

An XML Document

<?xml version="1.0"?>
<items>
   <item>
      <product>
         <description>Ink Jet Refill Kit</description>
         <price>29.95</price>
      </product>
      <quantity>8</quantity>
   </item>
   <item>
      <product>
         <description>4-port Mini Hub</description>
         <price>19.95</price>
      </product>
      <quantity>4</quantity>
   </item>
</items>

Tree View of XML Document

Parsing XML Documents

• Start inspection of the tree by getting the root element

Element root = doc.getDocumentElement();

• To get the child nodes of an element

o Use the getChildNodes method of the Element interface

o The nodes are stored in an object of a class that implements the NodeList interface

• Use a NodeList to visit the child nodes of an element

o The getLength method gives the number of nodes

o The item method gets an item in the node list

• Code to get a child node

NodeList nodes = root.getChildNodes();
int i = . . . ; // a value between 0 and nodes.getLength() - 1
Node child = nodes.item(i);

• The XML parser keeps all white space if you don't use a DTD

o You can include a test to ignore the white space
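The white-space test mentioned above can be as simple as checking each node's type. In this sketch (the class `WhitespaceDemo` and its methods are hypothetical names), a three-line `<items>` document without a DTD yields five child nodes: three white-space text nodes and two `<item>` elements. Counting only nodes whose type is `Node.ELEMENT_NODE` skips the white space.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WhitespaceDemo
{
   /**
      Parses an XML string and returns its root element.
   */
   public static Element parseRoot(String xml) throws Exception
   {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)))
            .getDocumentElement();
   }

   /**
      Counts the element children of a node, skipping white-space text nodes.
   */
   public static int countChildElements(Element parent)
   {
      NodeList nodes = parent.getChildNodes();
      int count = 0;
      for (int i = 0; i < nodes.getLength(); i++)
      {
         if (nodes.item(i).getNodeType() == Node.ELEMENT_NODE)
            count++;
      }
      return count;
   }

   public static void main(String[] args) throws Exception
   {
      Element root = parseRoot("<items>\n   <item/>\n   <item/>\n</items>");
      System.out.println(root.getChildNodes().getLength()); // 5: three text nodes, two elements
      System.out.println(countChildElements(root));         // 2
   }
}
```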

Parsing XML Documents

• Get an element name with the getTagName method

Element priceElement = . . . ;
String name = priceElement.getTagName();

• To find the value of the currency attribute

String attributeValue = priceElement.getAttribute("currency");

• You can also iterate through all attributes

o Use a NamedNodeMap

o Each attribute is stored in a Node
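Iterating through the attributes follows the same pattern as a `NodeList`: `getAttributes` returns a `NamedNodeMap` with `getLength` and `item` methods, and each item is a `Node` whose name and value are the attribute's name and value. The sketch below assumes a hypothetical `unit` attribute purely for illustration; the class and method names are also hypothetical. The DOM does not promise any particular attribute order, so the output order may vary.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AttributeLister
{
   /**
      Parses an XML string and returns its root element.
   */
   public static Element parseRoot(String xml) throws Exception
   {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)))
            .getDocumentElement();
   }

   /**
      Lists all attributes of an element as name=value strings.
   */
   public static List<String> listAttributes(Element element)
   {
      NamedNodeMap attributes = element.getAttributes();
      List<String> result = new ArrayList<String>();
      for (int i = 0; i < attributes.getLength(); i++)
      {
         Node attribute = attributes.item(i);   // each attribute is a Node
         result.add(attribute.getNodeName() + "=" + attribute.getNodeValue());
      }
      return result;
   }

   public static void main(String[] args) throws Exception
   {
      Element price = parseRoot("<price currency=\"USD\" unit=\"each\">29.95</price>");
      System.out.println(price.getAttribute("currency")); // USD
      System.out.println(listAttributes(price));
   }
}
```

Note that `getAttribute` returns an empty string, not `null`, when the attribute is absent.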

Parsing XML Documents

• Some elements have children that contain text

• The document builder creates nodes of type Text

• If you don't use mixed content elements

o Any element containing text has a single Text child node

o Use the getFirstChild method to get it

o Use the getData method to read the text

• To determine the price stored in the price element

Element priceElement = . . . ;
Text priceData = (Text) priceElement.getFirstChild();
String priceString = priceData.getData();
double price = Double.parseDouble(priceString);
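Here is the same price-reading logic as a small runnable sketch (the class `TextReader` and its methods are hypothetical names). It adds one guard the slide omits: an empty element such as `<price/>` has no child at all, so `getFirstChild` returns `null`. Since Java 5, the DOM `Node` interface also offers `getTextContent`, which returns all text inside an element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

class TextReader
{
   /**
      Parses an XML string and returns its root element.
   */
   public static Element parseRoot(String xml) throws Exception
   {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)))
            .getDocumentElement();
   }

   /**
      Returns the text of an element whose only child is a Text node,
      or an empty string if the element has no children, as in <price/>.
   */
   public static String getText(Element element)
   {
      Node child = element.getFirstChild();
      if (child == null) return "";
      Text data = (Text) child;
      return data.getData();
   }

   public static void main(String[] args) throws Exception
   {
      Element priceElement = parseRoot("<price>29.95</price>");
      double price = Double.parseDouble(getText(priceElement));
      System.out.println(price);                          // 29.95
      System.out.println(priceElement.getTextContent());  // 29.95
   }
}
```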

File ItemListParser.java

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.SAXException;

/**
   An XML parser for item lists
*/
public class ItemListParser
{
   /**
      Constructs a parser that can parse item lists
   */
   public ItemListParser()
         throws ParserConfigurationException
   {
      DocumentBuilderFactory factory
            = DocumentBuilderFactory.newInstance();
      builder = factory.newDocumentBuilder();
   }

   /**
      Parses an XML file containing an item list
      @param fileName the name of the file
      @return an array list containing all items in the XML file
   */
   public ArrayList<Item> parse(String fileName)
         throws SAXException, IOException
   {
      File f = new File(fileName);
      Document doc = builder.parse(f);

      // get the <items> root element

      Element root = doc.getDocumentElement();
      return getItems(root);
   }

   /**
      Obtains an array list of items from a DOM element
      @param e an <items> element
      @return an array list of all <item> children of e
   */
   private static ArrayList<Item> getItems(Element e)
   {
      ArrayList<Item> items = new ArrayList<Item>();

      // get the <item> children, skipping white-space text nodes

      NodeList children = e.getChildNodes();
      for (int i = 0; i < children.getLength(); i++)
      {
         Node childNode = children.item(i);
         if (childNode instanceof Element)
         {
            Element childElement = (Element) childNode;
            if (childElement.getTagName().equals("item"))
            {
               Item c = getItem(childElement);
               items.add(c);
            }
         }
      }
      return items;
   }

   /**
      Obtains an item from a DOM element
      @param e an <item> element
      @return the item described by the given element
   */
   private static Item getItem(Element e)
   {
      NodeList children = e.getChildNodes();
      Product p = null;
      int quantity = 0;
      for (int j = 0; j < children.getLength(); j++)
      {
         Node childNode = children.item(j);
         if (childNode instanceof Element)
         {
            Element childElement = (Element) childNode;
            String tagName = childElement.getTagName();
            if (tagName.equals("product"))
               p = getProduct(childElement);
            else if (tagName.equals("quantity"))
            {
               Text textNode = (Text) childElement.getFirstChild();
               String data = textNode.getData();
               quantity = Integer.parseInt(data);
            }
         }
      }
      return new Item(p, quantity);
   }

   /**
      Obtains a product from a DOM element
      @param e a <product> element
      @return the product described by the given element
   */
   private static Product getProduct(Element e)
   {
      NodeList children = e.getChildNodes();
      String name = "";
      double price = 0;
      for (int j = 0; j < children.getLength(); j++)
      {
         Node childNode = children.item(j);
         if (childNode instanceof Element)
         {
            Element childElement = (Element) childNode;
            String tagName = childElement.getTagName();
            Text textNode = (Text) childElement.getFirstChild();

            String data = textNode.getData();
            if (tagName.equals("description"))
               name = data;
            else if (tagName.equals("price"))
               price = Double.parseDouble(data);
         }
      }
      return new Product(name, price);
   }

   private DocumentBuilder builder;
}

File ItemListParserTest.java

import java.util.ArrayList;

/**
   This program parses an XML file containing an item list.
   It prints out the items that are described in the XML file.
*/
public class ItemListParserTest
{
   public static void main(String[] args) throws Exception
   {
      ItemListParser parser = new ItemListParser();
      ArrayList<Item> items = parser.parse("items.xml");
      for (Item anItem : items)
         System.out.println(anItem.format());
   }
}

Creating XML Documents

• We can build a Document object in a Java program and then save it as an XML document

• We need a DocumentBuilder object to create a new, empty document

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.newDocument(); // empty document

• The Document interface has methods to create elements and text nodes

Creating XML Documents

• To create an element, use the createElement method and pass it a tag

Element itemElement = doc.createElement("item");

• To create a text node, use createTextNode and pass it a string

Text quantityText = doc.createTextNode("8");

• Use the setAttribute method to add an attribute to the tag

priceElement.setAttribute("currency", "USD");

Creating XML Documents

• To construct the tree structure of a document

o Start with the root

o Add children with appendChild

• To build an XML tree that describes an item

// create elements
Element itemElement = doc.createElement("item");
Element productElement = doc.createElement("product");
Element descriptionElement = doc.createElement("description");
Element priceElement = doc.createElement("price");
Element quantityElement = doc.createElement("quantity");
Text descriptionText = doc.createTextNode("Ink Jet Refill Kit");
Text priceText = doc.createTextNode("29.95");
Text quantityText = doc.createTextNode("8");

// add elements to the document
doc.appendChild(itemElement);
itemElement.appendChild(productElement);
itemElement.appendChild(quantityElement);
productElement.appendChild(descriptionElement);
productElement.appendChild(priceElement);
descriptionElement.appendChild(descriptionText);
priceElement.appendChild(priceText);
quantityElement.appendChild(quantityText);

Creating XML Documents

• Use a Transformer to write an XML document to a stream

• Create a transformer

Transformer t = TransformerFactory.newInstance().newTransformer();

• Create a DOMSource from your document

• Create a StreamResult from your output stream

• Call the transform method of your transformer

t.transform(new DOMSource(doc), new StreamResult(System.out));
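Putting the building and writing steps together, the following sketch creates the item tree from the slides (plus the `currency` attribute shown earlier) and serializes it into a `String` by passing a `StringWriter` to `StreamResult` instead of `System.out`. The class and method names are hypothetical. Setting `OutputKeys.INDENT` to `"yes"` asks the serializer to add line breaks; the exact indentation it produces varies between Java versions.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/**
   Builds a one-item document and writes it to a string.
*/
class ItemDocumentWriter
{
   /**
      Builds <item><product><description/><price/></product><quantity/></item>.
   */
   public static Document buildItem(String description, String price, int quantity)
         throws Exception
   {
      Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();

      Element itemElement = doc.createElement("item");
      Element productElement = doc.createElement("product");
      Element descriptionElement = doc.createElement("description");
      Element priceElement = doc.createElement("price");
      Element quantityElement = doc.createElement("quantity");
      priceElement.setAttribute("currency", "USD");

      // the root goes directly into the document; everything else nests below it
      doc.appendChild(itemElement);
      itemElement.appendChild(productElement);
      itemElement.appendChild(quantityElement);
      productElement.appendChild(descriptionElement);
      productElement.appendChild(priceElement);
      descriptionElement.appendChild(doc.createTextNode(description));
      priceElement.appendChild(doc.createTextNode(price));
      quantityElement.appendChild(doc.createTextNode("" + quantity));
      return doc;
   }

   /**
      Serializes a document with a Transformer, collecting the output in a string.
   */
   public static String toXml(Document doc) throws Exception
   {
      Transformer t = TransformerFactory.newInstance().newTransformer();
      t.setOutputProperty(OutputKeys.INDENT, "yes");
      StringWriter out = new StringWriter();
      t.transform(new DOMSource(doc), new StreamResult(out));
      return out.toString();
   }

   public static void main(String[] args) throws Exception
   {
      System.out.println(toXml(buildItem("Ink Jet Refill Kit", "29.95", 8)));
   }
}
```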

File ItemListBuilder.java

import java.util.ArrayList;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

/**
   Builds a DOM document for an array list of items.
*/
public class ItemListBuilder
{
   /**
      Constructs an item list builder.
   */
   public ItemListBuilder()
         throws ParserConfigurationException
   {
      DocumentBuilderFactory factory
            = DocumentBuilderFactory.newInstance();
      builder = factory.newDocumentBuilder();
   }

   /**
      Builds a DOM document for an array list of items.
      @param items the items
      @return a DOM document describing the items
   */
   public Document build(ArrayList<Item> items)
   {
      doc = builder.newDocument();
      Element root = createItemList(items);
      doc.appendChild(root);
      return doc;
   }

   /**
      Builds a DOM element for an array list of items.
      @param items the items
      @return a DOM element describing the items
   */
   private Element createItemList(ArrayList<Item> items)
   {
      Element itemsElement = doc.createElement("items");
      for (Item anItem : items)
      {
         Element itemElement = createItem(anItem);
         itemsElement.appendChild(itemElement);
      }
      return itemsElement;
   }

   /**
      Builds a DOM element for an item.
      @param anItem the item
      @return a DOM element describing the item
   */
   private Element createItem(Item anItem)
   {
      Element itemElement = doc.createElement("item");
      Element productElement
            = createProduct(anItem.getProduct());
      Text quantityText = doc.createTextNode(
            "" + anItem.getQuantity());
      Element quantityElement = doc.createElement("quantity");
      quantityElement.appendChild(quantityText);

      itemElement.appendChild(productElement);
      itemElement.appendChild(quantityElement);
      return itemElement;
   }

   /**
      Builds a DOM element for a product.
      @param p the product
      @return a DOM element describing the product
   */
   private Element createProduct(Product p)
   {
      Text descriptionText
            = doc.createTextNode(p.getDescription());
      Text priceText = doc.createTextNode("" + p.getPrice());

      Element descriptionElement
            = doc.createElement("description");
      Element priceElement = doc.createElement("price");

      descriptionElement.appendChild(descriptionText);
      priceElement.appendChild(priceText);

      Element productElement = doc.createElement("product");

      productElement.appendChild(descriptionElement);
      productElement.appendChild(priceElement);

      return productElement;
   }

   private DocumentBuilder builder;
   private Document doc;
}

File ItemListBuilderTest.java

import java.util.ArrayList;
import org.w3c.dom.Document;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

/**
   This program tests the item list builder. It prints the
   XML file corresponding to a DOM document containing a list
   of items.
*/
public class ItemListBuilderTest
{
   public static void main(String[] args) throws Exception
   {
      ArrayList<Item> items = new ArrayList<Item>();
      items.add(new Item(new Product("Toaster", 29.95), 3));
      items.add(new Item(new Product("Hair dryer", 24.95), 1));

      ItemListBuilder builder = new ItemListBuilder();
      Document doc = builder.build(items);
      Transformer t = TransformerFactory
            .newInstance().newTransformer();
      t.transform(new DOMSource(doc),
            new StreamResult(System.out));
   }
}

Document Type Definitions

• A DTD is a set of rules for correctly formed documents of a particular type

o Describes the legal attributes for each element type

o Describes the legal child elements for each element type

• Legal child elements are described with an ELEMENT rule

<!ELEMENT items (item*)>

• The items element (the root in this case) can have 0 or more item elements

• Definition of an item node

<!ELEMENT item (product, quantity)>

• Children of the item node must be a product node followed by a quantity node


Document Type Definitions

• Definition of product node

<!ELEMENT product (description, price)>

• The other nodes

<!ELEMENT quantity (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT price (#PCDATA)>

• #PCDATA stands for parsed character data, which is just text

o Can contain any characters

o Special characters have to be encoded when they occur in character data

Encodings for Special Characters
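XML predefines five entities for special characters: &lt; for <, &gt; for >, &amp; for &, &quot; for ", and &apos; for '. When you build documents with the DOM and write them with a Transformer, the serializer escapes text for you; when you assemble XML text by hand, you must do it yourself. This sketch (the class `XmlEscaper` and its methods are hypothetical names) encodes a string and then checks that the parser decodes it back to the original text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class XmlEscaper
{
   /**
      Replaces the five characters that have predefined entities in XML.
   */
   public static String escape(String text)
   {
      StringBuilder result = new StringBuilder();
      for (char ch : text.toCharArray())
      {
         switch (ch)
         {
            case '<': result.append("&lt;"); break;
            case '>': result.append("&gt;"); break;
            case '&': result.append("&amp;"); break;
            case '"': result.append("&quot;"); break;
            case '\'': result.append("&apos;"); break;
            default: result.append(ch);
         }
      }
      return result.toString();
   }

   /**
      Wraps escaped text in an element, parses it, and returns the
      decoded text, which should equal the original string.
   */
   public static String roundTrip(String text) throws Exception
   {
      String xml = "<description>" + escape(text) + "</description>";
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)))
            .getDocumentElement().getTextContent();
   }

   public static void main(String[] args) throws Exception
   {
      String text = "Fish & Chips <small>";
      System.out.println(escape(text));     // Fish &amp; Chips &lt;small&gt;
      System.out.println(roundTrip(text));  // Fish & Chips <small>
   }
}
```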

DTD for Item List

<!ELEMENT items (item)*>

<!ELEMENT item (product, quantity)>

<!ELEMENT product (description, price)>

<!ELEMENT quantity (#PCDATA)>

<!ELEMENT description (#PCDATA)>

<!ELEMENT price (#PCDATA)>

Regular Expressions for Element Content

Document Type Definitions

• A DTD gives you control over the allowed attributes of an element <!ATTLIST Element Attribute Type Default>

• Type can be CDATA, which allows any sequence of character data

• Type can also specify a finite number of choices

<!ATTLIST price currency (USD | EUR | JPY ) #REQUIRED >

Common Attribute Types

Attribute Defaults

Document Type Definitions

• #IMPLIED keyword means you can supply an attribute or not.

<!ATTLIST price currency CDATA #IMPLIED >

• If you omit the attribute, the application processing the XML data implicitly assumes some default value

• You can specify a default to be used if the attribute is not specified

<!ATTLIST price currency CDATA "USD" >
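The effect of a DTD default can be observed directly: the parser fills in the attribute when the document omits it. In this sketch (class, method, and constant names are hypothetical), the internal DTD declares the `USD` default from the slide; the first `price` element leaves out `currency` and the second specifies `EUR`. The XML specification requires parsers to apply defaults declared in the internal DTD subset, so this works even without validation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AttributeDefaultsDemo
{
   static final String SAMPLE =
         "<?xml version=\"1.0\"?>"
         + "<!DOCTYPE prices ["
         + "<!ELEMENT prices (price*)>"
         + "<!ELEMENT price (#PCDATA)>"
         + "<!ATTLIST price currency CDATA \"USD\">"
         + "]>"
         + "<prices><price>29.95</price><price currency=\"EUR\">19.95</price></prices>";

   /**
      Returns the currency attribute of each price element, in document order.
   */
   public static String[] currencies(String xml) throws Exception
   {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
      NodeList prices = doc.getElementsByTagName("price");
      String[] result = new String[prices.getLength()];
      for (int i = 0; i < prices.getLength(); i++)
         result[i] = ((Element) prices.item(i)).getAttribute("currency");
      return result;
   }

   public static void main(String[] args) throws Exception
   {
      String[] result = currencies(SAMPLE);
      System.out.println(result[0]);   // USD (filled in from the DTD default)
      System.out.println(result[1]);   // EUR (given in the document)
   }
}
```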

Parsing with Document Type Definitions

• Specify a DTD with every XML document

• Instruct the parser to check that the document follows the rules of the DTD

• Then the parser can be more intelligent about parsing

• If the parser knows that the children of an element are elements, it can suppress white space

Parsing with Document Type Definitions

• An XML document can reference a DTD in one of two ways

• The document may contain the DTD

• The document may refer to a DTD stored elsewhere

• A DTD is introduced with a DOCTYPE declaration

Parsing with Document Type Definitions

• If the document contains the DTD, the declaration looks like this: <!DOCTYPE rootElement [ rules ]>

• Example

<?xml version="1.0"?>
<!DOCTYPE items [
<!ELEMENT items (item*)>
<!ELEMENT item (product, quantity)>
<!ELEMENT product (description, price)>
<!ELEMENT quantity (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT price (#PCDATA)>
]>
<items>
   <item>
      <product>
         <description>Ink Jet Refill Kit</description>
         <price>29.95</price>
      </product>
      <quantity>8</quantity>
   </item>
   <item>
      <product>
         <description>4-port Mini Hub</description>
         <price>19.95</price>
      </product>
      <quantity>4</quantity>
   </item>
</items>

Parsing with Document Type Definitions

• If the DTD is stored outside the document, use the SYSTEM keyword inside the DOCTYPE declaration

• This indicates that the system must locate the DTD

• The location of the DTD follows the SYSTEM keyword

• A DOCTYPE declaration can point to a local file <!DOCTYPE items SYSTEM "items.dtd" >

• A DOCTYPE declaration can point to a URL <!DOCTYPE items SYSTEM "">

Parsing with Document Type Definitions

• When your XML document has a DTD, use validation when parsing

• Then the parser will check that all child elements and attributes conform to the ELEMENT and ATTLIST rules in the DTD

• The parser reports an error if the document is invalid; with the default error handler, validation errors are only reported, so install an ErrorHandler whose error method rethrows the exception if you want parsing to fail

• Use the setValidating method of the DocumentBuilderFactory before calling the newDocumentBuilder method

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(. . .);
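With the JDK's default error handler, a validating parser prints validation errors rather than throwing them. This sketch (class and constant names are hypothetical) installs an `org.xml.sax.ErrorHandler` that rethrows every error, so an invalid document makes `parse` fail. It reuses the enumerated `currency` rule from the ATTLIST slide: `USD` passes, while an unlisted value or a missing required attribute is rejected.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class StrictValidator
{
   static final String DTD =
         "<!DOCTYPE prices ["
         + "<!ELEMENT prices (price*)>"
         + "<!ELEMENT price (#PCDATA)>"
         + "<!ATTLIST price currency (USD | EUR | JPY) #REQUIRED>"
         + "]>";

   /**
      An error handler that turns every validation error into an exception.
   */
   private static class ThrowingErrorHandler implements ErrorHandler
   {
      public void warning(SAXParseException e) { }   // ignore warnings
      public void error(SAXParseException e) throws SAXException { throw e; }
      public void fatalError(SAXParseException e) throws SAXException { throw e; }
   }

   /**
      Returns true if the document is valid according to its DTD.
   */
   public static boolean isValid(String xml) throws Exception
   {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setValidating(true);
      DocumentBuilder builder = factory.newDocumentBuilder();
      builder.setErrorHandler(new ThrowingErrorHandler());
      try
      {
         builder.parse(new InputSource(new StringReader(xml)));
         return true;
      }
      catch (SAXParseException e)
      {
         return false;
      }
   }

   public static void main(String[] args) throws Exception
   {
      System.out.println(isValid(DTD + "<prices><price currency=\"USD\">29.95</price></prices>")); // true
      System.out.println(isValid(DTD + "<prices><price currency=\"GBP\">29.95</price></prices>")); // false
      System.out.println(isValid(DTD + "<prices><price>29.95</price></prices>"));                  // false
   }
}
```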

Parsing with Document Type Definitions

• If the parser validates the document with a DTD, you can avoid validity checks in your code

• You can tell the parser to ignore white space in non-text elements

factory.setValidating(true);
factory.setIgnoringElementContentWhitespace(true);

• If the parser has access to a DTD, it can fill in defaults for attributes
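The white-space suppression described above can be measured by counting the root's child nodes with the setting off and on. In this sketch (class and constant names are hypothetical), the internal DTD says that `items` contains only `item` elements, so the indentation between them is ignorable. According to the JAXP documentation, `setIgnoringElementContentWhitespace` relies on the content model and therefore requires validation. An error handler that rethrows is installed so that an invalid sample would fail loudly.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class WhitespaceSuppression
{
   static final String SAMPLE =
         "<?xml version=\"1.0\"?>\n"
         + "<!DOCTYPE items [\n"
         + "<!ELEMENT items (item*)>\n"
         + "<!ELEMENT item (#PCDATA)>\n"
         + "]>\n"
         + "<items>\n"
         + "   <item>Toaster</item>\n"
         + "   <item>Hair dryer</item>\n"
         + "</items>\n";

   /**
      Counts the child nodes of the root element.
      @param ignoreWhitespace whether to drop white space in element content
   */
   public static int countRootChildren(String xml, boolean ignoreWhitespace)
         throws Exception
   {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setValidating(true);   // needed for the white-space setting
      factory.setIgnoringElementContentWhitespace(ignoreWhitespace);
      DocumentBuilder builder = factory.newDocumentBuilder();
      builder.setErrorHandler(new DefaultHandler()
      {
         @Override
         public void error(SAXParseException e) throws SAXParseException
         {
            throw e;   // treat validation errors as fatal
         }
      });
      Document doc = builder.parse(new InputSource(new StringReader(xml)));
      return doc.getDocumentElement().getChildNodes().getLength();
   }

   public static void main(String[] args) throws Exception
   {
      System.out.println(countRootChildren(SAMPLE, false));   // 5
      System.out.println(countRootChildren(SAMPLE, true));    // 2
   }
}
```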

File ItemListParser.java

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.SAXException;

/**
   An XML parser for item lists
*/
public class ItemListParser
{
   /**
      Constructs a parser that can parse item lists
   */
   public ItemListParser()
         throws ParserConfigurationException
   {
      DocumentBuilderFactory factory
            = DocumentBuilderFactory.newInstance();
      factory.setValidating(true);
      factory.setIgnoringElementContentWhitespace(true);
      builder = factory.newDocumentBuilder();
   }

   /**
      Parses an XML file containing an item list
      @param fileName the name of the file
      @return an array list containing all items in the XML file
   */
   public ArrayList<Item> parse(String fileName)
         throws SAXException, IOException
   {
      File f = new File(fileName);
      Document doc = builder.parse(f);

      // get the <items> root element

      Element root = doc.getDocumentElement();
      return getItems(root);
   }

   /**
      Obtains an array list of items from a DOM element
      @param e an <items> element
      @return an array list of all <item> children of e
   */
   private static ArrayList<Item> getItems(Element e)
   {
      ArrayList<Item> items = new ArrayList<Item>();

      // get the <item> children; with white space ignored,
      // the DTD guarantees that every child is an <item> element

      NodeList children = e.getChildNodes();
      for (int i = 0; i < children.getLength(); i++)
      {
         Element childElement = (Element) children.item(i);
         Item c = getItem(childElement);
         items.add(c);
      }
      return items;
   }

   /**
      Obtains an item from a DOM element
      @param e an <item> element
      @return the item described by the given element
   */
   private static Item getItem(Element e)
   {
      NodeList children = e.getChildNodes();

      // the DTD guarantees the order: product, then quantity
      Product p = getProduct((Element) children.item(0));

      Element quantityElement = (Element) children.item(1);
      Text quantityText
            = (Text) quantityElement.getFirstChild();
      int quantity = Integer.parseInt(quantityText.getData());

      return new Item(p, quantity);
   }

   /**
      Obtains a product from a DOM element
      @param e a <product> element
      @return the product described by the given element
   */
   private static Product getProduct(Element e)
   {
      NodeList children = e.getChildNodes();

      // the DTD guarantees the order: description, then price
      Element descriptionElement = (Element) children.item(0);
      Text descriptionText
            = (Text) descriptionElement.getFirstChild();
      String description = descriptionText.getData();

      Element priceElement = (Element) children.item(1);
      Text priceText
            = (Text) priceElement.getFirstChild();
      double price = Double.parseDouble(priceText.getData());

      return new Product(description, price);
   }

   private DocumentBuilder builder;
}

File ItemListParserTest.java

import java.util.ArrayList;

/**
   This program parses an XML file containing an item list.
   The XML file should reference items.dtd.
*/
public class ItemListParserTest
{
   public static void main(String[] args) throws Exception
   {
      ItemListParser parser = new ItemListParser();
      ArrayList<Item> items = parser.parse("items.xml");
      for (Item anItem : items)
         System.out.println(anItem.format());
   }
}