XML – Extensible Markup Language

Objectives

• To understand the various ways in which XML can be used
  • History of XML
  • Syntax of XML
  • Difference between HTML, XML and XHTML
  • XML Document Type Definitions (DTDs)
  • XML Schemas
• To understand the types of XML parsers
  • Validating vs. non-validating parsers
• To understand the different XML parser interfaces
  • Tree-based interface standard: DOM
  • Event-based interface standard: SAX
• Evaluating parsers
  • Which parser to use?

History of XML

• The World Wide Web Consortium (W3C) is an international consortium in which member organizations, a full-time staff, and the public work together to develop Web standards.
• Tim Berners-Lee, who invented the World Wide Web in 1989, founded the W3C with others in 1994.
• Around 1970, IBM introduced GML (Generalized Markup Language), the forerunner of SGML.
• SGML (Standard Generalized Markup Language) was published as an ISO standard in 1986.
• SGML is a semantic and structural language for text documents.
• SGML is complicated.
• The XML Working Group was formed under the W3C in 1996.
• In 1998 the W3C introduced XML 1.0.
• Extensible Markup Language (XML) is a subset of SGML.

What is XML?

• XML stands for eXtensible Markup Language.
• XML is a universal method of representing data, used in applications, on the web, and for data exchange.
• XML is a markup language much like HTML, but used for different purposes.
• XML is not a replacement for HTML.
• XML was designed to describe data.
• XML is a cross-platform, software- and hardware-independent tool for transmitting or exchanging information.
• XML is an open-standards-based technology.
• XML is extensible.
• XML is both human- and machine-readable.
• XML standards: XML 1.0 (1998), XML 1.1 (Feb 2004).

What exactly is XML used for?

• Storing data in a structured manner (a tree structure).
• Storing configuration information, typically application data that is not stored in a database.
  • Much server software keeps its configuration files in XML format.
• Transmitting data between applications.
  • Overcomes problems in client-server applications that are cross-platform in nature, e.g. a Windows program talking to a mainframe.
  • XML is a universal, standardized language for representing data so that it can be processed independently and exchanged between programs and applications, and between clients and servers.
  • Disparate systems can exchange information in a common format.

XML Syntax

The syntax rules of XML are very simple and very strict.

• XML tags are not predefined. You must define your own tags.
    <college>GCET</college>
• All XML elements must have a closing tag.
    <para>This is a paragraph</para>
• XML tags are case sensitive.
    <Msg>This is incorrect</msg>    Incorrect
    <msg>This is correct</msg>      Correct
• All XML elements must be properly nested.
    <name>Jill<lname>Jack</name></lname>    Incorrect
    <name>Jill<lname>Jack</lname></name>    Correct
• Attribute values must always be quoted.
    <pen color=red>reynolds</pen>      Incorrect
    <pen color="red">reynolds</pen>    Correct
• All XML documents must have a root element.
    <parent>
      <child>
        <subchild>.....</subchild>
      </child>
    </parent>

XML Comments

Comments in XML are written the same way as in HTML.

    <?xml version="1.0"?>
    <!-- Customer details -->
    <customer>
      <name>John</name>
      <email>[email protected]</email>
    </customer>

XML Code (cust.xml)

    <?xml version="1.0"?>
    <customers>
      <customer>
        <name>John</name>
        <email>[email protected]</email>
      </customer>
      <customer>
        <name>Tom</name>
        <email/>
      </customer>
    </customers>

Extensibility in XML

• A typical XML document is made up of tags enclosing the data; tag names describe the data.
• Because the language is extensible, you can create tags that are specific to your needs.
• For example, your document may contain tags to structure information about employees. The tags may include <Name>, <Designation>, and <Address>.
• Data stored in XML is self-descriptive: one can understand the data just by looking at the tag names.

XML – Exchanging Info Between Apps

• Convert the information stored in the database (or in any other format) to XML.
• Once it is in XML format, other applications and programs can parse (read) the XML document, which carries the original data.
• XML parsers are freely available and ship with many modern programming languages.

[Figure: XML as the common format connecting an application, a spreadsheet package, a CAD package, statistical processing, and a database. Content: XML Doc; Structure: DTD/XSD; Presentation: XSL.]

XSD - XML Schema Definition

DTD - Document Type Definition.

XSL - Extensible Stylesheet Language.

Document Type Declaration (DTD)

• A DTD (Document Type Definition) is used to enforce structure requirements for an XML document.
• The document type declaration (<!DOCTYPE ...>) contains or references the Document Type Definition and tells the parser which DTD to use for validation.

xmldtd.xml

    <?xml version="1.0"?>
    <!DOCTYPE customers [
      <!ELEMENT customers (customer)>
      <!ELEMENT customer (name,email)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT email (#PCDATA)>
    ]>
    <customers>
      <customer>
        <name>John Conlon</name>
        <email>[email protected]</email>
      </customer>
    </customers>

XML Schema

• An XML-based alternative to DTD
• Richer and more useful than DTDs
• Written in XML and simpler than DTDs
• Supports data type validation (DTD does not)

add.xml

    <?xml version="1.0"?>
    <addressBook>
      <person>
        <cname>Harrison Ford</cname>
        <email>[email protected]</email>
      </person>
      <person>
        <cname>Julie</cname>
        <email>[email protected]</email>
      </person>
    </addressBook>

Schema for add.xml

    <?xml version="1.0"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:complexType name="record">
        <xs:sequence>
          <xs:element name="cname" type="xs:string"/>
          <xs:element name="email" type="xs:string"/>
        </xs:sequence>
      </xs:complexType>
      <xs:element name="addressBook">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="person" type="record" minOccurs="0" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

XSD Syntax: Simple XML Elements with Pre-defined Data Types

Simple XML element: an XML element that has no child elements and no attributes. Simple XML elements can be defined in XSD with the following statement:

    <xsd:element name="element_name" type="xsd:type_name"/>

where "element_name" is the name of the XML element, and "type_name" is one of the data type names pre-defined in XSD.

XSD pre-defined data types fall into several groups, including:
• Numeric data types
• Date and time data types
• String data types
• Binary data types
• Boolean data type

XSD Syntax: Simple XML Elements with Extended Data Types

Simple XML elements can be defined by using the pre-defined XSD data types. They can also be defined by using extended data types, which are defined by "simpleType" statements:

    <xsd:simpleType name="my_type_name">
      <xsd:restriction base="xsd:type_name">
        XSD facet statements
      </xsd:restriction>
    </xsd:simpleType>
    <xsd:element name="element_name" type="my_type_name"/>

where "element_name" is the name of the XML element, "xsd:type_name" is a pre-defined data type serving as the base data type, and "my_type_name" is the new data type derived from the base data type by restriction.

XSD Syntax: Complex XML Elements

Complex XML element: an XML element that has at least one child element or at least one attribute. Complex XML elements must be defined with complex data types, which are defined by "complexType" statements:

    <xsd:element name="element_name" type="my_data_type"/>
    <xsd:complexType name="my_data_type">
      <xsd:sequence>
        <xsd:element name="child_element_1" type="data_type_1"/>
        <xsd:element name="child_element_2" type="data_type_2"/>
        ...
      </xsd:sequence>
      <xsd:attribute name="attribute_a" type="data_type_a"/>
      <xsd:attribute name="attribute_b" type="data_type_b"/>
      ...
    </xsd:complexType>

where the "attribute" statement defines an attribute, and the "sequence" statement defines the group of child elements and the order in which they should appear in the XML structure.

Note that "attribute" statements must appear after the child element definition statements.

XSD Syntax: Empty XML Elements

Empty XML element: a special complex XML element that has one or more attributes and no child elements or text nodes. Empty XML elements must be defined with complex data types in the following format:

    <xsd:complexType name="my_data_type">
      <xsd:attribute name="attribute_a" type="data_type_a"/>
      <xsd:attribute name="attribute_b" type="data_type_b"/>
      ...
    </xsd:complexType>

XSD Syntax: Anonymous Data Types

If a data type is specific to a child element in a parent data type, and there is no need to share it with data types outside the parent data type, you can define it as an anonymous data type: a non-named data type defined inline. For example, the following code:

    <xsd:complexType name="my_data_type">
      <xsd:sequence>
        <xsd:element name="setting">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="property" type="xsd:string"/>
              <xsd:element name="value" type="xsd:integer"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>

defines "my_data_type", which has a "setting" element whose data type is anonymous and defined inline.

Well-formed XML Documents

• A document is made of elements. There is exactly one element, called the root or document element, that contains all the others.
• All other elements, delimited by start- and end-tags, nest properly within each other.
• Attribute values, if any, are enclosed in quotes.
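The well-formedness rules can be checked mechanically. The sketch below is one possible way to do it with the JDK's built-in JAXP parser; the class name WellFormedCheck and the method isWellFormed are illustrative names, not part of any API. It relies on the fact that a parser reports a violation of well-formedness as a fatal error, which surfaces as a SAXException.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative helper (hypothetical name): checks well-formedness only, no DTD/Schema validation.
final class WellFormedCheck {

    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // fatal error: the text is not well-formed XML
        } catch (Exception e) {
            throw new IllegalStateException(e); // parser configuration or I/O problem
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<name>Jill<lname>Jack</lname></name>")); // true
        System.out.println(isWellFormed("<name>Jill<lname>Jack</name></lname>")); // false: bad nesting
        System.out.println(isWellFormed("<pen color=red>reynolds</pen>"));        // false: unquoted attribute
        System.out.println(isWellFormed("<Msg>text</msg>"));                      // false: case mismatch
    }
}
```

A document that passes this check is well-formed but not necessarily valid; validity needs a DTD or Schema.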

Valid XML Documents

• An XML document is valid if it has an associated DTD or Schema and the document complies with the constraints expressed in it.
• If an XML document is valid, it is also well-formed.

Document Type Definitions (DTDs)

A DTD describes the syntax of a class of XML documents. It explains:
• which elements may appear in the XML document
• what the contents and attributes of each element are

Need for DTD

• A validating parser (a program) can be used to check whether XML data adheres to the rules in the DTD.
• The parser can do appropriate error handling if there are any violations.
• A validity error is not necessarily a fatal error, but some applications may treat it as one.

Document Type Declarations

• A valid XML document must include a reference to the DTD against which it is validated.
• Types of DTD:
  • Internal DTD: the DTD is embedded in the XML document
  • External DTD: the DTD is kept in a separate file

Internal DTD

• The DTD is embedded in the XML document.
• The declarations appear between [ and ].
• E.g. AddressBook.xml:

    <?xml version='1.0' encoding='utf-8'?>
    <!DOCTYPE AddressBook [
      <!ELEMENT AddressBook (Address+)>
      <!ELEMENT Address (Name, Street, City)>
      <!ELEMENT Name (#PCDATA)>
      <!ATTLIST Name salutation CDATA #REQUIRED>
      <!ELEMENT Street (#PCDATA)>
      <!ELEMENT City (#PCDATA)>
    ]>
    <AddressBook>
      <Address>
        <Name salutation="Mr.">Ram</Name>
        <Street>M G Road</Street>
        <City>Bangalore</City>
      </Address>
    </AddressBook>
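To see an internal DTD at work, the following sketch validates two address books against the AddressBook declarations with the JDK's DOM parser. It is only an illustration under stated assumptions: the class name DtdValidationSketch and the method validityErrors are made up for this example, and validity errors are collected through an ErrorHandler instead of aborting the parse.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative helper (hypothetical name): DTD validation with an internal subset.
final class DtdValidationSketch {

    // Prolog with the internal DTD from the AddressBook example.
    static final String PROLOG =
          "<?xml version='1.0'?>\n"
        + "<!DOCTYPE AddressBook [\n"
        + "<!ELEMENT AddressBook (Address+)>\n"
        + "<!ELEMENT Address (Name, Street, City)>\n"
        + "<!ELEMENT Name (#PCDATA)>\n"
        + "<!ATTLIST Name salutation CDATA #REQUIRED>\n"
        + "<!ELEMENT Street (#PCDATA)>\n"
        + "<!ELEMENT City (#PCDATA)>\n"
        + "]>\n";

    /** Parses PROLOG + body with DTD validation on and returns the validity errors reported. */
    static List<String> validityErrors(String body) {
        final List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    // Validity errors are recoverable: record them and let parsing continue.
                    errors.add("line " + e.getLineNumber() + ": " + e.getMessage());
                }
            });
            builder.parse(new InputSource(new StringReader(PROLOG + body)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        String valid = "<AddressBook><Address><Name salutation='Mr.'>Ram</Name>"
            + "<Street>M G Road</Street><City>Bangalore</City></Address></AddressBook>";
        // Missing the required salutation attribute and the City element.
        String invalid = "<AddressBook><Address><Name>Ram</Name>"
            + "<Street>M G Road</Street></Address></AddressBook>";
        System.out.println(validityErrors(valid));
        System.out.println(validityErrors(invalid));
    }
}
```

The second document is still well-formed, so the parser finishes reading it; only the validity errors distinguish it from the first.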

External DTD

• The DTD is kept in a separate file.
• Example: the DTD for AddressBook.xml is contained in the file AddressBook.dtd; AddressBook.xml contains only the XML data and a reference to the DTD file.

AddressBook.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE AddressBook SYSTEM "file:///c:/XML/AddressBook.dtd">
    <AddressBook>
      <Address>
        <Name salutation="Mr.">Ram</Name>
        <Street>M G Road</Street>
        <City>Bangalore</City>
      </Address>
    </AddressBook>

Anatomy of DTD – Defining new XML tags (Elements)

<!ELEMENT element_name content_specification>

• element_name: the name of the XML tag
• content_specification: what the element may contain
  • (#PCDATA): parsed character data, i.e. text in which the parser still recognizes markup and entity references
  • Nested elements: a list of child elements
  • EMPTY: the element has no content
  • ANY: any declared elements or text (generally avoided)
  • Mixed content: #PCDATA combined with child elements

Note: CDATA (plain character data) is an attribute type used in <!ATTLIST> declarations; it is not an element content specification.

Example:

    <!ELEMENT Street (#PCDATA)>
• The element Street contains parsed character data.

    <!ELEMENT Address (Name, Street, City)>
• The element Address contains three nested tags, Name, Street, and City, in that order.

    <!ELEMENT AddressBook (Address+)>
• The element AddressBook contains one or more occurrences of the element Address.

Anatomy of DTD – Dealing with multiple children

The children of an element are declared with a syntax similar to regular expressions. Assume a and b are child elements of the element being declared:

• a+ : one or more occurrences of a
• a* : zero or more occurrences of a
• a? : a or nothing
• a, b : a followed by b
• a | b : a or b, but not both
• (expression) : surrounding an expression with parentheses means that it is treated as a unit and may take the suffix operator ?, * or +

Anatomy of DTD – Attribute Declarations

Specifies the allowable attributes of each element.

<!ATTLIST Tag-name Attr-Name Attr-Type Restriction>

• Tag-name: the element for which the attribute is defined
• Attr-Name: the name of the attribute
• Attr-Type: the type of the attribute value, e.g. CDATA (character data)
• Restriction:
  • "Value": a default value, given as simple text enclosed in quotes
  • #IMPLIED: there is no default value for this attribute, and the attribute need not be used
  • #REQUIRED: there is no default value for this attribute, but a value must be assigned to it
  • #FIXED "Value": Value is the attribute's value, and the attribute must always have this value

Example:

    <!ATTLIST Name salutation CDATA #REQUIRED>
• The element Name has the attribute salutation, which is of type CDATA.
• The attribute salutation must be specified in the Name tag.

Anatomy of DTD – Entity Declarations (1 of 2)

• Entities provide a way to escape special characters.
• Special characters such as <, >, and & are not written literally in #PCDATA.
• Writing the escaped form of such a character is called an "entity reference".
• The following entity references are used in XML documents:
  • Built-in entities: &amp; (&), &lt; (<), &gt; (>), &apos; ('), &quot; (")
  • Character references: &#243; representing ó
• Example: <State>Jammu &amp; Kashmir</State>

Anatomy of DTD – Entity Declarations (2 of 2)

• Data that is used frequently can be declared as a general entity:
    <!ENTITY entity_name entity_contents>
  • entity_name: name of the new entity
  • entity_contents: contents of the new entity
• Example:
    <!ENTITY MyCountry "India">
  • Defines the entity called MyCountry
  • "India" is the contents of the entity MyCountry
• Usage in the XML document:
    <Country>&MyCountry;</Country>
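Both kinds of entity reference are resolved by the parser before the application sees the text. The sketch below shows this with the JDK's DOM parser; the class name EntityExpansionSketch, the method textOf, and the wrapping Address element are assumptions made for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Illustrative helper (hypothetical name): shows entity references after parsing.
final class EntityExpansionSketch {

    /** Returns the text content of the first element with the given tag name. */
    static String textOf(String xml, String tagName) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(tagName).item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // The Address wrapper element is made up for this example.
    static final String SAMPLE =
          "<?xml version='1.0'?>"
        + "<!DOCTYPE Address [<!ENTITY MyCountry \"India\">]>"
        + "<Address>"
        + "<State>Jammu &amp; Kashmir</State>"
        + "<Country>&MyCountry;</Country>"
        + "</Address>";

    public static void main(String[] args) {
        System.out.println(textOf(SAMPLE, "State"));   // Jammu & Kashmir
        System.out.println(textOf(SAMPLE, "Country")); // India
    }
}
```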

XML Schema

What is XML Schema?

• An XML vocabulary for expressing your data's structure and business rules
• Validating parsers can use a schema to check whether XML data adheres to the rules in the schema
• More robust and extensive than DTD; it can even validate data types

E.g.: consider the following XML document:

    <Result>
      <EmpNo>45609</EmpNo>
      <Name>Kiran</Name>
      <Subject>
        <Name>IWT</Name>
        <Marks>80</Marks>
        <Grade>A</Grade>
      </Subject>
    </Result>

Is this data valid? To be valid, it must meet the following business rules (constraints):

• Each Subject must consist of a Name, Marks, and Grade, in the order shown
• The subject Name must be a valid subject from the list (DC, IWT, Cryptography)
• The Marks must be between 0 and 100, and the Grade can be A, B, or C

How can XML Schema help to accomplish this?

• It creates an XML vocabulary: it defines the set of elements <Result>, <Subject>, <Marks>, <Grade>
• It specifies the contents of each element and the restrictions on each element:
  • <Subject> must contain its <Name>, <Marks>, and <Grade> in that order
  • the subject must be one of the valid subjects (IWT, Cryptography, DC)
  • the Marks must be between 0 and 100
  • the Grade can be A, B, or C
• It specifies the namespace that the created vocabulary belongs to
  • The namespace name does not have to be an actual URL, but it uses URL syntax and should be a unique string
  • Example: http://www.Results.com is the namespace of the vocabulary defined above

Example of referring to a Schema (Result.xml)

    <?xml version="1.0" encoding="UTF-8"?>
    <res:Result xmlns:res="http://www.Results.com"
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xsi:schemaLocation="http://www.Results.com Result.xsd">
      <res:Name>Kiran</res:Name>
      <res:EmpNo>45609</res:EmpNo>
      <res:Subject>
        <res:Name>IWT</res:Name>
        <res:Marks>80.70</res:Marks>
        <res:Grade>A</res:Grade>
      </res:Subject>
      <res:Subject>
        <res:Name>PF</res:Name>
        <res:Marks>78.30</res:Marks>
        <res:Grade>B+</res:Grade>
      </res:Subject>
    </res:Result>

Schema example: Result.xsd

    <?xml version="1.0" encoding="UTF-8"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                targetNamespace="http://www.Results.com"
                xmlns="http://www.Results.com"
                elementFormDefault="qualified">

      <xsd:element name="Result">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="Name" type="xsd:string"/>
            <xsd:element name="EmpNo" type="xsd:int"/>
            <xsd:element name="Subject" type="SubjectType" maxOccurs="5"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>

      <xsd:simpleType name="NameType">
        <xsd:restriction base="xsd:string">
          <xsd:pattern value="CHSSC|PF|RDBMS|IWT|AOA"/>
        </xsd:restriction>
      </xsd:simpleType>

      <xsd:complexType name="SubjectType">
        <xsd:sequence>
          <xsd:element name="Name" type="NameType"/>
          <xsd:element ref="Marks"/>
          <xsd:element name="Grade">
            <xsd:simpleType>
              <xsd:restriction base="xsd:string">
                <!-- "+" is a regex quantifier, so the literal grade B+ is written B\+ -->
                <xsd:pattern value="A|B\+|B|C|D"/>
              </xsd:restriction>
            </xsd:simpleType>
          </xsd:element>
        </xsd:sequence>
      </xsd:complexType>

      <xsd:element name="Marks">
        <xsd:simpleType>
          <xsd:restriction base="xsd:float">
            <xsd:minInclusive value="0.0"/>
            <xsd:maxInclusive value="100.0"/>
          </xsd:restriction>
        </xsd:simpleType>
      </xsd:element>
    </xsd:schema>

DTD vs Schema

• Inconsistent syntax: a DTD uses a syntax different from that of the XML document it describes, whereas a schema uses XML syntax.
• Limited data type capability: DTDs offer very limited support for specifying data types. They do not support field-level validation or complex types.
  • E.g.: in a DTD you cannot express "I want the <Marks> element to hold an integer in the range 0 to 100".
• A schema describes a set of data types compatible with those found in databases.
  • E.g.: a database supports data types such as integer and string; a schema supports them too, while a DTD does not.
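The range rule that a DTD cannot express is straightforward in a schema. The following sketch checks a lone Marks element against a 0 to 100 restriction using the JDK's javax.xml.validation API. The class name MarksSchemaSketch, the method isValid, and the one-element schema are simplifications assumed for this illustration (no target namespace is used, to keep it short).

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Illustrative helper (hypothetical name): data type validation with XML Schema.
final class MarksSchemaSketch {

    // A cut-down schema: a Marks element restricted to floats between 0.0 and 100.0.
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='Marks'>"
        + "<xsd:simpleType><xsd:restriction base='xsd:float'>"
        + "<xsd:minInclusive value='0.0'/><xsd:maxInclusive value='100.0'/>"
        + "</xsd:restriction></xsd:simpleType>"
        + "</xsd:element></xsd:schema>";

    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            try {
                // With no ErrorHandler set, the validator throws on the first error.
                validator.validate(new StreamSource(new StringReader(xml)));
                return true;
            } catch (SAXException e) {
                return false; // the instance violates the schema
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e); // broken schema or I/O problem
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<Marks>80.70</Marks>")); // true
        System.out.println(isValid("<Marks>120</Marks>"));   // false: above maxInclusive
        System.out.println(isValid("<Marks>high</Marks>"));  // false: not a float
    }
}
```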

Element Declarations: Simple Element

Syntax: <xsd:element name="Element_name" type="Element_type" Occurrence/>

• Element_name: any valid XML name
• Element_type: a built-in simple type
• Occurrence: the number of occurrences of the element (optional)

Examples:
    <xsd:element name="Name" type="xsd:string"/>
• Defines the element Name of type string
    <xsd:element name="Marks" type="xsd:float" maxOccurs="5"/>
• Defines the element Marks of simple type float
• Marks may appear at most 5 times and, by default, at least once

Element Declarations: Complex Element

Syntax:
    <xsd:element name="Element_name">
      <xsd:complexType>
        ...
      </xsd:complexType>
    </xsd:element>

Example:
    <xsd:element name="Subject">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="Name" type="xsd:string"/>
          <xsd:element name="Marks" type="xsd:float"/>
          <xsd:element name="Grade" type="xsd:string"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
• Defines a non-reusable complex element called 'Subject'
• The child elements appear in that sequence because the <xsd:sequence> tag is used

Element Declarations: Reusable Simple Type

Syntax:
    <xsd:simpleType name="Element_type_name">
      <xsd:restriction base="Base_data_type">
        Restriction_specification
      </xsd:restriction>
    </xsd:simpleType>

• Element_type_name: name of the data type
• Base_data_type: any of the built-in simple data types (integer, float, etc.)
• Restriction_specification: the restrictions on the element, if any

Example:
    <xsd:simpleType name="MarksType">
      <xsd:restriction base="xsd:float">
        <xsd:minInclusive value="0.0"/>
        <xsd:maxInclusive value="100.0"/>
      </xsd:restriction>
    </xsd:simpleType>
• Defines the reusable element type MarksType
• An element defined as MarksType may take a minimum value of 0.0 and a maximum value of 100.0:
    <xsd:element name="Marks" type="MarksType"/>

Element Declarations: Reusable Complex Type

Syntax: <xsd:complexType name="Type_name"> defines the reusable type Type_name

Example:
    <xsd:complexType name="SubjectType">
      <xsd:sequence>
        <xsd:element name="Name" type="xsd:string"/>
        <xsd:element name="Marks" type="xsd:int"/>
        <xsd:element name="Grade" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
• Defines the reusable complex element type SubjectType
• It comprises the elements Name, Marks, and Grade, in the sequence specified (<xsd:sequence> tag)
• This type can be used to define elements in your XML:
    <xsd:element name="Subject" type="SubjectType"/>

Defining the Attributes

Syntax: <xsd:attribute name="Attr_Name" type="Attr_Type"/>

Example:
    <xsd:attribute name="Project" type="xsd:string"/>

• All attributes are declared as simple types.
• Only complex elements can have attributes.

Anatomy of XML Schema: Constraints specification

Controls the occurrence of an individual element or a group of elements.

Types of constraints:
• <choice>: allows only one of the listed elements to appear
• <sequence>: elements must appear in the same order as they are declared
• <all>: elements can occur in any order, each at most once

<choice> constraint, e.g.:
    <xsd:choice>
      <xsd:element name="first"/>
      <xsd:element name="last"/>
    </xsd:choice>
• Allows either the first or the last name to be used in the instance XML document

<sequence> constraint, e.g.:
    <xsd:sequence>
      <xsd:element name="Name" type="xsd:string"/>
      <xsd:element name="EmpNo" type="xsd:int"/>
      <xsd:element name="Subject" type="SubjectType" maxOccurs="5"/>
    </xsd:sequence>
• All elements must appear in the defined order only

<all> constraint, e.g.:
    <xsd:all>
      <xsd:element name="invoice"/>
      <xsd:element name="purchaseOrder"/>
      <xsd:element name="mailingLabel"/>
    </xsd:all>
• Elements may appear in any order
• Each element may appear at most once; an element may be left out only if it is declared with minOccurs="0"
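The difference between the three constraint groups is easiest to see by validating the same instances against each of them. The sketch below builds a tiny schema around a hypothetical person element with first and last children and swaps the group name in; the class ConstraintGroupsSketch and its methods are illustrative names, and the JDK's javax.xml.validation API does the checking.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Illustrative helper (hypothetical name): compares choice, sequence and all.
final class ConstraintGroupsSketch {

    /** Builds a schema for <person> whose children sit in the given group: choice, sequence or all. */
    static String schemaWith(String group) {
        return "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
            + "<xsd:element name='person'><xsd:complexType><xsd:" + group + ">"
            + "<xsd:element name='first' type='xsd:string'/>"
            + "<xsd:element name='last' type='xsd:string'/>"
            + "</xsd:" + group + "></xsd:complexType></xsd:element></xsd:schema>";
    }

    static boolean isValid(String group, String xml) {
        try {
            Validator validator = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(schemaWith(group))))
                .newValidator();
            try {
                validator.validate(new StreamSource(new StringReader(xml)));
                return true;
            } catch (SAXException e) {
                return false; // the instance violates the constraint group
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String firstLast = "<person><first>Jill</first><last>Jack</last></person>";
        String lastFirst = "<person><last>Jack</last><first>Jill</first></person>";
        String firstOnly = "<person><first>Jill</first></person>";

        System.out.println(isValid("sequence", firstLast)); // true
        System.out.println(isValid("sequence", lastFirst)); // false: wrong order
        System.out.println(isValid("all", lastFirst));      // true: any order
        System.out.println(isValid("all", firstOnly));      // false: last is required
        System.out.println(isValid("choice", firstOnly));   // true: exactly one alternative
        System.out.println(isValid("choice", firstLast));   // false: both alternatives present
    }
}
```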

XML Parsers

XML Parser: The Big Picture

[Figure: usage of the XML parser. The parser reads the XML document together with its DTD/Schema and hands the parsed data to the client application through APIs.]

Why use a parser?

• Applications typically use a pre-built XML parser (e.g. JAXP, Apache Xerces).
• This lets you build your application much more quickly.

Need for a Parser: Defining the Parser's Responsibilities

• Ensure that the document adheres to specific standards:
  • Is the document well-formed?
  • Does the document match the DTD or Schema?
• Make the document contents available to your application:
  • The parser parses the XML document and makes its data available to your application.
  • An application using a parser can access the data in the XML by walking the hierarchy or by using tag names.

Types of XML Parsers

• Validating parser: a parser that verifies that the XML document adheres to the DTD or Schema
• Non-validating parser: a parser that does not verify the XML document against the DTD or Schema
• Most parsers provide an option to turn validation on or off.
• All parsers check the well-formedness of the XML document at all times.

XML Parser Interfaces

Two types of interfaces are provided by XML parsers:
• SAX: an event-based interface
• DOM: a tree-based interface

JAXP ("Java API for XML Processing")
• JAXP is part of the JDK.
• It provides parsers that can be used in any Java application.
• It supports both:
  • Tree-based parser: DOM
  • Event-based parser: SAX

DOM Parser: Tree-Based Parser

Definition: the parser reads the XML document and creates an in-memory "tree" representation of it.

For example, given the sample XML document below, what kind of tree would be produced?

    <Result>
      <Name>Kiran</Name>
      <EmpNo>45609</EmpNo>
      <Subject>
        <Name>CHSSC</Name>
        <Marks>80</Marks>
        <Grade>A</Grade>
      </Subject>
    </Result>

[Figure: the in-memory tree created by the tree-based parser. The tree represents the hierarchy of the XML document: the element node Result has the element nodes Name and EmpNo as children, which hold the text nodes "Kiran" and "45609".]

DOM Parser

• Tree-based APIs present an in-memory model of the entire document to the application once parsing has concluded.
• There is no need for extra data structures to hold information during parsing.
• An application can navigate through the tree to find the desired pieces of the document.
• The Document Object Model (DOM) is the standard for tree-based parsing of XML documents.

Document Object Model (DOM)

• The DOM is a set of interfaces defined by the W3C DOM Working Group.
• DOM is the tree-based interface programmers use to manipulate an XML document.
• A DOM parser can be validating or non-validating.
• A DOM parser holds the logical model of the XML document in memory.
• All entity references are expanded before the DOM tree is constructed.

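DOM Structure representing XML

[Figure: generic XML document structure. A Document node contains Element nodes, which in turn contain Attribute, Element, Text, and Comment nodes.]

[Figure: document structure representing Result.xml. Below the document root, the element node Result has the children Name (text "Kiran"), EmpNo (text "45609"), and Subject; Subject has the children Name (text "IWT"), Marks (text "80.0"), and Grade (text "A").]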

Document Object Model (DOM): Overview

• The root of the DOM hierarchy is the Document node. Its single element child is the document (root) element. Example: Result
• The nodes below it are element nodes, comment nodes, etc. Example: Name, Subject, and EmpNo are all child nodes of Result.
• All node types in the DOM derive from the interface org.w3c.dom.Node.

The Big Picture: Parsing the XML Document

• The document builder factory creates an instance of the parser with the required characteristics:
  • whether the parser should be a validating parser or not
  • whether namespace support is required or not
  • whether to ignore the white space between elements or not
• The factory hides the implementation details of the parser and gives a standard DOM interface for parsing XML (analogous to a JDBC driver).

[Figure: the DocumentBuilderFactory creates a DocumentBuilder (the parser), which reads the XML data and produces a Document object (DOM), a tree of objects.]

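DomApp.java: Parsing an XML Document using the DOM Parser

    import java.io.File;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.xml.sax.helpers.DefaultHandler;

    public class DomApp {
        public static void main(String[] argv) {
            MyErrorHandler hErr;
            Document hDocument;
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            factory.setNamespaceAware(true);
            try {
                hErr = new MyErrorHandler();
                DocumentBuilder hBuilder = factory.newDocumentBuilder();
                // Set the error handler
                hBuilder.setErrorHandler(hErr);
                hDocument = hBuilder.parse(new File("Result.xml"));
            } catch (Exception e) {
                // Handle exception if generated during parsing
                e.printStackTrace();
            }
        } // End of function main
    }

    // Assumed minimal stand-in: the application's own error handler is not shown on the slide.
    class MyErrorHandler extends DefaultHandler { }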

Parsing the XML Document using the DOM Parser

Step 1: Get an instance of the document-builder factory. It is used to produce the DOM parser (called DocumentBuilder).
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

Step 2: Set the properties of the DOM parser to be produced.
  a. It should validate the XML document (setValidating(true) enables DTD validation; XML Schema validation is configured separately, e.g. through factory.setSchema(...))
  b. It should be namespace aware
    factory.setValidating(true);
    factory.setNamespaceAware(true);

Step 3: Obtain an instance of the MyErrorHandler class. This instance handles the errors generated during parsing in an application-specific way.
    hErr = new MyErrorHandler();

Step 4: Obtain an instance of the DOM parser and register the error handler. The parser is used to parse the XML document and to create the in-memory tree representation of it.
    DocumentBuilder hBuilder = factory.newDocumentBuilder();
    hBuilder.setErrorHandler(hErr);

Step 5: Parse the XML document (Result.xml) using the parser created above.
    hDocument = hBuilder.parse(new File("Result.xml"));

DOM: Exploring the org.w3c.dom.Node Interface

• The Node interface is the root of the DOM Core class hierarchy.
• This interface can be used to extract information from any DOM object without knowing the actual type of the underlying node (e.g. Element node, Text node, Attr node).
• I.e. it is possible to access a document's complete structure and content using only the methods and properties exposed by the Node interface.

[Figure: the class hierarchy rooted at org.w3c.dom.Node, with the sub-interfaces Element, Document, Attr, Text, Comment, and Entity.]

DOM: Important Methods of the Node Interface

Methods to retrieve information from the XML DOM tree:
• Node getFirstChild(): returns the first child of the current node
• Node getLastChild(): returns the last child of the current node
• String getNodeName(): the name of this node
• String getNodeValue(): the value of this node, depending on its type
• short getNodeType(): a code representing the type of the underlying object

Methods to alter the elements of the XML DOM tree:
• Node insertBefore(Node newChild, Node refChild)
• Node appendChild(Node newChild)
• Node removeChild(Node oldChild)
• Node replaceChild(Node newChild, Node oldChild)
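The four tree-altering methods can be tried on a small in-memory document. In this sketch the class name DomEditSketch, its helper methods, and the element names Dept and Unit are made up for the illustration; the Node calls themselves are the standard org.w3c.dom ones listed above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Illustrative helper (hypothetical name): altering a DOM tree through the Node interface.
final class DomEditSketch {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Applies one call of each altering method to the root element and returns it. */
    static Node applyEdits(Document doc) {
        Node result = doc.getDocumentElement();

        // appendChild: add <Grade>A</Grade> as the new last child
        Element grade = doc.createElement("Grade");
        grade.appendChild(doc.createTextNode("A"));
        result.appendChild(grade);

        // insertBefore: put <Dept/> in front of the current first child
        result.insertBefore(doc.createElement("Dept"), result.getFirstChild());

        // removeChild: drop the EmpNo element
        result.removeChild(doc.getElementsByTagName("EmpNo").item(0));

        // replaceChild: swap the first child (<Dept/>) for <Unit/>
        result.replaceChild(doc.createElement("Unit"), result.getFirstChild());
        return result;
    }

    /** Lists the names of the direct children, separated by commas. */
    static String childNames(Node parent) {
        List<String> names = new ArrayList<>();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            names.add(c.getNodeName());
        }
        return String.join(",", names);
    }

    public static void main(String[] args) {
        Document doc = parse("<Result><Name>Kiran</Name><EmpNo>45609</EmpNo></Result>");
        System.out.println(childNames(doc.getDocumentElement())); // Name,EmpNo
        System.out.println(childNames(applyEdits(doc)));          // Unit,Name,Grade
    }
}
```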

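Using the Node Interface

[Figure: DOM tree for Result.xml. Result has the children Name (text "Kiran"), EmpNo (text "45609"), and Subject (with its own Name child).]

    Node hNode = hDocument.getDocumentElement();    // the Result element
    Node hFirstChild = hNode.getFirstChild();       // first child of Result
    Node hLastChild = hNode.getLastChild();         // last child of Result
    hFirstChild = hFirstChild.getFirstChild();      // descend one more level
    String sName = hFirstChild.getNodeName();
    String sVal = hFirstChild.getNodeValue();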
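A self-contained version of this navigation might look as follows. The sketch is an illustration only: the class name NodeWalkSketch and the helper describe are assumed names, and the sample document is written without white space between tags, because otherwise getFirstChild() would return a white-space text node rather than the Name element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Illustrative helper (hypothetical name): navigating a DOM tree with Node methods only.
final class NodeWalkSketch {

    static final String RESULT_XML =
          "<Result><Name>Kiran</Name><EmpNo>45609</EmpNo>"
        + "<Subject><Name>CHSSC</Name><Marks>80</Marks><Grade>A</Grade></Subject></Result>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Formats a node as name=value; element nodes have a null value. */
    static String describe(Node node) {
        return node.getNodeName() + "=" + node.getNodeValue();
    }

    public static void main(String[] args) {
        Document hDocument = parse(RESULT_XML);
        Node hNode = hDocument.getDocumentElement();   // Result
        Node hFirstChild = hNode.getFirstChild();      // Name element
        Node hLastChild = hNode.getLastChild();        // Subject element

        System.out.println(describe(hNode));                       // Result=null
        System.out.println(describe(hFirstChild));                 // Name=null
        System.out.println(describe(hLastChild));                  // Subject=null
        System.out.println(describe(hFirstChild.getFirstChild())); // #text=Kiran
    }
}
```

Element nodes report their tag name and a null value; the text itself lives one level further down, in a text node named #text.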

XML Parser Interfaces: Event-Based Interface

Definition: the parser reads the XML document and generates an event for each parsing step.

Some common parsing events:
• Element start-tag read
• Element content read
• Element end-tag read

Example:
    <Result>
      <Name>Kiran</Name>
      <EmpNo>45609</EmpNo>
      <Subject>
        <Name>CHSSC</Name>
        <Marks>80</Marks>
        <Grade>A</Grade>
      </Subject>
    </Result>

XML Parser Interfaces: Events Generated

    startElement : Result
    startElement : Name
    contents : Kiran
    endElement : Name
    startElement : EmpNo
    contents : 45609
    endElement : EmpNo
    (the events for Subject and its children follow in the same way)
    endElement : Result
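The event sequence can be reproduced with a small handler that records what the SAX parser reports. This is a sketch under assumptions: the class name SaxEventsSketch and the method events are illustrative, the input is written without white space between tags, and character data is buffered until the next tag because a parser is allowed to deliver text in several characters() calls.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative helper (hypothetical name): logs the events a SAX parser generates.
final class SaxEventsSketch {

    static List<String> events(String xml) {
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();

            // Emits buffered character data as one "contents" entry.
            private void flushText() {
                if (text.length() > 0) {
                    log.add("contents : " + text);
                    text.setLength(0);
                }
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                flushText();
                log.add("startElement : " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                flushText();
                log.add("endElement : " + qName);
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        for (String event : events("<Result><Name>Kiran</Name><EmpNo>45609</EmpNo></Result>")) {
            System.out.println(event);
        }
    }
}
```

Nothing is kept in memory beyond the log itself, which is why the same pattern scales to very large documents.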

XML Parser Interfaces: Event-Based Interface

• For each of these events, your application implements "event handlers".
• Each time an event occurs, the corresponding event handler is called.
• Your application intercepts these events and handles them in any way you want.
• The application does not wait until the entire document has been parsed.
• The application has to keep the information from the XML document in its own data structures until it has been processed completely.
• Simple API for XML (SAX) is the standard for event-based parsing of XML documents.

SAXApp.java: Parsing an XML Document using the SAX Parser

    import java.io.File;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.SAXException;
    import org.xml.sax.SAXParseException;
    import org.xml.sax.helpers.DefaultHandler;

    public class SAXApp {
        public static void main(String[] argv) {
            // Get the instance of the parser event handling class
            DefaultHandler handler = new Handler();
            // Get the instance of SAXParserFactory
            SAXParserFactory factory = SAXParserFactory.newInstance();
            try {
                // Set the properties of the parser to be obtained
                factory.setValidating(true);
                factory.setNamespaceAware(true);
                // Get the new SAX parser
                SAXParser saxParser = factory.newSAXParser();
                // Parse the file; handler processes the events generated during parsing
                saxParser.parse(new File("Result.xml"), handler);
            } catch (Throwable t) {
                // Handle any exceptions generated during parsing
                t.printStackTrace();
            }
        } // End of function main
    }

    class Handler extends DefaultHandler {
        // Process any recoverable errors in the XML document
        @Override
        public void error(SAXParseException e) throws SAXException {
            System.out.println("Error At Line: " + e.getLineNumber());
            System.out.println("Column: " + e.getColumnNumber());
            // Print the error message
            System.out.println(e.getMessage());
        }

        // Process any fatal errors in the XML document
        @Override
        public void fatalError(SAXParseException e) throws SAXException {
            System.out.println("Fatal Error At Line: " + e.getLineNumber());
            System.out.println("Column: " + e.getColumnNumber());
            // Print the error message
            System.out.println(e.getMessage());
        }
    } // End of class Handler

Understanding the Simple API for XML (SAX)

Step 1: Get an instance of SAXParserFactory. This instance is used to obtain the SAX parser.
    SAXParserFactory factory = SAXParserFactory.newInstance();

Step 2: Get an instance of the event handler class. This class handles all the events generated by the parser.
    DefaultHandler handler = new Handler();

Step 3: Set the properties of the parser to be obtained.
  a. It should validate the XML document against the Schema / DTD
  b. It should be namespace aware
    factory.setValidating(true);
    factory.setNamespaceAware(true);

Step 4: Obtain an instance of the SAX parser using the factory just obtained.
    SAXParser saxParser = factory.newSAXParser();

Step 5: Parse the Result.xml file using the SAX parser obtained above. Events generated during parsing are handled by the object handler.
    saxParser.parse(new File("Result.xml"), handler);

The Big Picture: Parsing the XML Document using SAX

[Figure: the SAX parser factory creates the SAX parser, which reads the XML document and sends parser events to DefaultHandler/MyHandler. In the org.xml.sax class hierarchy, the handler implements org.xml.sax.ContentHandler, org.xml.sax.ErrorHandler, and org.xml.sax.EntityResolver.]

org.xml.sax Interfaces

org.xml.sax.helpers.DefaultHandler class
• Provides a default implementation for all the events.
• DefaultHandler implements the ContentHandler, ErrorHandler, DTDHandler, and EntityResolver interfaces (with default, mostly empty, methods).
• Only the methods that are required need to be overridden.

org.xml.sax.ContentHandler interface
• Receives notification of the logical content of a document.
• Defines methods such as startDocument(), endDocument(), startElement(), and endElement(), which are invoked when the corresponding parts of the XML document are recognized.
• Also defines the method characters(), which is invoked when the parser encounters text in an XML element.

org.xml.sax.ErrorHandler interface
• Allows a SAX application to do customized error handling.
• The parser then reports all errors and warnings through this interface.
• Important methods:
  • void error(): receives the notification of a recoverable error
  • void fatalError(): receives the notification of a non-recoverable error
  • void warning(): receives the notification of a warning

Evaluating Parsers: SAX vs. DOM

SAX
• Advantage: good when serial processing of the document is required and the document is very large, i.e. when the size of the XML document runs into gigabytes.
• Disadvantage: the application has to keep the parts of the document it still needs in its own data structures until processing is finished, so for small XML documents SAX is usually more work than it is worth.

DOM
• Advantages:
  • Supports DOM tree traversal methods
  • Allows modification of the XML document
  • Good when random access to the document is required
• Disadvantage: for large XML documents (gigabytes in size), DOM requires much more memory than parsing the same document with a SAX parser.
