
eXtensible Markup Language (XML)

Upload: philippa-harrell

Post on 05-Jan-2016


Page 1: EXtensible Markup Language (XML). Definition XML is a cross-platform, software and hardware independent tool for transmitting information. “XMl is going

eXtensible Markup Language (XML)


Definition

XML is a cross-platform, software- and hardware-independent tool for transmitting information.

“XML is going to be everywhere” – W3C


XML BASICS


Introduction

• XML is a portable, widely supported, open technology for data storage and exchange


Introduction

• Developed from SGML (Standard Generalized Markup Language)

•Became a W3C Recommendation in 1998

•A meta-markup language

•Deficiencies of HTML and SGML

–Many complex features that are rarely used

•HTML is a markup language; XML is used to define markup languages


Introduction

•Markup languages defined in XML are known as applications

•XML can be written by hand or generated by computer

–Useful for data exchange

•Foundation for several next-generation web technologies:

–RSS, AJAX, Web Services, etc.


What is XML?

• A specification for creating markup languages to store and exchange data

• Tags are not predefined (they are defined by the user)


Why XML?

• Sample catalog entry in HTML

<TITLE> Laptop Computer </TITLE>
<BODY>
<UL>
<LI> IBM Thinkpad 600E
<LI> 400 MHz
<LI> 64 MB
<LI> 8 GB
<LI> 4.1 pounds
<LI> $3200
</UL>
</BODY>

• How can I parse the content, e.g. the price?

• Need a more flexible mechanism than HTML to interpret content.


Why XML?

• Sample catalog entry using XML

<COMPUTER TYPE="Laptop">
  <MANUFACTURER>IBM</MANUFACTURER>
  <LINE>ThinkPad</LINE>
  <MODEL>600E</MODEL>
  <SPECIFICATIONS>
    <SPEED UNIT="MHz">400</SPEED>
    <MEMORY UNIT="MB">64</MEMORY>
    <DISK UNIT="GB">8</DISK>
    <WEIGHT UNIT="POUND">4.1</WEIGHT>
    <PRICE CURRENCY="USD">3200</PRICE>
  </SPECIFICATIONS>
</COMPUTER>
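To answer the question "how can I parse the price?", here is a minimal sketch using Python's standard xml.etree.ElementTree module. The choice of Python and the helper name read_price are assumptions made for illustration; the slides do not prescribe a language or API.

```python
import xml.etree.ElementTree as ET

# The catalog entry from the slide, written with straight quotes so it parses.
CATALOG_ENTRY = """
<COMPUTER TYPE="Laptop">
  <MANUFACTURER>IBM</MANUFACTURER>
  <LINE>ThinkPad</LINE>
  <MODEL>600E</MODEL>
  <SPECIFICATIONS>
    <SPEED UNIT="MHz">400</SPEED>
    <MEMORY UNIT="MB">64</MEMORY>
    <DISK UNIT="GB">8</DISK>
    <WEIGHT UNIT="POUND">4.1</WEIGHT>
    <PRICE CURRENCY="USD">3200</PRICE>
  </SPECIFICATIONS>
</COMPUTER>
"""


def read_price(xml_text):
    """Return (amount, currency) for one COMPUTER entry (hypothetical helper)."""
    computer = ET.fromstring(xml_text)
    price = computer.find("SPECIFICATIONS/PRICE")  # path follows the nesting of the tags
    return float(price.text), price.get("CURRENCY")


amount, currency = read_price(CATALOG_ENTRY)
print(amount, currency)
```

Because the price has its own tag and its currency is an attribute, no guessing about list positions is needed, unlike the HTML version.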


What does XML do?

• XML is used to structure and describe information.

• Intended to be used with the Internet

• Used to facilitate sharing data between different systems


Smart processing using XML

• <COMPUTER> and <SPECIFICATIONS> provide logical containers for extracting and manipulating product information as a unit

– Sort by <MANUFACTURER>, <SPEED>, <WEIGHT>, <PRICE>, etc.

• Explicit identification of each part enables its automated processing

– Convert <PRICE> from “USD” to euro, yen, etc.
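The two kinds of processing named above (sorting by a tagged field, converting a tagged value) can be sketched in Python with xml.etree.ElementTree. The CATALOG wrapper element, the second entry ("Acme") and the exchange rate are all made up for illustration; they are not part of the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-entry catalog; only the IBM entry comes from the slides.
CATALOG = """
<CATALOG>
  <COMPUTER TYPE="Laptop">
    <MANUFACTURER>IBM</MANUFACTURER>
    <SPECIFICATIONS><PRICE CURRENCY="USD">3200</PRICE></SPECIFICATIONS>
  </COMPUTER>
  <COMPUTER TYPE="Laptop">
    <MANUFACTURER>Acme</MANUFACTURER>
    <SPECIFICATIONS><PRICE CURRENCY="USD">1500</PRICE></SPECIFICATIONS>
  </COMPUTER>
</CATALOG>
"""

USD_TO_EUR = 0.9  # made-up rate, for illustration only


def price_of(computer):
    """Numeric value of the PRICE element inside one COMPUTER container."""
    return float(computer.find("SPECIFICATIONS/PRICE").text)


def sorted_by_price(xml_text):
    """Return the COMPUTER elements ordered by ascending price."""
    root = ET.fromstring(xml_text)
    return sorted(root.findall("COMPUTER"), key=price_of)


def convert_prices(computers, rate, currency):
    """Rewrite each PRICE in place: scale the amount, relabel the currency."""
    for computer in computers:
        price = computer.find("SPECIFICATIONS/PRICE")
        price.text = f"{float(price.text) * rate:.2f}"
        price.set("CURRENCY", currency)


computers = sorted_by_price(CATALOG)
convert_prices(computers, USD_TO_EUR, "EUR")
for computer in computers:
    price = computer.find("SPECIFICATIONS/PRICE")
    print(computer.findtext("MANUFACTURER"), price.text, price.get("CURRENCY"))
```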


Document exchange

• Use of XML allows companies to exchange information that can be processed automatically without human intervention, e.g.

– Purchase orders

– Invoices

– Catalogues etc.


Difference between HTML and XML?

• XML was designed to carry data.

• XML is not a replacement for HTML.

• XML and HTML were designed with different goals:

– XML was designed to describe data and to focus on what data is.

– HTML was designed to display data and to focus on how data looks.

– HTML is about displaying information, while XML is about describing information.


How can XML be Used?

• XML can separate data from HTML

– With XML, your data is stored outside your HTML

• XML is used to exchange data

– With XML, data can be exchanged between incompatible systems

– With XML, financial information can be exchanged over the Internet

• XML can be used to share data

• XML can be used to store data

• XML can make your data more useful

• XML can be used to create new languages


XML Pros and Cons

•pro:

–human-readable, self-documenting format

–strict syntax allows standardized tools

–international, platform-independent

–can represent almost any general kind of data (record, list, tree)

•con:

–bulky syntax/structure makes files large; can decrease performance


An example XML file


Components of an XML Document

• Processing instructions (the <?...?> lines; the first one is the XML declaration)

• Elements

• Elements with attributes

<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="template.xsl"?>
<ROOT>
  <ELEMENT1><SUBELEMENT1 /><SUBELEMENT2 /></ELEMENT1>
  <ELEMENT2> </ELEMENT2>
  <ELEMENT3 type='string'> </ELEMENT3>
  <ELEMENT4 type='integer' value='9.3'> </ELEMENT4>
</ROOT>


XML Document Structure

•A header (the XML declaration), then a single document (root) tag that can contain other tags

–<?xml version="1.0" encoding="UTF-8"?>

•Tag syntax:

–<element attributes> text or tags </element>

•Attribute syntax:

–name="value"

•Comment syntax: <!--comment -->
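The same pieces (header, root tag, attribute, nested tag, comment) can also be generated by a program rather than typed by hand. This is a small sketch with Python's xml.etree.ElementTree; the element name note and the attribute priority are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Root (document) tag with one attribute: name="value".
note = ET.Element("note", {"priority": "high"})

# A nested tag holding text.
ET.SubElement(note, "to").text = "Tove"

# A comment node, serialized as <!--comment-->.
note.append(ET.Comment("comment"))

# Serialize with the XML declaration (the "header") in front.
xml_bytes = ET.tostring(note, encoding="UTF-8", xml_declaration=True)
print(xml_bytes.decode("UTF-8"))
```

The serializer takes care of quoting attribute values and closing every tag, which is one reason generated XML tends to be well formed.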


XML Document Structure

•Well-formed XML documents

–An XML document should begin with an XML declaration (optional in XML 1.0, but recommended)

–All begin tags have a matching end tag, except empty tags, which are written <tag/>

–There is one root tag that contains all the other tags in a document

–Attributes must have a value assigned, and the value must be quoted

–The characters < and & can only appear with their special meaning; as data they are written &lt; and &amp; (and > is usually written &gt;)

–XML tags are case sensitive

–XML elements must be properly nested
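A quick way to see these rules in action is to hand small documents to a parser and note which ones it rejects. The sketch below assumes Python's xml.etree.ElementTree, which is a non-validating parser: it checks well-formedness only. The helper name is_well_formed and the sample strings are illustrative.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """True if the parser accepts the text, False if it raises a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


samples = {
    "matching tags": "<note><to>Tove</to></note>",
    "empty tag": "<note><to/></note>",
    "missing end tag": "<note><to>Tove</note>",
    "two root tags": "<a/><b/>",
    "unquoted attribute value": "<a x=1/>",
    "case mismatch": "<Note></note>",
    "improper nesting": "<a><b></a></b>",
    "raw ampersand": "<a>R & D</a>",
    "escaped ampersand": "<a>R &amp; D</a>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```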


XML Document Structure

•The structure of a document is specified by either a DTD or an XML schema (which specify the tags and their order)

•Valid documents are well-formed and also conform to a schema which defines details of the allowed content

•Validity is tested against a schema, discussed later


XML SCHEMA


XML Schema

•Schema: describes the structure of an XML document, by setting rules specifying which tags and attributes are valid, and how they can be used together

•Used to check XML files to make sure they follow the rules set in the schema; the W3C validator uses it for validation, and the doctype at the top of an XHTML file specifies the schema

•Two ways to define a schema:

–Document Type Definition (DTD)

–W3C XML Schema


Document Type Definitions

•A set of structural rules called declarations

•Define tags, attributes, entities

•Specify the order and nesting of tags

•Specify which attributes can be used with which tags.

•A DTD can be:

–Embedded inside the XML document (internal DTD).

–Stored in a separate file (external DTD, saved as .dtd).


Document Type Definitions

•General syntax for declarations

–<!keyword … >

–Note: declarations are not written in XML syntax

•Four possible keywords:

–ELEMENT, for defining tags

–ATTLIST, for specifying attributes in your tags

–ENTITY, for identifying sources of data

–NOTATION, for defining data types for non-XML data


Declaring Elements

•General syntax

–<!ELEMENT element-name (content-description)>

–The content description specifies what tags may appear inside the named element and whether there may be any plain text in the content

•Example: <!ELEMENT person (parent+, age, spouse?, sibling*)>

•An element can be either an internal node or a leaf node.

•Multiplicity

–+ (one or more occurrences)

–* (zero or more occurrences)

–? (zero or one occurrence)

•Leaf elements can be:

–#PCDATA (parsed character data, i.e. text)

–EMPTY (no content)

–ANY (any content)
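A content description behaves much like a regular expression over the names of the child tags. The sketch below is not a DTD validator (Python's standard library does not include one); it hand-translates the person content model from the example into a regex to show what +, ? and * mean. The function name and test documents are assumptions.

```python
import re
import xml.etree.ElementTree as ET

# (parent+, age, spouse?, sibling*) hand-translated into a regex over
# child tag names, each name followed by one space.
PERSON_MODEL = re.compile(r"(parent )+age (spouse )?(sibling )*")


def matches_person_model(xml_text):
    """Check the order and multiplicity of the direct children of <person>."""
    person = ET.fromstring(xml_text)
    child_names = "".join(child.tag + " " for child in person)
    return PERSON_MODEL.fullmatch(child_names) is not None


# Two parents, an age, no spouse, one sibling: allowed.
print(matches_person_model("<person><parent/><parent/><age/><sibling/></person>"))
# No parent at all: violates parent+.
print(matches_person_model("<person><age/></person>"))
# Two spouses: violates spouse?.
print(matches_person_model("<person><parent/><age/><spouse/><spouse/></person>"))
```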


Declaring Attributes

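•General syntax

–<!ATTLIST element-name (attribute-name attribute-type default-value?)+ >

•There are 10 attribute types; only CDATA will be used here.

•Default values

–A value (used when the attribute is not specified)

–#FIXED value (the attribute always has this value)

–#REQUIRED (no default value; each instance must specify a value)

–#IMPLIED (the attribute is optional; no default value is supplied if it is not specified)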


Declaring Attributes

Example:

•DTD

<!ATTLIST airplane places CDATA "4">
<!ATTLIST airplane engine_type CDATA #REQUIRED>
<!ATTLIST airplane price CDATA #IMPLIED>
<!ATTLIST airplane make CDATA #FIXED "Cessna">

•XML element

<airplane places="10" engine_type="jet"></airplane>
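To make the four kinds of default concrete, the sketch below re-states the airplane declarations as a Python dictionary and applies them to an element by hand. This only illustrates the rules; it is not how a validating parser is implemented, and the names AIRPLANE_ATTLIST and apply_attlist are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written mirror of the four ATTLIST declarations for <airplane>.
# attribute name -> (kind of default, value if any)
AIRPLANE_ATTLIST = {
    "places": ("default", "4"),
    "engine_type": ("required", None),
    "price": ("implied", None),
    "make": ("fixed", "Cessna"),
}


def apply_attlist(element, attlist):
    """Return the effective attributes of element; raise ValueError on a violation."""
    attrs = dict(element.attrib)
    for name, (kind, value) in attlist.items():
        if kind == "required" and name not in attrs:
            raise ValueError(f"missing required attribute {name!r}")
        if kind == "fixed":
            if attrs.get(name, value) != value:
                raise ValueError(f"attribute {name!r} must be {value!r}")
            attrs[name] = value           # always present with the fixed value
        if kind == "default":
            attrs.setdefault(name, value)  # filled in only when omitted
        # "implied": optional, nothing is filled in
    return attrs


plane = ET.fromstring('<airplane places="10" engine_type="jet"></airplane>')
print(apply_attlist(plane, AIRPLANE_ATTLIST))
```

For the element on the slide, places keeps its explicit value "10", make is supplied as "Cessna", and price stays absent.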


ENTITIES

An XML document may be distributed among a number of files

–Each unit of information is called an entity

–Each entity has a name to identify it

–Defined using an entity declaration

–Used by means of an entity reference


General Entities

•Declared in the DTD

–<!DOCTYPE My_XML_Doc [ <!ENTITY name "replacement"> ]>

•Example declaration: <!ENTITY xml "eXtensible Markup Language">

•Reference in the document: The &xml; includes entities

•Result after the parser replaces the entity: The eXtensible Markup Language includes entities
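The replacement can be observed directly with a parser. This sketch assumes Python's xml.etree.ElementTree, whose underlying parser expands general entities declared in the internal DTD; the document text is assembled from the declaration and reference shown above.

```python
import xml.etree.ElementTree as ET

# Internal DTD declaring the entity, followed by a document that references it.
DOC = """<!DOCTYPE My_XML_Doc [
  <!ENTITY xml "eXtensible Markup Language">
]>
<My_XML_Doc>The &xml; includes entities</My_XML_Doc>"""

root = ET.fromstring(DOC)
print(root.text)  # the reference &xml; has been replaced by the parser
```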


Attributes vs. Elements

•There are no rules about when to use attributes and when to use elements.

•Avoid XML Attributes?

•Some of the problems with using attributes are:

–attributes cannot contain multiple values (elements can)

–attributes cannot contain tree structures (elements can)

–attributes are not easily expandable (for future changes)


Internal and External DTDs

A document type declaration can either contain declarations directly or can refer to another file

•Internal

–<!DOCTYPE root-element [declarations]>

•External file

–<!DOCTYPE root-name SYSTEM "file-name">


DTD Example

<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

Syntax: <!DOCTYPE root-element SYSTEM "filename">

You usually specify the DTD for your XML document by providing a reference to it near the top of the XML document.


DTD Example

The DTD (note.dtd) for the XML document note is:

<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>

Written as an internal DTD, the same declarations are wrapped in <!DOCTYPE note [ ... ]> inside the XML document. The DOCTYPE names the document type (note); the declarations list the valid elements and the type of content they can accept.


Why use a DTD?

With a DTD, each of your XML files can carry a description of its own format with it.

With a DTD, independent groups of people can agree to use a common DTD for interchanging data.

Your application can use a standard DTD to verify that the data you receive from the outside world is valid.


NAMESPACES


NAMESPACES

•“XML namespaces provide a simple method for qualifying element and attribute names used in Extensible Markup Language documents by associating them with namespaces identified by URI references.”

•Multiple namespaces can be used in a single document

•Two types of namespaces:

–Default namespace

•<element xmlns="URI">

–Explicit

•Use a prefix.

•<element xmlns[:prefix]="URI">

•A prefix is used for two reasons:

–The URI is too long to be typed

–The URI includes illegal characters


NAMESPACES

•Example:

<states xmlns="http://www.states.org/states"
        xmlns:cap="http://www.states.org/capitals">
  <state>
    <name>South Dakota</name>
    <capital>
      <cap:name>Pierre</cap:name>
    </capital>
  </state>
</states>

•DTDs do not support namespaces very well
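The effect of the two xmlns declarations shows up when the states example is parsed: every element name is qualified by its namespace URI, so the two name elements no longer clash. A minimal sketch with Python's xml.etree.ElementTree follows; the prefix st used in the lookups is an arbitrary choice, since only the URIs matter.

```python
import xml.etree.ElementTree as ET

STATES = """
<states xmlns="http://www.states.org/states"
        xmlns:cap="http://www.states.org/capitals">
  <state>
    <name>South Dakota</name>
    <capital><cap:name>Pierre</cap:name></capital>
  </state>
</states>
"""  # Pierre is the capital of South Dakota

# Prefixes used for searching; they need not match the prefixes in the document.
NS = {
    "st": "http://www.states.org/states",
    "cap": "http://www.states.org/capitals",
}

root = ET.fromstring(STATES)

# ElementTree stores qualified names as {namespace-URI}local-name.
print(root.tag)

state_name = root.findtext("st:state/st:name", namespaces=NS)
capital_name = root.findtext("st:state/st:capital/cap:name", namespaces=NS)
print(state_name, "/", capital_name)
```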


W3C XML SCHEMA (XSD)


Remember “Valid” XML

•Adheres to specification in DTD or XSD


XML Schemas (XSD)

•XSD is like DTD: it can specify elements, attributes, nesting, ordering and number of occurrences.

•However, DTDs have several deficits

–They do not use XML syntax

–They do not support namespaces

–Data types cannot be strictly specified (for example, a DTD cannot tell a date from a string)


Schema Fundamentals

•Documents that conform to a schema’s rules are considered instances of that schema

•Schema purposes

–Structure of instances.

–Data types of elements and attributes.

•W3C XML Schemas support namespaces

–The XML Schema language itself is a set of XML tags

–The application being described is another set of tags


Defining a Schema

•The root of an XML Schema document is the <schema> tag

•Attributes

–xmlns attributes for the schema namespace and for the namespace being defined

–A targetNamespace attribute declaring the namespace being defined

–An elementFormDefault attribute with the value qualified, to indicate that all elements defined in the target namespace must be namespace qualified (either with a prefix or by default) when used


Defining a Schema

•Example

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.sustech.edu"
            xmlns="http://www.sustech.edu"
            elementFormDefault="qualified">
...
</xsd:schema>


An Overview of Data Types

•Data types are of two kinds

–Simple data types with string content

–Complex data types with elements, attributes and string content

•Predefined types

–Primitive: string, boolean, float, decimal, ...

–Derived: positiveInteger, long, ...

•Restrictions (user-defined types)

–Facets (to limit the content, or to require the data to match a specific pattern)


Simple Types

•Named types can be used to give the type of

–an attribute (which must be simple) or

–an element (which may be simple or complex)

•Elements or attributes with simple type may have default values specified

•Syntax

–<xsd:element name="xxx" type="yyy"/>

–where xxx is the name of the element and yyy is the data type of the element

–E.g.: <xsd:element name="engine" type="xsd:string" default="fuel inj"/>


Simple Types

•New simple types can be defined by restriction of base types

•Example:

<xsd:simpleType name="FirstName">
  <xsd:restriction base="xsd:string">
    <xsd:maxLength value="10"/>
  </xsd:restriction>
</xsd:simpleType>
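Because a schema is itself XML, a program can read the facet straight out of it. The sketch below is not an XSD validator (Python's standard library does not include one); it only looks up the maxLength facet of FirstName and applies it to a string. The xsd:schema wrapper and the function names are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

XSD_NS = {"xsd": "http://www.w3.org/2001/XMLSchema"}

# The FirstName type from the slide, wrapped in a schema root so it parses.
SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:simpleType name="FirstName">
    <xsd:restriction base="xsd:string">
      <xsd:maxLength value="10"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
"""


def max_length_of(schema_text, type_name):
    """Read the maxLength facet of a named simple type from the schema text."""
    schema = ET.fromstring(schema_text)
    facet = schema.find(
        f"xsd:simpleType[@name='{type_name}']/xsd:restriction/xsd:maxLength",
        XSD_NS,
    )
    return int(facet.get("value"))


def conforms(value, type_name="FirstName"):
    """Check one string against the facet (a hand-rolled check, one facet only)."""
    return len(value) <= max_length_of(SCHEMA, type_name)


print(conforms("Philippa"))     # 8 characters
print(conforms("Bartholomew"))  # 11 characters
```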


Complex Types

•A complex element is an XML element that contains other elements and/or attributes.

•There are four kinds of complex elements:

–empty elements

–elements that contain only other elements

–elements that contain only text

–elements that contain both other elements and text

•See http://www.w3schools.com/schema/schema_complex.asp


Empty Elements

<xs:element name="product">
  <xs:complexType>
    <xs:attribute name="prodid" type="xs:positiveInteger"/>
  </xs:complexType>
</xs:element>


Elements Only

•The child elements of an element-only element must be contained in an ordered group, an unordered group, a choice or a named group.

•The sequence element is used to contain an ordered group.

•The all element is used to contain an unordered group.

•An element definition can be associated with a type by

–Referring to a named type directly in the type attribute


Elements Only

<xs:element name="person">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>


XSD (note.xsd)



Defining a Schema Instance

•The xmlns attribute declares a namespace for an element and its descendants

–<element xmlns[:prefix]="URI">

–The element itself may not be in the namespace

–Multiple elements may be defined

•The http://www.w3.org/2001/XMLSchema-instance namespace includes, among others, the schemaLocation attribute


Reference to XSD (notes.xml)


Validating Instances of Schemas

•Various systems exist for validating instances against schemas

–Online: http://www.w3.org/2001/03/webdata/xsv or http://validator.w3.org/

–Standalone automatic validation tools: Altova XMLSpy, oXygen XML Editor, XML Copy Editor, xmllint


Example


XPATH


What is XPath?

•XPath is a syntax used for selecting parts of an XML document

•The way XPath describes paths to elements is similar to the way an operating system describes paths to files

•XPath is almost a small programming language; it has functions, tests, and expressions

•XPath is a W3C standard

•XPath is not itself written as XML, but is used heavily in XSLT


XPath capabilities

•XPath provides a syntax for:

–navigating around an XML document

–selecting nodes and values

–comparing node values

–performing arithmetic on node values

•XPath = path expressions + conditions

•XPath provides some functions (e.g., concat(), substring(), etc.) to facilitate the above.

Today XPath expressions can also be used in JavaScript, Java, XML Schema, PHP, Python, C and C++, and lots of other languages.
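As one example of XPath used from a host language, Python's xml.etree.ElementTree accepts a limited subset of XPath in find() and findall(): path steps and simple predicates, but not functions such as concat() or arithmetic. The menu document below is a shortened, hypothetical version of a breakfast menu, included only so the sketch is self-contained.

```python
import xml.etree.ElementTree as ET

MENU = """
<breakfast_menu>
  <food>
    <name>Belgian Waffles</name>
    <price>5.95</price>
    <calories>650</calories>
  </food>
  <food>
    <name>Strawberry Belgian Waffles</name>
    <price>7.95</price>
    <calories>900</calories>
  </food>
</breakfast_menu>
"""

root = ET.fromstring(MENU)  # root is <breakfast_menu>; paths are relative to it

# Path expression: every <name> that is a child of a <food> child.
all_names = [name.text for name in root.findall("food/name")]

# Positional condition: the second <food>.
second_name = root.find("food[2]/name").text

# Value condition: the <food> whose <calories> child has the text 650.
light_name = root.find("food[calories='650']/name").text

# Descendant search: every <price> anywhere below the root.
all_prices = [price.text for price in root.findall(".//price")]

print(all_names, second_name, light_name, all_prices)
```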


XPath Basic Constructs


Example


Example


eXtensible Stylesheet Language Transformations (XSLT)


Displaying Raw XML Documents

•Plain XML documents are generally displayed literally by browsers

•Firefox notes that there is no style information


Displaying Raw XML Documents

• Raw XML files can be viewed in IE 5.0 (and higher) and in Netscape 6

– but to make them display like a web page, you have to add some display information

• XML documents do not carry information about how to display the data

• Different solutions to the display problem, using CSS, XSL, JavaScript, and XML Data Islands


Displaying XML Documents with CSS

• An xml-stylesheet processing instruction can be used to associate a general XML document with a style sheet

– <?xml-stylesheet type="text/css" href="planes.css"?>

• The style sheet selectors will specify tags that appear in a particular document


XSLT Style Sheets

• XSL is a family of specifications for transforming XML documents

– XSLT: specifies how to transform documents

– XPath: specifies how to select parts of a document and compute values

– XSL-FO: specifies a target XML language describing the printed page

• XSLT describes how to transform XML documents into other XML documents such as XHTML

– XSLT can be used to transform to non-XML documents as well


Overview of XSLT

• A functional style programming language

• Basic syntax is XML

• An XSLT processor takes an XML document as input and produces output based on the specifications of an XSLT document


XSLT Processing

[Diagram: an XSLT Document and an XML Document are input to the XSLT Processor, which produces the output (labelled XSL Document)]


Creating XSLT Stylesheets

• Uses the http://www.w3.org/1999/XSL/Transform namespace, which is typically assigned the xsl prefix

• XSLT stylesheets are defined using the <xsl:stylesheet> root tag

• Stylesheets typically contain one or more <xsl:template> tags that define each template

– Templates have name and/or match attributes

• Templates contain other XSLT tags that control how the XML data is transformed


Common XSLT Elements

• <xsl:stylesheet>

• <xsl:template name="name" match="xpath">

• <xsl:value-of select="xpath">

• <xsl:attribute>

• <xsl:text>

• <xsl:for-each select="xpath">

• <xsl:if test="condition">

• <xsl:choose>, <xsl:when>, <xsl:otherwise>

• <xsl:sort select="xpath">


An Example XSLT Template


Example: the xml file

<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="simple.xsl" ?>
<breakfast_menu>
  <food>
    <name>Belgian Waffles</name>
    <price>$5.95</price>
    <description>two of our famous Belgian Waffles with plenty of real maple syrup</description>
    <calories>650</calories>
  </food>
  <food>
    <name>Strawberry Belgian Waffles</name>
    <price>$7.95</price>
    <description>light Belgian waffles covered with strawberries and whipped cream</description>
    <calories>900</calories>
  </food>
  …
</breakfast_menu>


Example: the xsl file

<?xml version="1.0" encoding="ISO-8859-1"?>
<html xsl:version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns="http://www.w3.org/TR/xhtml1/strict">
  <body style="font-family:Arial,helvetica,sans-serif;font-size:12pt; background-color:#EEEEEE">
    <xsl:for-each select="breakfast_menu/food">
      <div style="background-color:teal;color:white;padding:4px">
        <span style="font-weight:bold;color:white"><xsl:value-of select="name"/></span> - <xsl:value-of select="price"/>
      </div>
      <div style="margin-left:20px;margin-bottom:1em;font-size:10pt">
        <xsl:value-of select="description"/>
        <span style="font-style:italic"> (<xsl:value-of select="calories"/> calories per serving) </span>
      </div>
    </xsl:for-each>
  </body>
</html>
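To make the roles of xsl:for-each and xsl:value-of concrete, the sketch below performs the same transformation by hand in Python. It is not an XSLT processor (Python's standard library does not include one) and it leaves out the style attributes; the function name render_menu is hypothetical, and the menu is trimmed to the two entries shown in the xml file.

```python
import xml.etree.ElementTree as ET
from html import escape

MENU = """
<breakfast_menu>
  <food>
    <name>Belgian Waffles</name>
    <price>$5.95</price>
    <description>two of our famous Belgian Waffles with plenty of real maple syrup</description>
    <calories>650</calories>
  </food>
  <food>
    <name>Strawberry Belgian Waffles</name>
    <price>$7.95</price>
    <description>light Belgian waffles covered with strawberries and whipped cream</description>
    <calories>900</calories>
  </food>
</breakfast_menu>
"""


def render_menu(menu_xml):
    """Produce HTML from the menu, mirroring the structure of the stylesheet."""
    menu = ET.fromstring(menu_xml)
    parts = ["<html><body>"]
    # Plays the role of <xsl:for-each select="breakfast_menu/food">.
    for food in menu.findall("food"):
        # Each findtext call plays the role of one <xsl:value-of select="...">.
        parts.append(
            "<div><span>{name}</span> - {price}</div>"
            "<div>{description} <span>({calories} calories per serving)</span></div>".format(
                name=escape(food.findtext("name")),
                price=escape(food.findtext("price")),
                description=escape(food.findtext("description")),
                calories=escape(food.findtext("calories")),
            )
        )
    parts.append("</body></html>")
    return "".join(parts)


print(render_menu(MENU))
```

The difference in practice is that the XSLT version is declarative and travels with the data, while this version hard-codes the template in program logic.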


• Demo– XSLT EXAMPLE


XML PROCESSOR


Uses of XML

• To exchange data between incompatible systems (just send an XML document, with an agreed definition of the tags)

• For B2B e-commerce – exchange of business documents between businesses; XML is flexible enough to describe any logical text structure, e.g. purchase order, invoice

• To store data – as plain text files, or in databases

• To create new mark-up languages (i.e. languages that use tags) – XML can be used to agree what the tags mean. Many mark-up languages have already been created that are based on XML, e.g. JSTL, WML, VoiceXML, XHTML


Using an XML document

An XML parser is needed to “use” or parse out the data held in the XML document


XML Processors

• XML processors provide tools in programming languages to read in XML documents, manipulate them and to write them out


Purposes of XML Processors

• Four purposes
  – Check the basic syntax of the input document
  – Replace entities
  – Insert default values specified by schemas or DTDs
  – If the parser is able and it is requested, validate the input document against the specified schemas or DTDs

• The basic structure of XML is simple and repetitive, so providing library support is reasonable

• Examples
  – Xerces-J from the Apache foundation provides library support for Java
  – Command line utilities are provided for checking well-formedness and validity

• Two different standards/models for processing
  – Tree based – Document Object Model (DOM)
  – Event based – Simple API for XML (SAX)
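
The first purpose, checking basic syntax, can be sketched with Python's standard library. This is an illustration only and not part of the original slides: `is_well_formed`, `GOOD` and `BAD` are hypothetical names, and the standard-library parser used here checks well-formedness only (it does not validate against a DTD or schema).

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as err:
        # ParseError carries the line/column of the first syntax problem.
        return False, str(err)


# Hypothetical sample inputs for illustration.
GOOD = "<WEATHER><CITY NAME='Hong Kong'><HI>87</HI></CITY></WEATHER>"
BAD = "<WEATHER><CITY NAME='Hong Kong'><HI>87</CITY></WEATHER>"  # <HI> is never closed

print(is_well_formed(GOOD))
print(is_well_formed(BAD))
```

A validating processor such as Xerces-J would go one step further and also check the document against its DTD or schema when asked to.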


Parsing

• The process of reading in a document and analyzing its structure is called parsing

• The parser provides as output a structured view of the input document


XML Parsers

An XML parser does the following:

• Retrieves and reads an XML document – i.e. "parses" the document to figure out what's in it

• Ensures the document adheres to specific standards (e.g. is it well-formed? does it adhere to its DTD?)

• Makes the document contents available to your application


XML Parsers: Tree based DOM interface

• Uses Document Object Model (DOM)

• Tree based interface (navigates through the document)

• Developed by W3C

• XML parsers that use DOM exist for Java, JavaScript, Perl, C++


Tree based DOM parser - example

Object/Tree Interface (DOM)

Definition: The parser reads the XML document and creates an in-memory "tree" of data, i.e. an object model of the data

For example: Given a sample XML document on the next slide, what kind of tree would be produced?


Tree based DOM parser - example

Sample XML Document

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE WEATHER SYSTEM "Weather.dtd">
<WEATHER>
  <CITY NAME="Hong Kong">
    <HI>87</HI>
    <LOW>78</LOW>
  </CITY>
</WEATHER>


Tree based DOM parser - example
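
One way to see the tree a DOM parser builds for the sample weather document is to parse it and walk the result. The sketch below uses Python's standard `xml.dom.minidom`; it is an illustration under assumptions, not the slide's own diagram. The `describe` helper is a hypothetical name, the DOCTYPE line is left out so the sketch does not depend on a `Weather.dtd` file, and whitespace-only text nodes are skipped for readability.

```python
from xml.dom import Node, minidom

# The sample document, without its DOCTYPE line (assumption: no Weather.dtd available).
WEATHER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<WEATHER>
  <CITY NAME="Hong Kong">
    <HI>87</HI>
    <LOW>78</LOW>
  </CITY>
</WEATHER>"""


def describe(node, depth=0, lines=None):
    """Walk the DOM tree depth-first and collect one indented line per node."""
    if lines is None:
        lines = []
    indent = "  " * depth
    if node.nodeType == Node.ELEMENT_NODE:
        attrs = "".join(f' {name}="{value}"' for name, value in node.attributes.items())
        lines.append(f"{indent}Element {node.tagName}{attrs}")
        for child in node.childNodes:
            describe(child, depth + 1, lines)
    elif node.nodeType == Node.TEXT_NODE and node.data.strip():
        # Ignore the whitespace-only text nodes that come from indentation.
        lines.append(f"{indent}Text {node.data.strip()}")
    return lines


document = minidom.parseString(WEATHER_XML)
tree_lines = describe(document.documentElement)
print("\n".join(tree_lines))
```

The printout nests CITY under WEATHER, HI and LOW under CITY, and the values 87 and 78 as text nodes under HI and LOW.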


XML Parsers: Event based SAX parser

• Simple API for XML

• Event based

• Developed by volunteers on the XML-dev mailing list

• http://www.megginson.com/SAX/


Event based SAX parser

Definition: The parser reads the XML document and generates an event for each construct it encounters (start of an element, character data, end of an element).

Event based parsers do not create an in-memory object model of the document – it is up to the programmer to write the code that interprets the events

For example: Given the same XML document, what kind of events would be produced?


Event based SAX parser: example

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE WEATHER SYSTEM "Weather.dtd">
<WEATHER>
  <CITY NAME="Hong Kong">
    <HI>87</HI>
    <LOW>78</LOW>
  </CITY>
</WEATHER>


Event based SAX parser: example

Events generated:

• 1. Start of <WEATHER> Element
• 2. Start of <CITY> Element
• 3. Start of <HI> Element
• 4. Character Event: 87
• 5. End of </HI> Element
• 6. Start of <LOW> Element
• 7. Character Event: 78
• 8. End of </LOW> Element
• 9. End of </CITY> Element
• 10. End of </WEATHER> Element
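
The event sequence listed above can be reproduced with Python's standard `xml.sax` module. This is a sketch rather than the slides' own demo: `EventLogger` is a hypothetical name, the DOCTYPE line is omitted so the sketch does not depend on a `Weather.dtd` file, and whitespace-only character events (the indentation between tags) are filtered out to match the list.

```python
import xml.sax

# The sample document, without its DOCTYPE line (assumption: no Weather.dtd available).
WEATHER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<WEATHER>
  <CITY NAME="Hong Kong">
    <HI>87</HI>
    <LOW>78</LOW>
  </CITY>
</WEATHER>"""


class EventLogger(xml.sax.ContentHandler):
    """Records each SAX callback as a line of text instead of building a tree."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(f"Start of <{name}> element")

    def characters(self, content):
        # A real parser also reports the whitespace between tags; skip it here.
        if content.strip():
            self.events.append(f"Character event: {content.strip()}")

    def endElement(self, name):
        self.events.append(f"End of </{name}> element")


logger = EventLogger()
xml.sax.parseString(WEATHER_XML.encode("utf-8"), logger)
for number, event in enumerate(logger.events, start=1):
    print(f"{number}. {event}")
```

Note that a SAX parser is allowed to deliver one run of text in several `characters` calls, so production handlers usually buffer text until the matching end-element event.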


Event based parsers

For each of these events, your application implements "event handlers."

Each time an event occurs, the handler registered for that type of event is called.

Your application intercepts these events, and handles them in any way you want.
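
As a rough illustration of handling events "in any way you want", the sketch below turns the event stream into a plain dictionary of readings. It assumes Python's standard `xml.sax`; `WeatherHandler` and its `readings` attribute are hypothetical names, and the DOCTYPE line is again omitted so no `Weather.dtd` file is needed.

```python
import xml.sax

# The sample document, without its DOCTYPE line (assumption: no Weather.dtd available).
WEATHER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<WEATHER>
  <CITY NAME="Hong Kong">
    <HI>87</HI>
    <LOW>78</LOW>
  </CITY>
</WEATHER>"""


class WeatherHandler(xml.sax.ContentHandler):
    """Keeps only the state it needs: the current city and the text seen so far."""

    def __init__(self):
        super().__init__()
        self.readings = {}
        self._city = None
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "CITY":
            self._city = attrs.getValue("NAME")
            self.readings[self._city] = {}
        self._buffer = []  # start collecting text for this element

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        self._buffer.append(content)

    def endElement(self, name):
        if name in ("HI", "LOW") and self._city is not None:
            self.readings[self._city][name] = int("".join(self._buffer).strip())
        self._buffer = []


handler = WeatherHandler()
xml.sax.parseString(WEATHER_XML.encode("utf-8"), handler)
print(handler.readings)
```

The handler never holds the whole document: once an element has ended, only the values the application chose to keep remain in memory.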


Comparing tree based DOM parser with event based SAX parser

Questions:

• Which parser is faster?

• Which parser is more efficient?

• Which parser is suitable for which type of XML documents?


Comparing tree based DOM parser with event based SAX parser

Tree based:

• Slower

• Takes up more memory

• Simpler to use

• More suitable for documents that are less structured, with less repetition of tags

• More suitable where the program needs to move around the document a lot, or needs to keep easy access to the full document at all times

Event based:

• Faster

• Takes up much less memory

• But more complex to implement

• Good for large, machine generated, structured documents, e.g. book contents (because the repetitive nature of the tags allows for re-use of event handling code and therefore less work for the programmer)

• Good where only parts of the document are needed at any one time (event based parsers cannot "skip around" from one part of the document to another)


Comparing tree based DOM parser with event based SAX parser

Performance and Memory

Therefore, when high performance and low memory use are the most important criteria, use an event-based parser.

Examples:

• Java applets

• Palm Pilot applications

• Parsing huge data files


The SAX Approach

• In the SAX approach, an XML document is read in serially

• As certain conditions, called events, are recognized, event handlers are called

• The program using this approach only sees part of the document at a time


The DOM Approach

• In the DOM approach, the parser produces an in-memory representation of the input document
  – Because of the well-formedness rules of XML, the structure is a tree

• Advantages over SAX
  – Parts of the document can be accessed more than once
  – The document can be restructured
  – Access can be made to any part of the document at any time
  – Processing is delayed until the entire document is checked for proper structure and, perhaps, validity

• One major disadvantage is that a very large document may not fit in memory entirely


DOM Parser example
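
The original example for this slide did not survive in the transcript. As a stand-in, here is a minimal sketch using Python's standard `xml.dom.minidom` that touches the DOM advantages named earlier: reading parts of the document more than once, and restructuring it. The `reading` helper and the added `RANGE` element are hypothetical, introduced only for illustration, and the weather document is written without its DOCTYPE line and without indentation so that no `Weather.dtd` file or whitespace handling is needed.

```python
from xml.dom import minidom

# Compact form of the weather sample (assumption: DOCTYPE omitted, no indentation).
WEATHER_XML = (
    '<WEATHER><CITY NAME="Hong Kong">'
    "<HI>87</HI><LOW>78</LOW>"
    "</CITY></WEATHER>"
)

document = minidom.parseString(WEATHER_XML)
city = document.getElementsByTagName("CITY")[0]


def reading(tag):
    """Look up a child element of the city by tag name and return its integer value."""
    return int(city.getElementsByTagName(tag)[0].firstChild.data)


# Random access: the same tree can be queried as often as needed, in any order.
high = reading("HI")
low = reading("LOW")

# Restructuring 1: add a new (hypothetical) RANGE element computed from the readings.
range_element = document.createElement("RANGE")
range_element.appendChild(document.createTextNode(str(high - low)))
city.appendChild(range_element)

# Restructuring 2: move LOW so that it comes before HI.
low_node = city.getElementsByTagName("LOW")[0]
high_node = city.getElementsByTagName("HI")[0]
city.insertBefore(low_node, high_node)

child_order = [child.tagName for child in city.childNodes]
print(child_order)
print(city.toxml())
```

None of this is possible in a single SAX pass without the application keeping its own copy of the data, which is the trade-off the comparison slides describe.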


XML IN PRACTICE

•Ajax

•RSS

•KML (Keyhole Markup Language) Google Earth

•ODF (Open Document Format) Open Office

•eBooks and ePub

•Web Services: SOAP & WSDL


Web resources

•http://www.deitel.com/ResourceCenters/Programming/XML/tabid/279/Default.aspx

•W3 Schools XML Tutorial
  –http://www.w3schools.com/xml/default.asp

•W3C XML page
  –http://www.w3.org/XML/

•XML Tutorials
  –http://www.programmingtutorials.com/xml.aspx

•Online resource for markup language technologies
  –http://xml.coverpages.org/

•Several Online Presentations