COSC 843: Application Development for Internet Based Services
Data Description and Transformation
1. XML
2. DTD
3. XSD
4. XPath
5. XSL/XSLT
1. XML
What is XML?
Why XML?
Brief History and Versions
Sample XML Documents
XML Namespaces
What is XML?
XML stands for eXtensible Markup Language
A meta-language for descriptive markup: you invent your own tags
XML uses a Document Type Definition (DTD) or an XML Schema to describe the data; XML with a DTD or XML Schema is designed to be self-descriptive
Built-in internationalization via Unicode: documents can contain characters from many languages
Built-in error handling: a forgotten tag, or an attribute without quotes, renders an XML document unusable
Broad support from the major IT companies
Why XML?
Much shareable data resides in computer systems and databases that store it in incompatible formats and use conflicting hardware and/or software
One of the most time-consuming challenges for developers has been exchanging data between such systems over the Internet
Converting the data to XML can greatly reduce this complexity and create data that can be read by many different applications
XML data is stored in plain text format, so it is hardware and software independent
XML can be used to create new languages: it allows us to define our own markup languages
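To make "converting the data to XML" concrete, here is a minimal sketch assuming Python 3 and its standard-library xml.etree.ElementTree module (not part of the course material). The customer record and the helper names record_to_xml/xml_to_record are hypothetical; the point is that any program can turn an in-memory record into plain-text XML and read it back.

```python
import xml.etree.ElementTree as ET


def record_to_xml(tag: str, record: dict) -> str:
    """Serialize a flat dict as <tag><field>value</field>...</tag>."""
    root = ET.Element(tag)
    for field, value in record.items():
        # Each field becomes a child element holding the value as plain text.
        ET.SubElement(root, field).text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_record(text: str) -> dict:
    """Parse XML produced by record_to_xml back into a dict of strings."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


# Hypothetical record, used only for illustration.
customer = {"id": 42, "name": "Ada", "city": "Springfield"}
xml_text = record_to_xml("customer", customer)
print(xml_text)
# <customer><id>42</id><name>Ada</name><city>Springfield</city></customer>
print(xml_to_record(xml_text))
# {'id': '42', 'name': 'Ada', 'city': 'Springfield'}
```

Note that the round trip returns strings: XML itself is plain text, and any typing (42 as an integer) has to come from a DTD, a schema, or the receiving program.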
Brief XML History
SGML (Standard Generalized Markup Language)
ISO standard, 1986, for data storage and exchange
Metalanguage for defining languages (through DTDs)
A famous SGML language: HTML
Separation of content and display
Used by the U.S. government and its contractors, large manufacturing companies, technical information publishers, ...
The SGML reference is 600 pages long
XML
W3C Recommendation in 1998
Simple subset (80/20 rule) of SGML: “ASCII of the Web”, “Semantic Web”
The XML specification is 26 pages long
Brief XML History (continued)
1986 SGML becomes a standard
1989 Tim Berners-Lee creates the WWW
1994 W3C established
1998 XML 1.0 W3C Recommendation
Jan 2000 XHTML becomes a W3C Recommendation: a reformulation of HTML 4 in XML 1.0
Feb 2004 W3C XML 1.0 (Third Edition) Recommendation http://www.w3.org/TR/2004/REC-xml-20040204/
Feb 2004 XML 1.1 Recommendation http://www.w3.org/TR/2004/REC-xml11-20040204/ (updates XML to use Unicode 3)
XML and HTML
XML is not a replacement for HTML
In future Web development, XML is likely to be used to describe data, while HTML will be used to format and display the same data (one interpretation of XML)
XML and HTML were designed with different goals
XML was designed to describe data and to focus on what data is
XML describes only content, or “meaning”
HTML was designed to display data and to focus on how data looks
HTML describes both structure (e.g. <p>, <h2>, <em>) and appearance (e.g. <br>, <font>, <i>)
XML is for computers, while HTML is for humans
XML is used to mark up data so it can be processed by computers
HTML is used to mark up text so it can be displayed to users
XML does not DO anything
XML was not designed to DO anything
A piece of software must be written to do something with the document (send, receive or display it)
The following example is book information, stored as XML:
<?xml version='1.0'?>
<bookstore>
  <book genre='autobiography' publicationdate='1981' ISBN='1-861003-11-0'>
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <price>8.99</price>
  </book>
  …
</bookstore>
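Since XML does nothing on its own, here is a small sketch of the "piece of software" side, assuming Python 3 and xml.etree.ElementTree. It reads the first book from the example (closed off so it parses, since the slide elides the rest with …) and turns it into a readable line; describe_books is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The first <book> of the bookstore example, closed off so that it parses.
BOOKSTORE = """<?xml version='1.0'?>
<bookstore>
  <book genre='autobiography' publicationdate='1981' ISBN='1-861003-11-0'>
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <price>8.99</price>
  </book>
</bookstore>"""


def describe_books(xml_text: str) -> list:
    """Return one human-readable line per <book> element."""
    root = ET.fromstring(xml_text)
    lines = []
    for book in root.findall("book"):
        author = (book.findtext("author/first-name") + " "
                  + book.findtext("author/last-name"))
        price = float(book.findtext("price"))
        lines.append(f"{book.findtext('title')} by {author}: "
                     f"${price:.2f} ({book.get('publicationdate')})")
    return lines


for line in describe_books(BOOKSTORE):
    print(line)
# The Autobiography of Benjamin Franklin by Benjamin Franklin: $8.99 (1981)
```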
XML is Free and Extensible
XML tags are not predefined: you must "invent" your own tags
The tags used to mark up HTML documents, and the structure of HTML documents, are predefined; the author of HTML documents can only use tags that are defined in the HTML standard
XML allows authors to define their own tags and their own document structure
XML Future
XML is going to be everywhere
A large number of software vendors adopted the XML standard very quickly
XML is a cross-platform, software- and hardware-independent tool for transmitting information
[Figure: Application X exchanging XML with documents, configuration files, a database, and a repository]
Benefits of XML
Open W3C standard: non-proprietary
Representation of data across heterogeneous environments
Cross-platform; allows a high degree of interoperability
E.g., the ability to exchange data between incompatible applications with incompatible data formats
Strict rules (syntax, structure, case sensitivity) make it relatively easy to write XML parsers
XML can make data more useful: its software, hardware and application independence makes data available to more users, not only to HTML browsers
Components of an XML Document
XML declaration
Processing instructions
Encoding specification (Unicode by default)
Namespace declaration
Schema declaration
Elements
Each element has a beginning and an ending tag: <TAG_NAME>...</TAG_NAME>
Elements can be empty (<TAG_NAME />)
Attributes
Describe an element, e.g. data type, data range, etc.
Can only appear in the beginning tag
Components of an XML Document
Processing instructions, elements, and elements with attributes in a sample document:
<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="template.xsl"?>
<ROOT>
<ELEMENT1><SUBELEMENT1 /><SUBELEMENT2 /></ELEMENT1>
<ELEMENT2> </ELEMENT2>
<ELEMENT3 type='string'> </ELEMENT3>
<ELEMENT4 type='integer' value='9.3'> </ELEMENT4>
</ROOT>
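The parent/child structure and attributes of the sample above can be inspected programmatically. A minimal sketch, assuming Python 3 and xml.etree.ElementTree; outline is a hypothetical helper. ElementTree's default tree builder silently skips the xml-stylesheet processing instruction, so only elements and attributes appear.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="template.xsl"?>
<ROOT>
  <ELEMENT1><SUBELEMENT1 /><SUBELEMENT2 /></ELEMENT1>
  <ELEMENT2> </ELEMENT2>
  <ELEMENT3 type='string'> </ELEMENT3>
  <ELEMENT4 type='integer' value='9.3'> </ELEMENT4>
</ROOT>"""


def outline(elem, depth=0, lines=None):
    """List element names, indented by nesting depth, with their attributes."""
    if lines is None:
        lines = []
    attrs = " ".join(f"{k}={v!r}" for k, v in sorted(elem.attrib.items()))
    lines.append("  " * depth + elem.tag + (f" [{attrs}]" if attrs else ""))
    for child in elem:  # children are the nested elements
        outline(child, depth + 1, lines)
    return lines


root = ET.fromstring(DOC)  # the default builder drops the PI
print("\n".join(outline(root)))
```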
XML Declaration
The XML declaration looks like this: <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
The XML declaration is not required by browsers, but is required by most XML processors (so include it!)
If present, the XML declaration must come first: not even whitespace may precede it
Note that the brackets are <? and ?>
The version attribute is required
encoding can be "UTF-8" (ASCII-compatible) or "UTF-16" (both are Unicode encodings), or something else, or it can be omitted
An XML document is standalone if it makes use of no external markup (DTD) declarations; the default value for this attribute is "no"
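Two of these rules (declaration first, version required) are enforced by real parsers. A minimal sketch, assuming Python 3, whose xml.etree.ElementTree uses the non-validating expat parser; parses is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    """True if expat (a non-validating parser) accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = '<?xml version="1.0" standalone="yes"?><note/>'
leading_space = ' <?xml version="1.0"?><note/>'  # whitespace before the declaration
no_version = '<?xml standalone="yes"?><note/>'    # required version attribute missing

print(parses(good), parses(leading_space), parses(no_version))
```

Only the first string is accepted; the other two are rejected as not well-formed.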
Processing Instructions
A PI is a command to the program processing the XML document to handle it in a certain way
PIs (Processing Instructions) may occur anywhere in the XML document (but usually first)
XML documents are typically processed by more than one program
Programs that do not recognize a given PI should just ignore it
General format of a PI: <?target instructions?>
Example: <?xml-stylesheet type="text/css" href="mySheet.css"?>
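A program can see a PI's target and instructions separately. A minimal sketch, assuming Python 3 and xml.dom.minidom (which, unlike ElementTree's default builder, keeps PIs as nodes); processing_instructions is a hypothetical helper. The XML declaration is not reported as a PI.

```python
from xml.dom import minidom

DOC = ('<?xml version="1.0"?>'
       '<?xml-stylesheet type="text/css" href="mySheet.css"?>'
       '<note>text</note>')


def processing_instructions(text: str):
    """Return (target, instructions) pairs for the document's top-level PIs."""
    dom = minidom.parseString(text)
    return [(node.target, node.data)
            for node in dom.childNodes
            if node.nodeType == node.PROCESSING_INSTRUCTION_NODE]


print(processing_instructions(DOC))
```

A program that does not care about xml-stylesheet simply skips these nodes, which is the "ignore unknown PIs" behavior described above.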
XML Elements
An XML element is everything from the element's start tag to the element's end tag
XML elements are extensible, and they have relationships: they are related as parents and children
XML elements have simple naming rules:
Names can contain letters, numbers, and other characters
Names must not start with a number or punctuation character
Names must not start with the letters xml (or XML, or Xml, ...)
Names cannot contain spaces
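The naming rules can be checked mechanically. The sketch below is a simplified, ASCII-only reading of the rules above (assumption: Python 3 with the re module; is_valid_element_name is a hypothetical helper). The full XML grammar also accepts many non-ASCII letters and colons, and underscore is a permitted first character.

```python
import re

# Simplified, ASCII-only reading of the naming rules (hypothetical helper).
# The full XML grammar also allows many non-ASCII letters, and colons
# (reserved for namespaces); underscore is a permitted first character.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_valid_element_name(name: str) -> bool:
    # fullmatch rejects names that start with a digit or with punctuation
    # other than "_", and names containing spaces.
    if not _NAME.fullmatch(name):
        return False
    # Names beginning with "xml" in any mix of case are reserved.
    return not name.lower().startswith("xml")


for candidate in ["first-name", "_id", "2nd", "last name", "XmlData"]:
    print(candidate, is_valid_element_name(candidate))
```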
XML Attributes
XML elements can have attributes
Data can be stored in child elements or in attributes
Should you avoid using attributes? Here are some of the problems with attributes:
attributes cannot contain multiple values (child elements can)
attributes are not easily expandable (for future changes)
attributes cannot describe structures (child elements can)
attributes are more difficult to manipulate by program code
attribute values are not easy to test against a Document Type Definition (DTD), which is used to define the legal elements of an XML document
Experience shows that attributes are handy in HTML, but child elements should be used in their place in XML
Use attributes only to provide information that is not relevant to the data
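The first problem (no multiple values) shows up as soon as a program reads the data. A minimal sketch, assuming Python 3 and xml.etree.ElementTree; the book/author markup and both helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The same hypothetical data modelled two ways.
AS_ATTRIBUTE = '<book authors="Smith, Jones"/>'
AS_CHILDREN = '<book><author>Smith</author><author>Jones</author></book>'


def authors_from_attribute(text: str) -> list:
    # An attribute holds one string: the program must invent a splitting
    # convention (here, commas), which breaks if a name contains a comma.
    value = ET.fromstring(text).get("authors")
    return [name.strip() for name in value.split(",")]


def authors_from_children(text: str) -> list:
    # Repeated child elements carry multiple values directly.
    return [author.text for author in ET.fromstring(text).findall("author")]


print(authors_from_attribute(AS_ATTRIBUTE))  # ['Smith', 'Jones']
print(authors_from_children(AS_CHILDREN))    # ['Smith', 'Jones']
```

Both give the same result here, but only the child-element version needs no private convention that a DTD or another program would have to know about.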
An XML Document
<?xml version='1.0'?>
<bookstore>
  <book genre='autobiography' publicationdate='1981' ISBN='1-861003-11-0'>
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <price>8.99</price>
  </book>
  <book genre='novel' publicationdate='1967' ISBN='0-201-63361-2'>
    <title>The Confidence Man</title>
    <author>
      <first-name>Herman</first-name>
      <last-name>Melville</last-name>
    </author>
    <price>11.99</price>
  </book>
</bookstore>
Another XML Document
<?xml version="1.0"?>
<weatherReport>
  <date>7/14/97</date>
  <city>North Place</city>, <state>NX</state> <country>USA</country>
  High Temp: <high scale="F">103</high>
  Low Temp: <low scale="F">70</low>
  Morning: <morning>Partly cloudy, Hazy</morning>
  Afternoon: <afternoon>Sunny &amp; hot</afternoon>
  Evening: <evening>Clear and Cooler</evening>
</weatherReport>
XML Validation
There is a difference between a well-formed XML document and a valid XML document
A well-formed XML document is one with correct XML syntax (see the next slide for the well-formedness rules)
Beyond syntax, a grammar (DTD or Schema) can constrain the permitted tag names, the attachment of attributes to tags, and so on
A well-formed XML document that also conforms to a given DTD or schema is said to be valid
Every valid XML document is well-formed, but the reverse is not necessarily the case
Rules For Well-Formed XML
There must be one, and only one, root element
All XML elements must have a closing tag
Sub-elements must be properly nested
Attributes are optional (they may be defined by an optional schema)
Attribute values must be enclosed in double (") or single (') quotes
Processing instructions are optional
XML is case-sensitive
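Each rule above corresponds to an error a non-validating parser reports. A minimal sketch, assuming Python 3 and xml.etree.ElementTree (backed by expat, which checks well-formedness but not validity); well_formedness_error is a hypothetical helper, and the exact error wording comes from expat.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(text: str):
    """Return None if the text is well-formed, else the parser's error message."""
    try:
        ET.fromstring(text)  # expat checks well-formedness only, not validity
        return None
    except ET.ParseError as err:
        return str(err)


samples = {
    "ok": "<a><b x='1'/></a>",
    "two roots": "<a/><b/>",               # more than one root element
    "unclosed": "<a><b></a>",              # <b> has no closing tag
    "bad nesting": "<a><b></a></b>",       # sub-elements overlap
    "unquoted attribute": "<a x=1/>",      # attribute value not in quotes
    "case mismatch": "<Note></note>",      # tags are case-sensitive
}
for label, text in samples.items():
    print(f"{label:18} -> {well_formedness_error(text) or 'well-formed'}")
```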
XML DTD
A DTD defines the legal elements of an XML document
It defines the document structure with a list of legal elements
XML Schema
XML Schema is an XML-based alternative to DTD
Well-formedness errors in XML documents will stop the XML program
The W3C XML specification states that a program should not continue normal processing of an XML document if it finds a well-formedness (fatal) error; validity errors are reported, but a processor may continue
Processing an XML document requires a software program called an XML parser (or XML processor) http://www.xml.com/xml/pub/Guide/xml_parsers
There are two flavors of parsers:
Non-validating: checks a document's well-formedness (e.g., browsers)
Validating: also checks a document's validity
Browsers Support for XML
Netscape 6 supports XML
Internet Explorer 5.0 supports the XML 1.0 standard
Internet Explorer 5.0 has the following XML support:
Viewing of XML documents
Displaying XML with CSS
Transforming and displaying XML with XSL
XML embedded in HTML as Data Islands
Binding XML data to HTML elements
Access to the XML DOM
Full support for W3C DTD standards
Viewing XML Documents
Raw XML files can be viewed in IE 5.0 (and higher) and in Netscape 6
XML documents do not carry information about how to display the data; to make them display like a web page, you have to add some display information
There are different solutions to the display problem, using CSS, XSL, XML Data Islands, and JavaScript
Will you be writing your future home pages in XML? Most Microsoft pages are XML-based, and the server converts them to HTML on the fly when requested
Displaying XML with CSS
With CSS (Cascading Style Sheets) you can add display information to an XML document
Formatting XML with CSS is NOT the future of the Web
Formatting with XSL will be the new standard
Example: the xml file
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/css" href="cd_catalog.css"?>
<CATALOG>
  <CD>
    <TITLE>Empire Burlesque</TITLE>
    <ARTIST>Bob Dylan</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Columbia</COMPANY>
    <PRICE>10.90</PRICE>
    <YEAR>1985</YEAR>
  </CD>
  <CD>
    <TITLE>Hide your heart</TITLE>
    <ARTIST>Bonnie Tyler</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>CBS Records</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1988</YEAR>
  </CD>
  . . . .
</CATALOG>
Example: the css file
CATALOG { background-color: white; width: 100%; }
CD { display: block; margin-bottom: 30pt; margin-left: 0; }
TITLE { color: red; font-size: 20pt; }
ARTIST{ color: blue; font-size: 20pt; }
COUNTRY,PRICE,YEAR,COMPANY { display: block; color: black; margin-left: 20pt; }
Displaying XML with XSL
With XSL you can add display information to your XML document
XSL is the preferred style sheet language of XML
XSL (the eXtensible Stylesheet Language) is far more sophisticated than CSS
One way to use XSL is to transform XML into HTML before it is displayed by the browser
Example: the xml file
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="simple.xsl" ?>
<breakfast_menu>
  <food>
    <name>Belgian Waffles</name>
    <price>$5.95</price>
    <description>two of our famous Belgian Waffles with plenty of real maple syrup</description>
    <calories>650</calories>
  </food>
  <food>
    <name>Strawberry Belgian Waffles</name>
    <price>$7.95</price>
    <description>light Belgian waffles covered with strawberries and whipped cream</description>
    <calories>900</calories>
  </food>
  …
</breakfast_menu>
Example: the xsl file
<?xml version="1.0" encoding="ISO-8859-1"?>
<html xsl:version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns="http://www.w3.org/1999/xhtml">
  <body style="font-family:Arial,helvetica,sans-serif;font-size:12pt;background-color:#EEEEEE">
    <xsl:for-each select="breakfast_menu/food">
      <div style="background-color:teal;color:white;padding:4px">
        <span style="font-weight:bold;color:white"><xsl:value-of select="name"/></span>
        - <xsl:value-of select="price"/>
      </div>
      <div style="margin-left:20px;margin-bottom:1em;font-size:10pt">
        <xsl:value-of select="description"/>
        <span style="font-style:italic">
          (<xsl:value-of select="calories"/> calories per serving)
        </span>
      </div>
    </xsl:for-each>
  </body>
</html>
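Python's standard library has no XSLT processor, so the sketch below is not XSLT. Assuming Python 3 with xml.etree.ElementTree and html.escape, it only mirrors by hand what the xsl:for-each and xsl:value-of instructions in the stylesheet do, to make the XML-to-HTML transformation step concrete. menu_to_html is a hypothetical helper and the output markup is simplified.

```python
import xml.etree.ElementTree as ET
from html import escape

MENU = """<breakfast_menu>
  <food>
    <name>Belgian Waffles</name>
    <price>$5.95</price>
    <description>two of our famous Belgian Waffles with plenty of real maple syrup</description>
    <calories>650</calories>
  </food>
</breakfast_menu>"""


def menu_to_html(xml_text: str) -> str:
    root = ET.fromstring(xml_text)  # root is <breakfast_menu>
    parts = ["<html><body>"]
    # Plays the role of <xsl:for-each select="breakfast_menu/food">
    for food in root.findall("food"):
        # Plays the role of <xsl:value-of select="name"/>, "price", etc.
        name, price = escape(food.findtext("name")), escape(food.findtext("price"))
        desc, cal = escape(food.findtext("description")), escape(food.findtext("calories"))
        parts.append(f"<div><b>{name}</b> - {price}</div>")
        parts.append(f"<div>{desc} <i>({cal} calories per serving)</i></div>")
    parts.append("</body></html>")
    return "\n".join(parts)


print(menu_to_html(MENU))
```

The declarative stylesheet is usually preferable because the same XML can be re-styled without touching program code; this sketch only shows what the processor ends up doing.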
View the result in IE 6
XML Embedded in HTML
XML can be embedded within HTML pages in Data Islands
Data Islands are manipulated via client-side script or data binding
The unofficial <xml> tag is used to embed XML data within HTML
The id attribute of the <xml> tag defines an ID for the data island, and the src attribute points to the XML file to embed:
<html>
<body>
<xml id="note" src="note.xml"></xml>
</body>
</html>
The next step is to format and display the data in the data island by binding it to HTML elements.
Bind Data Island to HTML Elements
Data Islands can be bound to HTML elements (like HTML tables)
An XML data island with ID “cdcat” is loaded from an external XML file
An HTML table is bound to the data island with a datasrc attribute
The span elements inside the td cells are bound to the XML data with a datafld attribute
<html>
<body>
<xml id="cdcat" src="cd_catalog.xml"></xml>
<table border="1" datasrc="#cdcat">
<tr>
<td><span datafld="ARTIST"></span></td>
<td><span datafld="TITLE"></span></td>
</tr>
</table>
</body>
</html>
The Microsoft XML Parser
To read and update an XML document, you need an XML parser
The Microsoft XML parser comes with Microsoft Internet Explorer 5.0
Once you have installed IE 5.0, the parser is available to scripts inside HTML documents
The parser features a language-neutral programming model that supports:
JavaScript, VBScript, Perl, VB, Java, C++ and more
W3C XML 1.0 and XML DOM
DTD and validation
You can create an XML document object with the following code: var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
Loading an XML file into the parser
XML files can be loaded into the parser using script code.
The following code loads an XML document (note.xml) into the XML parser:
<script type="text/javascript">
var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
xmlDoc.async = false;
xmlDoc.load("note.xml");
// ... processing the document goes here
</script>
The second line in the code above creates an instance of the Microsoft XML parser
The third line turns off asynchronous loading, to make sure that the parser will not continue execution before the document is fully loaded
The fourth line tells the parser to load the XML document called note.xml
We will revisit these issues later
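The ActiveX object above only exists in Internet Explorer on Windows. As a rough analogue outside the browser (an assumption, not the Microsoft API), Python 3's xml.etree.ElementTree.parse loads a file synchronously: it returns only once the whole document has been read and parsed, the same guarantee async = false gives. The contents of note.xml and the load_note helper are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical contents for note.xml, written to a temporary file.
NOTE = "<note><to>Alice</to><from>Bob</from><body>Hello</body></note>"


def load_note(path: str) -> dict:
    tree = ET.parse(path)  # returns only after the whole file is read and parsed
    return {child.tag: child.text for child in tree.getroot()}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "note.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(NOTE)
    result = load_note(path)

print(result)  # {'to': 'Alice', 'from': 'Bob', 'body': 'Hello'}
```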
Namespaces
XML allows you to define a new document format by combining and reusing other formats
This can lead to name conflicts, since the document formats being combined may use the same element names for different purposes
Namespaces allow authors to differentiate between tags of the same name (using a prefix)
That is, name conflicts are solved using a prefix
This frees the author to focus on the data and decide how best to describe it
The W3C namespace specification states that a namespace should be identified by a URI (Uniform Resource Identifier)
A URI is a string of characters that identifies an Internet resource
A URL is the most common URI, used to identify resources and their location on the Internet
Another, less common type of URI is the URN (Uniform Resource Name)
When a URL is used in a namespace declaration, the URL does NOT have to represent a live server
The only purpose is to give the namespace a unique name. However, companies often use the namespace as a pointer to a real Web page containing information about the namespace
Namespaces: Declaration
Namespace declaration: xmlns:prefix="URI"
Namespace declaration examples:
xmlns:bk="http://www.example.com/bookinfo/"
xmlns:bk="urn:mybookstuff.org:bookinfo"
In the first example, bk is the prefix and "http://www.example.com/bookinfo/" is the URI (a URL)
Namespaces: Examples
<BOOK xmlns:bk="http://www.bookstuff.org/bookinfo">
  <bk:TITLE>All About XML</bk:TITLE>
  <bk:AUTHOR>Joe Developer</bk:AUTHOR>
  <bk:PRICE currency='US Dollar'>19.99</bk:PRICE>
</BOOK>
<bk:BOOK xmlns:bk="http://www.bookstuff.org/bookinfo" xmlns:money="urn:finance:money">
  <bk:TITLE>All About XML</bk:TITLE>
  <bk:AUTHOR>Joe Developer</bk:AUTHOR>
  <bk:PRICE money:currency='US Dollar'>19.99</bk:PRICE>
</bk:BOOK>
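A namespace-aware parser resolves each prefix to its URI, so the prefix itself is only shorthand. A minimal sketch, assuming Python 3 and xml.etree.ElementTree, which reports qualified names in the form {URI}localname; the second example document is used as-is.

```python
import xml.etree.ElementTree as ET

BK = "http://www.bookstuff.org/bookinfo"
MONEY = "urn:finance:money"

DOC = """<bk:BOOK xmlns:bk="http://www.bookstuff.org/bookinfo"
         xmlns:money="urn:finance:money">
  <bk:TITLE>All About XML</bk:TITLE>
  <bk:AUTHOR>Joe Developer</bk:AUTHOR>
  <bk:PRICE money:currency='US Dollar'>19.99</bk:PRICE>
</bk:BOOK>"""

root = ET.fromstring(DOC)
# ElementTree replaces each prefix with the full URI: {uri}localname
print(root.tag)  # {http://www.bookstuff.org/bookinfo}BOOK
price = root.find(f"{{{BK}}}PRICE")
print(price.get(f"{{{MONEY}}}currency"))  # US Dollar
# The prefix is arbitrary: any prefix mapped to the same URI matches.
print(root.findtext("b:TITLE", namespaces={"b": BK}))  # All About XML
```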
Namespaces: Default Namespace
An XML namespace declared without a prefix becomes the default namespace for all sub-elements
All elements without a prefix will belong to the default namespace:
<BOOK xmlns="http://www.bookstuff.org/bookinfo">
  <TITLE>All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
</BOOK>
Namespaces: Scope
Unqualified elements belong to the innermost default namespace:
BOOK, TITLE, and AUTHOR belong to the default BOOK namespace
PUBLISHER and NAME belong to the default PUBLISHER namespace
<BOOK xmlns="http://www.bookstuff.org/bookinfo">
  <TITLE>All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
  <PUBLISHER xmlns="urn:publishers:publinfo">
    <NAME>Microsoft Press</NAME>
  </PUBLISHER>
</BOOK>
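The scoping rule can be checked by asking a parser which namespace each element ended up in. A minimal sketch, assuming Python 3 and xml.etree.ElementTree, with the outer default namespace written as the absolute URI http://www.bookstuff.org/bookinfo; namespace_of is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

DOC = """<BOOK xmlns="http://www.bookstuff.org/bookinfo">
  <TITLE>All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
  <PUBLISHER xmlns="urn:publishers:publinfo">
    <NAME>Microsoft Press</NAME>
  </PUBLISHER>
</BOOK>"""


def namespace_of(tag: str) -> str:
    """Return the URI part of ElementTree's '{uri}local' form ('' if none)."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


root = ET.fromstring(DOC)
# Map each local element name to the namespace the parser assigned to it.
scopes = {elem.tag.split("}")[-1]: namespace_of(elem.tag) for elem in root.iter()}
for name, uri in scopes.items():
    print(f"{name:10} {uri}")
```

BOOK, TITLE and AUTHOR report the bookinfo URI; PUBLISHER and NAME report urn:publishers:publinfo, because the inner declaration overrides the outer one for PUBLISHER and everything inside it.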
Namespaces: Attributes
Unqualified attributes do NOT belong to any namespace, even if there is a default namespace
They don't need to, since the scope of an attribute is only the element on which it appears
This differs from elements, which belong to the default namespace
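The difference between elements and attributes is visible after parsing. A minimal sketch, assuming Python 3 and xml.etree.ElementTree; the money:rate attribute is hypothetical and is there only to contrast a prefixed attribute with an unprefixed one.

```python
import xml.etree.ElementTree as ET

DOC = ('<BOOK xmlns="http://www.bookstuff.org/bookinfo" '
       'xmlns:money="urn:finance:money">'
       '<PRICE currency="US Dollar" money:rate="1.0">19.99</PRICE></BOOK>')

price = ET.fromstring(DOC)[0]
print(price.tag)             # {http://www.bookstuff.org/bookinfo}PRICE
print(sorted(price.attrib))  # ['currency', '{urn:finance:money}rate']
```

The unprefixed element PRICE picks up the default namespace, while the unprefixed attribute currency stays in no namespace; only the prefixed attribute gets a URI.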
Entities
Entities provide a mechanism for textual substitution, e.g. for special characters
XML parsers normally parse all the text in an XML document
When an XML element is parsed, the text between the XML tags is also parsed
If you place special characters like “<” inside an XML element, it will generate an error, because the parser interprets it as the start of a new element
Entity references are used to avoid such errors:
Entity    Substitution
&lt;      <
&amp;     &
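Libraries normally do this substitution for you in both directions. A minimal sketch, assuming Python 3: xml.sax.saxutils.escape replaces &, < and > with entity references, ElementTree escapes text the same way when serializing, and the parser substitutes the characters back when reading.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "if a < b && b > 0"
print(escape(raw))  # if a &lt; b &amp;&amp; b &gt; 0

elem = ET.Element("code")
elem.text = raw  # ElementTree escapes special characters on output
serialized = ET.tostring(elem, encoding="unicode")
print(serialized)  # <code>if a &lt; b &amp;&amp; b &gt; 0</code>
print(ET.fromstring(serialized).text == raw)  # True: the parser substitutes back
```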
CDATA
By default, all text inside an XML document is parsed
You can force text to be treated as unparsed character data by enclosing it in <![CDATA[ ... ]]>
Any characters, even & and <, can occur inside a CDATA
Whitespace inside a CDATA is (usually) preserved
The only real restriction is that the character sequence ]]> cannot occur inside a CDATA
CDATA is useful when your text has a lot of illegal characters (for example, if your XML document contains some HTML text)
CDATA (example)
<?xml version='1.0'?>
<myTag>
<![CDATA[
function matchwo(a,b){ if(a<b && a<0)
return 1;
else
return 0;
}
]]>
</myTag>
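The practical effect of CDATA can be shown with a parser. A minimal sketch, assuming Python 3 and xml.etree.ElementTree: the same one-line version of the function is accepted inside a CDATA section and rejected without it. is_well_formed is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

CODE = "function matchwo(a,b){ if(a<b && a<0) return 1; else return 0; }"


def is_well_formed(text: str) -> bool:
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


wrapped = f"<myTag><![CDATA[{CODE}]]></myTag>"
plain = f"<myTag>{CODE}</myTag>"  # same text without CDATA: "<b" looks like a tag

print(ET.fromstring(wrapped).text == CODE)  # True: < and && survive untouched
print(is_well_formed(plain))                # False
```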
2. Document Type Definitions (DTDs)
What are DTDs?
Why DTDs?
DTD syntactic elements: ELEMENT, ATTLIST (attributes), ENTITY, types
Examples
Validation
What are DTDs?
A Document Type Definition (DTD) is a grammar that describes the structure of a class of XML documents
The structure of the documents is described via element and attribute-list declarations
Element declarations name the allowable set of elements within the document, and specify whether and how declared elements and runs of character data may be contained within each element
Attribute-list declarations name the allowable set of attributes for each declared element, including the type of each attribute value, or an explicit set of valid values
DTDs are written in an EBNF-like notation
Why DTDs?
XML documents are designed to be processed by computer programs
If you can put just any tags in an XML document, it's very hard to write a program that knows how to process the tags
A DTD specifies what tags may occur, when they may occur, and what attributes they may (or must) have
A DTD allows the XML document to be verified (shown to be legal)
A DTD that is shared across groups allows the groups to produce consistent XML documents
Parsers
An XML parser is a program that reads the content of an XML document and exposes it to applications through an API
Currently popular APIs are DOM (Document Object Model) and SAX (Simple API for XML)
A validating parser is an XML parser that compares the XML document to a DTD and reports any errors
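The two API styles can be compared on the novel example that follows. A minimal sketch, assuming Python 3's standard xml.dom.minidom (DOM) and xml.sax (SAX) modules; both sit on expat and are non-validating, so they check well-formedness only. ParagraphCounter is a hypothetical handler class.

```python
import xml.sax
from xml.dom import minidom

NOVEL = (b"<novel><foreword><paragraph>This is a great novel.</paragraph></foreword>"
         b"<chapter number='1'><paragraph>It was a dark and stormy night.</paragraph>"
         b"<paragraph>Suddenly, a shot rang out!</paragraph></chapter></novel>")

# DOM: build the whole tree in memory, then query it.
dom = minidom.parseString(NOVEL)
dom_count = len(dom.getElementsByTagName("paragraph"))


# SAX: receive events as the document streams past; no tree is kept.
class ParagraphCounter(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "paragraph":
            self.count += 1


handler = ParagraphCounter()
xml.sax.parseString(NOVEL, handler)
print(dom_count, handler.count)  # 3 3
```

DOM is convenient when the program needs random access to the document; SAX uses little memory on large documents because nothing is stored unless the handler keeps it.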
An XML example
<novel>
  <foreword>
    <paragraph>This is a great novel.</paragraph>
  </foreword>
  <chapter number="1">
    <paragraph>It was a dark and stormy night.</paragraph>
    <paragraph>Suddenly, a shot rang out!</paragraph>
  </chapter>
</novel>
An XML document contains (and the DTD describes):
Elements, such as novel and paragraph, consisting of tags and content
Attributes, such as number="1", consisting of a name and a value
Entities (not used in this example)
A DTD example
<!DOCTYPE novel [
  <!ELEMENT novel (foreword, chapter+)>
  <!ELEMENT foreword (paragraph+)>
  <!ELEMENT chapter (paragraph+)>
  <!ELEMENT paragraph (#PCDATA)>
  <!ATTLIST chapter number CDATA #REQUIRED>
]>
A novel consists of a foreword and one or more chapters, in that order
Each chapter must have a number attribute
A foreword consists of one or more paragraphs
A chapter also consists of one or more paragraphs
A paragraph consists of parsed character data (text that cannot contain any other elements)
PCDATA is text that will be parsed by a parser. Tags inside the text will be treated as markup and entities will be expanded.
CDATA is text that will NOT be parsed by a parser. Tags inside the text will NOT be treated as markup and entities will not be expanded.
ELEMENT descriptions
Suffixes:
?     optional          foreword?
+     one or more       chapter+
*     zero or more      appendix*
Separators:
,     both, in order    foreword?, chapter+
|     or                section|chapter
Grouping:
( )   grouping          (section|chapter)+
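A DTD content model is essentially a regular expression over the sequence of child element names, which is why the suffixes and grouping look familiar. The sketch below illustrates this by translating a model into a Python 3 regex. It is a simplified teaching aid, not a validating parser: it handles element-only content (no #PCDATA), and the helper names model_to_regex and children_match are hypothetical.

```python
import re

# Tokens of an element-only content model: names and the DTD operators.
_TOKEN = re.compile(r"[A-Za-z_][\w.\-:]*|[(),|?+*]")


def model_to_regex(model: str) -> re.Pattern:
    """Translate a content model like '(foreword?, chapter+)' into a regex."""
    parts = []
    for tok in _TOKEN.findall(model):
        if tok == ",":
            continue                      # ',' = in order: plain concatenation
        elif tok == "(":
            parts.append("(?:")           # grouping without capturing
        elif tok in ")|?+*":
            parts.append(tok)             # same meaning in DTDs and regexes
        else:
            parts.append("(?:" + re.escape(tok) + " )")  # one child element
    return re.compile("".join(parts))


def children_match(model: str, child_names: list) -> bool:
    """Check a sequence of child element names against the content model."""
    sequence = "".join(name + " " for name in child_names)
    return model_to_regex(model).fullmatch(sequence) is not None


print(children_match("(foreword, chapter+)", ["foreword", "chapter", "chapter"]))  # True
print(children_match("(foreword, chapter+)", ["chapter"]))                         # False
print(children_match("(foreword?, chapter+)", ["chapter"]))                        # True
print(children_match("(foreword, (chapter+|section+))",
                     ["foreword", "chapter", "section"]))                          # False
```

Each child name is followed by a space in both the pattern and the input, so a name such as chapter can never accidentally match a longer name such as chapters.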
Elements without children
The syntax is <!ELEMENT name category>
The name is the element name used in start and end tags
The category may be EMPTY:
In the DTD: <!ELEMENT br EMPTY>
In the XML: <br></br> or just <br />
In the XML, an empty element may not have any content between the start tag and the end tag
An empty element may (and usually does) have attributes
Elements with unstructured children
The syntax is <!ELEMENT name category>
The category may be ANY
This indicates that any content (character data and any declared elements) may be used
Since the whole point of using a DTD is to define the structure of a document, ANY should be avoided wherever possible
The category may be (#PCDATA), indicating that only character data may be used
In the DTD: <!ELEMENT paragraph (#PCDATA)>
In the XML: <paragraph>A shot rang out!</paragraph>
The parentheses are required!
Note: in (#PCDATA), whitespace is kept exactly as entered
Elements may not be used within parsed character data
Entities are character data, and may be used
Elements with children
A category may describe one or more children: <!ELEMENT novel (foreword, chapter+)>
Parentheses are required, even if there is only one child
A space must precede the opening parenthesis
Commas (,) between elements mean that all children must appear, and must be in the order specified
“|” separators mean that any one child may be used
All child elements must themselves be declared
Children may have children
Parentheses can be used for grouping:
<!ELEMENT novel (foreword, (chapter+|section+))>
Elements with mixed content
#PCDATA describes elements with only character data
#PCDATA can be used in an “or” grouping: <!ELEMENT note (#PCDATA|message)*>
This is called mixed content
Certain (rather severe) restrictions apply:
#PCDATA must be first
The separators must be “|”
The group must be starred (meaning zero or more)
Names and namespaces
All names of elements, attributes, and entities, in both the DTD and the XML, are formed as follows:
The name must begin with a letter or underscore
The name may contain only letters, digits, dots, hyphens, underscores, and colons
The DTD doesn't know about namespaces; as far as it knows, a colon is just part of a name
The following are different (and both legal):
<!ELEMENT chapter (paragraph+)>
<!ELEMENT myBook:chapter (myBook:paragraph+)>
Avoid colons in names, except to indicate namespaces
An expanded DTD example
<!DOCTYPE novel [
  <!ELEMENT novel (foreword, chapter+, biography?, criticalEssay*)>
  <!ELEMENT foreword (paragraph+)>
  <!ELEMENT chapter (section+|paragraph+)>
  <!ELEMENT section (paragraph+)>
  <!ELEMENT biography (paragraph+)>
  <!ELEMENT criticalEssay (section+)>
  <!ELEMENT paragraph (#PCDATA)>
]>
Attributes and entities
In addition to elements, a DTD may declare attributes and entities
An attribute describes information that can be put within the start tag of an element
In XML: <car name="Toyota" model="2001"></car>
In the DTD:
<!ATTLIST car
  name CDATA #REQUIRED
  model CDATA #IMPLIED>
An entity describes text to be substituted
In XML: &copyright;
In the DTD: <!ENTITY copyright "Copyright KFUPM">
Attributes
The format of an attribute is:
<!ATTLIST element-name
  name type requirement
  name type requirement>
where the name-type-requirement may be repeated as many times as desired
Note that only spaces separate the parts, so careful counting is essential
The element-name tells which element may have these attributes
The name is the name of the attribute
Each attribute has a type, such as CDATA (character data)
Each attribute may be required, optional, or “fixed”
In the XML, attributes may occur in any order
Important attribute types
There are ten attribute types
These are the most important ones:
CDATA: the value is character data
(man|woman|child): the value is one from this list
ID: the value is a unique identifier; ID values must be legal XML names and must be unique within the document
NMTOKEN: the value is a name token, made only of name characters (letters, digits, dots, hyphens, underscores, colons); this is sometimes used to disallow whitespace in the value. Unlike an XML name, a name token may begin with a digit
The other six, less frequently used, are: IDREF, IDREFS, NMTOKENS, ENTITY, ENTITIES, and NOTATION
Requirements
Recall that an attribute has the form <!ATTLIST element-name name type requirement>
The requirement is one of:
A default value, enclosed in quotes
Example: <!ATTLIST degree CDATA "PhD">
#REQUIRED: the attribute must be present
#IMPLIED: the attribute is optional
#FIXED "value": the attribute always has the given value; if specified in the XML, the same value must be used
Entities
There are exactly five predefined entities: &lt;, &gt;, &amp;, &quot;, and &apos;
Additional entities can be defined in the DTD: <!ENTITY copyright "Copyright KFUPM">
Entities can be defined in another document: <!ENTITY copyright SYSTEM "MyURI">
Example of use in the XML: This document is &copyright; 2002.
Entities are a way to include fixed text (sometimes called “boilerplate”)
Entities should not be confused with character references, which are numerical values between &# and ; (decimal) or between &#x and ; (hexadecimal)
Example: &#233; or &#xE9; to indicate the character é
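A non-validating parser still expands the predefined entities, character references, and general entities declared in an internal DTD subset. A minimal sketch, assuming Python 3 and xml.etree.ElementTree (expat); it reuses the copyright entity from the slide, declared inline.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE note [
  <!ENTITY copyright "Copyright KFUPM">
]>
<note>&lt;tag&gt; &amp; &quot;quotes&quot; &apos;
This document is &copyright; 2002. Caf&#233; = Caf&#xE9;</note>"""

text = ET.fromstring(DOC).text
print(text)
# <tag> & "quotes" '
# This document is Copyright KFUPM 2002. Café = Café
```

The decimal (&#233;) and hexadecimal (&#xE9;) character references produce the same character.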
Another example: XML
<?xml version="1.0"?>
<!DOCTYPE weatherReport SYSTEM "http://www.mysite.com/mydoc.dtd">
<weatherReport>
  <date>05/29/2002</date>
  <location>
    <city>Philadelphia</city>
    <state>PA</state>
    <country>USA</country>
  </location>
  <temperature-range>
    <high scale="F">84</high>
    <low scale="F">51</low>
  </temperature-range>
</weatherReport>
The DTD for this example
<!ELEMENT weatherReport (date, location, temperature-range)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT location (city, state, country)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT country (#PCDATA)>
<!ELEMENT temperature-range ((low, high)|(high, low))>
<!ELEMENT low (#PCDATA)>
<!ELEMENT high (#PCDATA)>
<!ATTLIST low scale (C|F) #REQUIRED>
<!ATTLIST high scale (C|F) #REQUIRED>
Inline DTDs
If a DTD is used only by a single XML document, it can be put directly in that document:
<?xml version="1.0"?>
<!DOCTYPE myRootElement [
  <!-- DTD content goes here -->
]>
<myRootElement>
  <!-- XML content goes here -->
</myRootElement>
An inline DTD can be used only by the document in which it occurs
External DTDs
An external DTD (a DTD that is a separate document) is referenced with a SYSTEM or a PUBLIC identifier:
<!DOCTYPE myRootElement SYSTEM "http://www.mysite.com/mydoc.dtd">
The name that appears after DOCTYPE (in this example, myRootElement) must match the name of the XML document’s root element
Use SYSTEM for external DTDs that you define yourself, and use PUBLIC for official, published DTDs
The file extension for an external DTD is .dtd; it is referenced by a URI, which may be a full URL or a relative file name
External DTDs are almost always preferable to inline DTDs, since they can be used by more than one document
Limitations of DTDs
DTDs are a very weak specification language:
You can’t put any restrictions on the text content of elements (for example, that it must be a number)
It’s difficult to specify that all the children must occur but may be in any order, or that an element must occur a certain number of times
There are only ten data types for attribute values
But most of all: DTDs aren’t written in XML! If you want to do any validation, you need one parser for the XML and another for the DTD
This makes XML parsing harder than it needs to be
There is a newer and more powerful technology: XML Schemas. However, DTDs are still very much in use
Validators
Opera 5 and Internet Explorer 5 can validate your XML against an internal DTD
IE provides (slightly) better error messages
Opera apparently just ignores external DTDs; IE considers an external DTD to be an error
jEdit with the XML plugin will check for well-formedness and (if the DTD is inline) will validate your XML each time you do a Save: http://www.jedit.org/
Online validator (for inline DTDs): http://www.stg.brown.edu/service/xmlvalid/
3. XML Schema Definition (XSD)
What is XSD?
An XML Document with Its Schema
Referencing a Schema from an XML Document
Simple and Complex Elements
Predefined Types: numeric types, date and time types, string types
Defining Schema Components: simple elements, attributes, restrictions (facets), enumerations, complex elements
What is XML Schema?
The origin of schema:
XML Schema documents are used to define and validate the content and structure of XML data
XML Schema was originally proposed by Microsoft, but became an official W3C recommendation in May 2001: http://www.w3.org/XML/Schema
Why Schema?
Separating information from structure and format:
Traditional document: information, structure, and format are all clumped together
“Fashionable” document: the document is broken into discrete parts (information, structure, format), which can be treated separately
Why Schema?
[Figure: Schema workflow]
DTD vs. Schema
Character data: a DTD puts no constraints on character data; an XSD can constrain it, for example by requiring a string to have a fixed number of characters
Syntax: a DTD does not use XML syntax; an XSD uses XML syntax, which frees developers from learning another language and lets XML transformations be applied to schemas, too
Namespaces: DTDs have no support for namespaces; XSD supports namespaces
Reuse: DTDs are very limited in reusability and extensibility; an XSD can be reused in other schemas, can define its own derived data types, and a document can reference multiple schemas
Validators: DTD-based validators are easier to write, since they may only need to check for the existence of content such as PCDATA; schema-based validators are harder to write because they may have to validate content in detail
Readability: DTDs are easier to understand; XSD is more complex, since the notion of “type” adds an extra layer of confusing complexity
XML.org Registry
XML.coverpages.org is a comprehensive online reference collection supporting the XML family of markup language standards, XML vocabularies, and related structured information standards.
Example 1: An XML Document Instance
<?xml version="1.0" encoding="utf-8"?>
<book isbn="0836217462">
<title> … </title>
<author> … </author>
<qualification> … </qualification>
</book>
Schema for Example 1 (book.xsd)
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="author" type="xs:string"/>
        <xs:element name="qualification" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="isbn" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
Example 2: An XML Document and Its Schema
<letter>
  Dear Mr.<name>John Smith</name>.
  Your order <orderid>1032</orderid> will be shipped on <shipdate>2001-07-13</shipdate>.
</letter>
<xs:element name="letter">
  <xs:complexType mixed="true">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="orderid" type="xs:integer"/>
      <xs:element name="shipdate" type="xs:date"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
The XSD Document
Since the XSD is itself written in XML, it can be confusing to tell which one we are talking about
The file extension is .xsd
The root element is <schema>
The XSD starts like this: <?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<schema>
The <schema> element may have attributes:
xmlns:xs="http://www.w3.org/2001/XMLSchema" indicates that the elements used in the schema (schema, element, complexType, etc.) come from this namespace
elementFormDefault="qualified" means that elements declared locally in the schema must be namespace-qualified in instance documents (they belong to the schema's target namespace); it does not mean that they must carry the xs prefix
Referring to a Schema
To refer to a DTD in an XML document, the reference goes before the root element:
<?xml version="1.0"?>
<!DOCTYPE rootElement SYSTEM "url">
<rootElement> ... </rootElement>
To refer to an XML Schema in an XML document, the reference goes in the root element:
<?xml version="1.0"?>
<rootElement xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:noNamespaceSchemaLocation="url.xsd">
  ...
</rootElement>
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" declares the schema instance namespace
xsi:noNamespaceSchemaLocation gives the location of a schema whose elements are in no namespace
If the elements belong to a namespace, use xsi:schemaLocation instead; its value holds pairs of values: the first is the namespace to use, and the second is the location of the XML schema for that namespace
“Simple” and “Complex” Elements
A “simple” element is one that contains text and nothing else:
A simple element cannot have attributes
A simple element cannot contain other elements
A simple element can be empty only if its type allows an empty value (xs:string does; xs:integer does not)
However, the text can be of many different types, and may have various restrictions applied to it
If an element isn’t simple, it’s “complex”:
A complex element may have attributes
A complex element may be empty, or it may contain text, other elements, or both text and other elements
Predefined Numeric Types
Here are some of the predefined numeric types:
xs:decimal, xs:byte, xs:short, xs:int, xs:long
xs:positiveInteger, xs:negativeInteger, xs:nonPositiveInteger, xs:nonNegativeInteger
Allowable restrictions on numeric types: enumeration, minInclusive, minExclusive, maxInclusive, maxExclusive, fractionDigits, totalDigits, pattern, whiteSpace
Predefined Date and Time Types
xs:date - A date in the format CCYY-MM-DD, for example, 2003-11-05
xs:time - A time in the format hh:mm:ss (hours, minutes, seconds)
xs:dateTime - Format is CCYY-MM-DDThh:mm:ss
Allowable restrictions on dates and times: enumeration, minInclusive, minExclusive, maxInclusive, maxExclusive, pattern, whiteSpace
Predefined String Types
Recall that a simple element is defined as: <xs:element name="name" type="type" />
Here are a few of the possible string types:
xs:string - a string
xs:normalizedString - a string that doesn’t contain tabs, newlines, or carriage returns
xs:token - a normalized string that also has no leading or trailing spaces and no runs of two or more spaces
Allowable restrictions on strings: enumeration, length, maxLength, minLength, pattern, whiteSpace
Defining a Simple Element
A simple element is defined as <xs:element name="name" type="type" /> where:
name is the name of the element
the most common values for type are xs:boolean, xs:integer, xs:date, xs:string, xs:decimal, and xs:time
Other attributes that a simple element definition may have:
default="default value" gives the value to use if no other value is specified
fixed="value" means that no other value may be specified
Defining an Attribute
Attributes themselves are always declared as simple types
An attribute is defined as <xs:attribute name="name" type="type" /> where name and type are the same as for xs:element
Other attributes that an attribute definition may have:
default="default value" gives the value to use if no other value is specified
fixed="value" means that no other value may be specified
use="optional" means the attribute is not required (the default)
use="required" means the attribute must be present
Restrictions, or “Facets”
The general form for putting a restriction on a text value is:
<xs:element name="name">   (or xs:attribute)
  <xs:simpleType>
    <xs:restriction base="type">
      ... the restrictions ...
    </xs:restriction>
  </xs:simpleType>
</xs:element>
For example:
<xs:element name="age">
  <xs:simpleType>
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="20"/>
      <xs:maxInclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
Restrictions, or “Facets”
The “age” element is a simple type with a restriction. The acceptable values are the integers from 20 to 100, inclusive
The example above could also have been written with a named type:
<xs:element name="age" type="ageType"/>
<xs:simpleType name="ageType">
  <xs:restriction base="xs:integer">
    <xs:minInclusive value="20"/>
    <xs:maxInclusive value="100"/>
  </xs:restriction>
</xs:simpleType>
Restrictions on numbers
minInclusive number must be ≥ the given value
minExclusive number must be > the given value
maxInclusive number must be ≤ the given value
maxExclusive number must be < the given value
totalDigits number must have no more than value digits in total
fractionDigits number must have no more than value digits after the decimal point
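To make the numeric facets concrete, here is a small Python sketch that checks a lexical value against a dictionary of facets using the decimal module. The helpers digit_counts and satisfies and the price facets are hypothetical; digit counting follows the XSD 1.0 reading in which leading zeros and trailing fractional zeros do not count, and a real schema validator would perform these checks for you.

```python
from decimal import Decimal


def digit_counts(value):
    """Return (total digits, fraction digits) for a decimal lexical value,
    ignoring leading zeros and trailing zeros after the decimal point."""
    _, digits, exponent = Decimal(value).normalize().as_tuple()
    if exponent >= 0:                 # e.g. "100" normalizes to 1E+2
        return len(digits) + exponent, 0
    fraction = -exponent              # e.g. "0.05" -> digits (5,), exponent -2
    return max(len(digits), fraction), fraction


def satisfies(value, facets):
    """Check a lexical decimal value against a dict of XSD-style numeric facets."""
    number = Decimal(value)
    total, fraction = digit_counts(value)
    tests = {
        "minInclusive": lambda limit: number >= Decimal(limit),
        "minExclusive": lambda limit: number > Decimal(limit),
        "maxInclusive": lambda limit: number <= Decimal(limit),
        "maxExclusive": lambda limit: number < Decimal(limit),
        "totalDigits": lambda limit: total <= int(limit),
        "fractionDigits": lambda limit: fraction <= int(limit),
    }
    return all(tests[name](limit) for name, limit in facets.items())


age = {"minInclusive": "20", "maxInclusive": "100"}   # the ageType facets
price = {"totalDigits": "5", "fractionDigits": "2"}   # hypothetical price type

print(satisfies("20", age), satisfies("101", age))             # True False
print(satisfies("123.45", price), satisfies("1.234", price))   # True False
```

Note that totalDigits is an upper bound: "12.5" also satisfies the price facets, since it has 3 digits in total.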
Restrictions on strings
length the string must contain exactly value characters
minLength the string must contain at least value characters
maxLength the string must contain no more than value characters
pattern the value is a regular expression that the string must match
whiteSpace is not really a “restriction”; it tells what to do with whitespace:
value="preserve" Keep all whitespace
value="replace" Change each whitespace character (tab, newline, carriage return) to a space
value="collapse" Replace as above, then remove leading and trailing whitespace and replace all sequences of whitespace with a single space
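The three whiteSpace settings can be sketched in a few lines of Python; apply_whitespace is a hypothetical helper that mirrors the rules listed above, treating tab, newline, carriage return, and space as the whitespace characters, as XSD does.

```python
import re


def apply_whitespace(value, mode):
    """Normalize a string as the XSD whiteSpace facet describes."""
    if mode == "preserve":
        return value
    # replace: every tab, newline, and carriage return becomes a space
    replaced = re.sub(r"[\t\n\r]", " ", value)
    if mode == "replace":
        return replaced
    if mode == "collapse":
        # collapse: after replacing, squeeze runs of spaces and trim both ends
        return re.sub(r" {2,}", " ", replaced).strip(" ")
    raise ValueError(f"unknown whiteSpace value: {mode!r}")


raw = "  Java\tand\n\nXML  "
for mode in ("preserve", "replace", "collapse"):
    print(mode, repr(apply_whitespace(raw, mode)))
# preserve '  Java\tand\n\nXML  '
# replace '  Java and  XML  '
# collapse 'Java and XML'
```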
Restriction with Regular Expression Patterns
<xs:element name="letter">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="([a-z])*"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
<xs:element name="password">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="[a-zA-Z0-9]{8}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
Test these and find out whether the semantics of regular expressions are the same as in JavaScript
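One difference worth testing: XSD patterns are implicitly anchored, so the whole value must match, whereas JavaScript's RegExp test() succeeds if the pattern matches anywhere in the string unless you add ^ and $. The Python sketch below imitates the two behaviours with re.fullmatch and re.search; the helper names are hypothetical, and Python's regex dialect only coincides with XSD's for simple patterns like these two (XSD has extras such as \i and \c).

```python
import re

# The two patterns from the slide.
LETTER = r"([a-z])*"
PASSWORD = r"[a-zA-Z0-9]{8}"


def xsd_style_match(pattern, value):
    # XSD patterns are implicitly anchored: the WHOLE value must match.
    return re.fullmatch(pattern, value) is not None


def js_style_test(pattern, value):
    # Like JavaScript's /pattern/.test(value) without ^...$: any match will do.
    return re.search(pattern, value) is not None


for value in ["abc", "ABC", "secret12", "secret123!"]:
    print(value,
          "letter:", xsd_style_match(LETTER, value), js_style_test(LETTER, value),
          "password:", xsd_style_match(PASSWORD, value), js_style_test(PASSWORD, value))
# abc letter: True True password: False False
# ABC letter: False True password: False False
# secret12 letter: False True password: True True
# secret123! letter: False True password: False True
```

Because ([a-z])* can match the empty string, an unanchored search accepts every input, which is why the XSD anchoring matters for the letter element.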
Enumeration
An enumeration restricts the value to be one of a fixed set of values
Example:
<xs:element name="season">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:enumeration value="Spring"/>
      <xs:enumeration value="Summer"/>
      <xs:enumeration value="Autumn"/>
      <xs:enumeration value="Fall"/>
      <xs:enumeration value="Winter"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
Complex Elements
A complex element is defined as:
<xs:element name="name">
  <xs:complexType>
    ... information about the complex type...
  </xs:complexType>
</xs:element>
Example:
<xs:element name="person">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="firstName" type="xs:string" />
      <xs:element name="lastName" type="xs:string" />
    </xs:sequence>
  </xs:complexType>
</xs:element>
Complex Elements
Another example – using a type attribute
<xs:element name="employee" type="personinfo"/>
<xs:complexType name="personinfo">
  <xs:sequence>
    <xs:element name="firstname" type="xs:string"/>
    <xs:element name="lastname" type="xs:string"/>
  </xs:sequence>
</xs:complexType>
xs:sequence
We’ve already seen an example of a complex type whose elements must occur in a specific order:
<xs:element name="person">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="firstName" type="xs:string" />
      <xs:element name="lastName" type="xs:string" />
    </xs:sequence>
  </xs:complexType>
</xs:element>
xs:all
xs:all allows elements to appear in any order
<xs:element name="person">
  <xs:complexType>
    <xs:all>
      <xs:element name="firstName" type="xs:string" />
      <xs:element name="lastName" type="xs:string" />
    </xs:all>
  </xs:complexType>
</xs:element>
Despite the name, the members of an xs:all group can occur once or not at all
You can use minOccurs="n" and maxOccurs="n" to specify how many times an element may occur (the default value is 1); in this context, n may only be 0 or 1
Extensions
You can base a complex type on another complex type
<xs:complexType name="newType">
  <xs:complexContent>
    <xs:extension base="otherType">
      ...new stuff...
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
Text Element with Attributes
If a text element has attributes, it is no longer a simple type:
<xs:element name="population">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="xs:integer">
        <xs:attribute name="year" type="xs:integer"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
Empty Elements
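Empty elements are (ridiculously) complex: an empty element that has only an attribute is declared as a complex type with no content
<xs:complexType name="counter">
  <xs:complexContent>
    <xs:restriction base="xs:anyType">
      <xs:attribute name="count" type="xs:integer"/>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>
The same type can be written in a shorter form: <xs:complexType name="counter"> <xs:attribute name="count" type="xs:integer"/> </xs:complexType>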
Mixed Elements
Mixed elements may contain both text and elements
We add mixed="true" to the xs:complexType element
The text itself is not declared in the schema; it may appear anywhere around the child elements, and its content is not checked
<xs:complexType name="paragraph" mixed="true">
  <xs:sequence>
    <xs:element name="someName" type="xs:anyType"/>
  </xs:sequence>
</xs:complexType>
See Example 2 at the start of this section
4. XPath
What is XPath?
Sample Syntactic Elements: paths, slashes, brackets, stars
Arithmetic Expressions
Some XPath Functions
What is XPath?
XPath is a syntax used for selecting parts of an XML document
The way XPath describes paths to elements is similar to the way an operating system describes paths to files
XPath is almost a small programming language; it has functions, tests, and expressions
XPath is a W3C standard http://www.w3.org/TR/xpath
XPath is not itself written as XML, but is used heavily in XSLT
Terminology
<library> <book>
<chapter> </chapter>
<chapter> <section> <paragraph/> <paragraph/> </section> </chapter>
</book></library>
library is the parent of book; book is the parent of the two chapters
The two chapters are the children of book, and the section is the child of the second chapter
The two chapters of the book are siblings (they have the same parent)
library, book, and the second chapter are the ancestors of the section
The two chapters, the section, and the two paragraphs are the descendants of the book
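The same family relationships can be explored with Python's xml.etree.ElementTree on the library document above. Elements there do not keep a reference to their parent, so this sketch builds a child-to-parent dictionary first; the helper ancestors is hypothetical.

```python
import xml.etree.ElementTree as ET

library = ET.fromstring(
    "<library><book><chapter></chapter>"
    "<chapter><section><paragraph/><paragraph/></section></chapter>"
    "</book></library>"
)

# ElementTree elements do not know their parent, so build a child -> parent map.
parent = {child: elem for elem in library.iter() for child in elem}

book = library.find("book")
first_chapter, second_chapter = book.findall("chapter")
section = second_chapter.find("section")


def ancestors(elem):
    """Tags of elem's ancestors, nearest first (hypothetical helper)."""
    result = []
    while elem in parent:
        elem = parent[elem]
        result.append(elem.tag)
    return result


print(parent[book].tag)                                  # library
print([child.tag for child in book])                     # ['chapter', 'chapter']
print(parent[first_chapter] is parent[second_chapter])   # True: siblings
print(ancestors(section))                                # ['chapter', 'book', 'library']
# iter() yields the element itself first, so skip it to get only descendants
print([d.tag for d in book.iter()][1:])
# ['chapter', 'chapter', 'section', 'paragraph', 'paragraph']
```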
Paths
Operating system | XPath
/ = the root directory | /library = the root element (if named library)
/users/dave/foo = the file named foo in dave in users | /library/book/chapter/section = every section element in a chapter in every book in the library
. = the current directory | . = the current element
.. = the parent directory | .. = the parent of the current element
/users/dave/* = all the files in /users/dave | /library/book/chapter/* = all the elements in /library/book/chapter
foo = the file named foo in the current directory | section = every section element that is a child of the current element
Slashes
A path that begins with a / represents an absolute path, starting from the top of the document
Example: /email/message/header/from
Note that even an absolute path can select more than one element
A slash by itself means “the whole document”
A path that does not begin with a / represents a path starting from the current element
Example: header/from
A path that begins with // can start from anywhere in the document
Example: //header/from selects every from element that is a child of a header element
This can be expensive, since it involves searching the entire document
Brackets and last()
A number in brackets selects a particular matching child
Example: /library/book[1] selects the first book of the library
Example: //chapter/section[2] selects the second section of every chapter in the XML document
Example: //book/chapter[1]/section[2] selects the second section of the first chapter of every book
Only matching elements are counted; for example, if a chapter has both sections and exercises, the exercises are ignored when counting sections
The function last() in brackets selects the last matching child
Example: /library/book/chapter[last()]
You can even do simple arithmetic
Example: /library/book/chapter[last()-1]
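Python's xml.etree.ElementTree implements a limited XPath subset that includes [n], [last()], and [last()-n], which is enough to try the bracket examples above. Paths are evaluated relative to an element, so /library/book[1] becomes book[1] from the library element and // becomes .//. The sample library, its title attributes, and the titles helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A made-up library; titles are attributes so results are easy to read.
# Chapter A1 also contains an <exercise/> that is not counted as a section.
library = ET.fromstring("""<library>
  <book title="A">
    <chapter title="A1"><section title="A1.1"/><exercise/><section title="A1.2"/></chapter>
    <chapter title="A2"><section title="A2.1"/></chapter>
    <chapter title="A3"><section title="A3.1"/><section title="A3.2"/></chapter>
  </book>
  <book title="B">
    <chapter title="B1"><section title="B1.1"/><section title="B1.2"/></chapter>
  </book>
</library>""")


def titles(path):
    """Evaluate an ElementTree path from <library>; return the title attributes."""
    return [elem.get("title") for elem in library.findall(path)]


print(titles("book[1]"))                        # /library/book[1] -> ['A']
print(titles(".//chapter/section[2]"))          # -> ['A1.2', 'A3.2', 'B1.2']
print(titles(".//book/chapter[1]/section[2]"))  # -> ['A1.2', 'B1.2']
print(titles("book/chapter[last()]"))           # -> ['A3', 'B1']
print(titles("book/chapter[last()-1]"))         # -> ['A2'] (book B has one chapter)
```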
Stars
A star, or asterisk, is a “wild card”: it means “all the elements at this level”
Example: /library/book/chapter/* selects every child of every chapter of every book in the library
Example: //book/* selects every child of every book (chapters, tableOfContents, index, etc.)
Example: /*/*/*/paragraph selects every paragraph that has exactly three ancestors
Example: //* selects every element in the entire document
Attributes I
You can select attributes by themselves, or elements that have certain attributes
Remember: an attribute consists of a name-value pair; for example, in <chapter num="5">, the attribute is named num
To choose the attribute itself, prefix the name with @
Example: @num selects the attribute named num of the current element
Example: //@* will choose every attribute, everywhere in the document
To choose elements that have a given attribute, put the attribute name in square brackets
Example: //chapter[@num] will select every chapter element (anywhere in the document) that has an attribute named num
Attributes II
//chapter[@num] selects every chapter element with an attribute num
//chapter[not(@num)] selects every chapter element that does not have a num attribute
//chapter[@*] selects every chapter element that has any attribute
//chapter[not(@*)] selects every chapter element with no attributes
Values of attributes
//chapter[@num='3'] selects every chapter element with an attribute num with value 3
The normalize-space() function strips leading and trailing whitespace from a value (and collapses internal runs of whitespace to a single space) before comparison
Example: //chapter[normalize-space(@num)="3"]
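ElementTree also supports [@num] and [@num='3'], but not not(), [@*], or normalize-space(), so the sketch below falls back to Python list comprehensions for those. The sample chapters are invented; note the chapter whose num is " 3 ", which only the normalize-space version matches.

```python
import xml.etree.ElementTree as ET

# Invented chapters; the second one has extra spaces around its num value.
book = ET.fromstring(
    '<book><chapter num="1"/><chapter num=" 3 "/><chapter/>'
    '<chapter num="3" lang="en"/><chapter lang="fr"/></book>'
)
chapters = book.findall(".//chapter")

with_num = book.findall(".//chapter[@num]")        # //chapter[@num]
num_is_3 = book.findall(".//chapter[@num='3']")    # //chapter[@num='3']

# ElementTree has no not(), [@*] or normalize-space(), so filter in Python:
without_num = [c for c in chapters if "num" not in c.attrib]  # //chapter[not(@num)]
any_attr = [c for c in chapters if c.attrib]                    # //chapter[@*]
no_attr = [c for c in chapters if not c.attrib]                 # //chapter[not(@*)]
# //chapter[normalize-space(@num)="3"]: trim and collapse whitespace, then compare
normalized_3 = [c for c in with_num if " ".join(c.get("num").split()) == "3"]

for name, found in [("@num", with_num), ("@num='3'", num_is_3),
                    ("not(@num)", without_num), ("@*", any_attr),
                    ("not(@*)", no_attr), ("normalize-space", normalized_3)]:
    print(name, len(found))
```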
Arithmetic Expressions
+ add
- subtract
* multiply
div (not /) divide
mod modulo (remainder)
Equality Tests
= “equals” (Notice it’s not ==)
!= “not equals”
But it’s not that simple!
value = node-set will be true if the node-set contains any node with a value that matches value
value != node-set will be true if the node-set contains any node with a value that does not match value
Hence, value = node-set and value != node-set may both be true at the same time!
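The node-set rule above can be modeled in a few lines of Python. This sketch assumes the node-set has already been reduced to the string values of its nodes (as XPath 1.0 does when comparing a node-set with a string); xpath_equals and xpath_not_equals are hypothetical names.

```python
def xpath_equals(value, node_values):
    """value = node-set: true if AT LEAST ONE node's value equals value."""
    return any(node == value for node in node_values)


def xpath_not_equals(value, node_values):
    """value != node-set: true if AT LEAST ONE node's value differs from value."""
    return any(node != value for node in node_values)


authors = ["Gregory Brill", "Brett Scott"]   # string values of //author

print(xpath_equals("Brett Scott", authors))       # True
print(xpath_not_equals("Brett Scott", authors))   # True as well!
# "No author is Brett Scott" is not(... = ...), which is not the same as !=
print(not xpath_equals("Brett Scott", authors))   # False
# With an empty node-set, both comparisons are false
print(xpath_equals("Brett Scott", []), xpath_not_equals("Brett Scott", []))  # False False
```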
Other Boolean Operators
and (infix operator)
or (infix operator)
Example: count = 0 or count = 1
not() (function)
The following are used for numerical comparisons only (inside an XML attribute value, < must be written as &lt;):
< “less than”
<= “less than or equal to”
> “greater than”
>= “greater than or equal to”
Some XPath Functions
XPath contains a number of functions on node sets, numbers, and strings; here are a few of them:
count(elem) counts the number of selected elements
Example: //chapter[count(section)=1] selects chapters with exactly one section child
name() returns the name of the element
Example: //*[name()='section'] is the same as //section
starts-with(arg1, arg2) tests if arg1 starts with arg2
Example: //*[starts-with(name(), 'sec')]
contains(arg1, arg2) tests if arg1 contains arg2
Example: //*[contains(name(), 'ect')]
Examples: http://www.zvon.org/xxl/XPathTutorial/General/examples.html
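For comparison, here is a Python sketch of the four function examples applied to a small invented document with ElementTree, whose own path syntax does not support these functions. It assumes no namespaces, since ElementTree stores a namespaced tag as {uri}name rather than the prefix:name that name() returns.

```python
import xml.etree.ElementTree as ET

# Invented document; the <selection> element shows where starts-with and
# contains give different answers.
book = ET.fromstring(
    "<book><chapter><section/></chapter>"
    "<chapter><section/><section/></chapter>"
    "<chapter><selection/></chapter></book>"
)
every_element = list(book.iter())  # like //* (the root is included)

# //chapter[count(section)=1]
one_section = [c for c in book.iter("chapter") if len(c.findall("section")) == 1]
# //*[name()='section'] is the same as //section
by_name = [e for e in every_element if e.tag == "section"]
# //*[starts-with(name(), 'sec')]
starts_sec = [e.tag for e in every_element if e.tag.startswith("sec")]
# //*[contains(name(), 'ect')]
contains_ect = [e.tag for e in every_element if "ect" in e.tag]

print(len(one_section))                       # 1
print(by_name == book.findall(".//section"))  # True
print(starts_sec)    # ['section', 'section', 'section']
print(contains_ect)  # ['section', 'section', 'section', 'selection']
```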
5. XSL / XSLT
What is XSL?
Some XSLT Constructs: xsl:value-of, xsl:for-each, xsl:if, xsl:choose, xsl:sort, xsl:text, xsl:attribute
Templates
XSL on the Client
XSL on the Server
What is XSL?
XSL stands for eXtensible Stylesheet Language, a standard recommended by the W3C: http://www.w3.org/TR/xsl/
CSS was designed for styling HTML pages, and can be used to style XML pages
XSL was designed specifically to style XML pages, and is much more sophisticated than CSS
XSL consists of three languages:
XSLT (XSL Transformations) is a language used to transform XML documents into other kinds of documents (most commonly HTML, so they can be displayed)
XPath is a language to select parts of an XML document to transform with XSLT
XSL-FO (XSL Formatting Objects) is a replacement for CSS. The future of XSL-FO as a standard is uncertain, because much of its functionality overlaps with that provided by cascading style sheets (CSS) and the HTML tag set
How does it work?
The XML source document is parsed into an XML source tree
You use XPath to define templates that match parts of the source tree
You use XSLT to transform the matched part and put the transformed information into the result tree
The result tree is output as a result document
Parts of the source document that are not matched by any template are handled by XSLT's built-in template rules, which by default output their text content (without the tags)
Simple XPath
Here’s a simple XML document:
<?xml version="1.0"?>
<library>
  <book>
    <title>XML</title>
    <author>Gregory Brill</author>
  </book>
  <book>
    <title>Java and XML</title>
    <author>Brett Scott</author>
  </book>
</library>
XPath expressions look a lot like paths in a computer file system
/ means the document itself (but no specific elements)
/library selects the root element
/library/book selects every book element
//author selects every author element, wherever it occurs
Simple XSLT
<xsl:for-each select="//book"> loops through every book element, everywhere in the document
<xsl:value-of select="title"/> chooses the content of the title element at the current location
<xsl:for-each select="//book"> <xsl:value-of select="title"/> </xsl:for-each>
chooses the content of the title element for each book in the XML document
Using XSL to Create HTML
Our goal is to turn this:
<?xml version="1.0"?>
<library>
  <book>
    <title>XML</title>
    <author>Gregory Brill</author>
  </book>
  <book>
    <title>Java and XML</title>
    <author>Brett Scott</author>
  </book>
</library>
Into HTML that displays something like this:
Book Titles:
• XML
• Java and XML
Book Authors:
• Gregory Brill
• Brett Scott
Note that we’ve grouped titles and authors separately
What we need to do
We need to save our XML into a file (let’s call it books.xml)
We need to create a file (say, books.xsl) that describes how to select elements from books.xml and embed them into an HTML page; we do this by intermixing the HTML and the XSL in the books.xsl file
We need to add a line to our books.xml file to tell it to refer to books.xsl for formatting information
books.xml, revised
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="books.xsl"?>
<library>
  <book>
    <title>XML</title>
    <author>Gregory Brill</author>
  </book>
  <book>
    <title>Java and XML</title>
    <author>Brett Scott</author>
  </book>
</library>
The <?xml-stylesheet?> line tells the browser where to find the XSL file
Desired HTML
<html>
  <head>
    <title>Book Titles and Authors</title>
  </head>
  <body>
    <h2>Book titles:</h2>
    <ul>
      <li>XML</li>
      <li>Java and XML</li>
    </ul>
    <h2>Book authors:</h2>
    <ul>
      <li>Gregory Brill</li>
      <li>Brett Scott</li>
    </ul>
  </body>
</html>
The contents of the <li> elements are data extracted from the XML document; everything else is our HTML template
We don’t necessarily know how much data we will have
XSL Outline
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html> ... </html>
</xsl:template>
</xsl:stylesheet>
Selecting Titles and Authors
<h2>Book titles:</h2>
<ul>
  <xsl:for-each select="//book">
    <li> <xsl:value-of select="title"/> </li>
  </xsl:for-each>
</ul>
<h2>Book authors:</h2>
...same thing, replacing title with author
Notice that XSL can rearrange the data; the HTML result can present information in a different order than the XML
Notice the xsl:for-each loop
All of books.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="books.xsl"?>
<library>
  <book>
    <title>XML</title>
    <author>Gregory Brill</author>
  </book>
  <book>
    <title>Java and XML</title>
    <author>Brett Scott</author>
  </book>
</library>
Note: if you do View Source, this is what you will see, not the resultant HTML
All of books.xsl
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
  <head>
    <title>Book Titles and Authors</title>
  </head>
  <body>
    <h2>Book titles:</h2>
    <ul>
      <xsl:for-each select="//book">
        <li> <xsl:value-of select="title"/> </li>
      </xsl:for-each>
    </ul>
    <h2>Book authors:</h2>
    <ul>
      <xsl:for-each select="//book">
        <li> <xsl:value-of select="author"/> </li>
      </xsl:for-each>
    </ul>
  </body>
</html>
</xsl:template>
</xsl:stylesheet>
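The Python standard library has no XSLT processor, so the sketch below is only an analogue of what books.xsl computes, not an XSLT implementation: each xsl:for-each becomes a loop over the book elements and each xsl:value-of becomes findtext. The function name books_to_html is hypothetical.

```python
import xml.etree.ElementTree as ET

BOOKS_XML = """<library>
  <book><title>XML</title><author>Gregory Brill</author></book>
  <book><title>Java and XML</title><author>Brett Scott</author></book>
</library>"""


def books_to_html(xml_text):
    library = ET.fromstring(xml_text)
    html = ET.Element("html")
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = "Book Titles and Authors"
    body = ET.SubElement(html, "body")
    for heading, field in (("Book titles:", "title"), ("Book authors:", "author")):
        ET.SubElement(body, "h2").text = heading
        ul = ET.SubElement(body, "ul")
        # like <xsl:for-each select="//book"> <li><xsl:value-of select="..."/></li>
        for book in library.iter("book"):
            ET.SubElement(ul, "li").text = book.findtext(field)
    return ET.tostring(html, encoding="unicode")


print(books_to_html(BOOKS_XML))
```

The output is the desired HTML shown earlier, without the indentation.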
How to use it
In a modern browser, such as Netscape 6, Internet Explorer 6, or Mozilla 1.0, you can just open the XML file
Older browsers will ignore the XSL and just show you the XML contents as continuous text
You can use a program such as Xalan, MSXML, or Saxon to create the HTML as a file
This can be done on the server side, so that all the client-side browser sees is plain HTML
The server can create the HTML dynamically from the information currently in XML
[Figure: The result (in IE)]
XSLT
XSLT stands for eXtensible Stylesheet Language Transformations
XSLT is used to transform XML documents into other kinds of documents, usually, but not necessarily, XHTML
XSLT uses two input files:
The XML document containing the actual data
The XSL document containing both the “framework” in which to insert the data, and XSLT commands to do so
[Figure: Understanding the XSLT Process]
[Figure: The XSLT Processor]
The .xsl file
An XSLT document has the .xsl extension
The XSLT document begins with:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
Contains one or more templates, such as: <xsl:template match="/"> ... </xsl:template>
And ends with: </xsl:stylesheet>
The template <xsl:template match="/"> matches the root of the document, that is, the entire file; you can think of this as selecting the root node of the XML tree
Where XSLT can be used
A server can use XSLT to change XML files into HTML files before sending them to the client
A modern browser can use XSLT to change XML into HTML on the client side; this is what we will mostly be doing here
Most users seldom update their browsers; if you want “everyone” to see your pages, do any XSL processing on the server side
Otherwise, think about what best fits your situation
xsl:value-of
<xsl:value-of select="XPath expression"/> selects the contents of an element and adds it to the output stream (if the expression selects several elements, XSLT 1.0 outputs only the first one)
The select attribute is required
Notice that xsl:value-of is not a container tag (it has no content), hence it needs to end with a slash
xsl:for-each
xsl:for-each is a kind of loop statement
The syntax is:
<xsl:for-each select="XPath expression">
  Text to insert and rules to apply
</xsl:for-each>
Example: to select every book (//book) and make an unordered list (<ul>) of their titles (title), use:
<ul>
  <xsl:for-each select="//book">
    <li> <xsl:value-of select="title"/> </li>
  </xsl:for-each>
</ul>
Filtering Output
You can filter (restrict) output by adding a criterion to the select attribute’s value:
<ul>
  <xsl:for-each select="//book">
    <li> <xsl:value-of select="title[../author='Brett Scott']"/> </li>
  </xsl:for-each>
</ul>
This will select book titles by Brett Scott
Filter Details
Here is the filter we just used: <xsl:value-of select="title[../author='Brett Scott']"/>
author is a sibling of title, so from title we have to go up to its parent, book, then back down to author
This filter requires a quote within a quote, so we need both single quotes and double quotes
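Legal filter operators are: = != < > (plus <= and >=). Because the filter sits inside an attribute value, < must be written as &lt;, and > is usually written as &gt; as well. Numbers can be written without quotes; with = and !=, a quoted number is compared as text rather than as a number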
But it doesn’t work right!
Here’s what we did: <xsl:for-each select="//book"> <li> <xsl:value-of select="title[../author='Brett Scott']"/> </li> </xsl:for-each>
This will output <li> and </li> for every book, so we will get empty bullets for authors other than Brett Scott
There is no obvious way to solve this with just xsl:value-of
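The empty-bullet problem is easy to reproduce outside XSLT. In this hypothetical Python analog (standard ElementTree, invented sample data), the filter decides only what goes inside each <li>. The loop still writes one <li> per book, just as the filtered xsl:value-of does.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical sample data (the second book is invented for illustration).
SAMPLE = """<library>
  <book><title>XML</title><author>Gregory Brill</author></book>
  <book><title>XSLT Basics</title><author>Brett Scott</author></book>
</library>"""


def naive_filtered_list(root, author):
    items = []
    for book in root.iter("book"):  # <xsl:for-each select="//book">
        # title[../author='...']: the title is selected only when the book's
        # author matches. (findtext looks at the first author only; an XPath
        # comparison succeeds if ANY author child matches.)
        if book.findtext("author") == author:
            title = book.findtext("title", "")
        else:
            title = ""
        # The <li> wrapper is written for every book, matching or not.
        items.append("<li>" + escape(title) + "</li>")
    return "<ul>" + "".join(items) + "</ul>"


print(naive_filtered_list(ET.fromstring(SAMPLE), "Brett Scott"))
# <ul><li></li><li>XSLT Basics</li></ul>
```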
xsl:if
xsl:if allows us to include content if a given condition (in the test attribute) is true
Example: <xsl:for-each select="//book"> <xsl:if test="author='Brett Scott'"> <li> <xsl:value-of select="title"/> </li> </xsl:if> </xsl:for-each>
This does work correctly!
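Here is the same Python analog, restructured the way xsl:if restructures the stylesheet. The test now guards the whole <li>, so non-matching books produce nothing. As before, this is a sketch with hypothetical names and data, not XSLT itself.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical sample data (the second book is invented for illustration).
SAMPLE = """<library>
  <book><title>XML</title><author>Gregory Brill</author></book>
  <book><title>XSLT Basics</title><author>Brett Scott</author></book>
</library>"""


def filtered_list(root, author):
    items = []
    for book in root.iter("book"):             # <xsl:for-each select="//book">
        if book.findtext("author") == author:  # <xsl:if test="author='...'">
            # The <li> is written only inside the if, so no empty bullets
            items.append("<li>" + escape(book.findtext("title", "")) + "</li>")
    return "<ul>" + "".join(items) + "</ul>"


print(filtered_list(ET.fromstring(SAMPLE), "Brett Scott"))
# <ul><li>XSLT Basics</li></ul>
```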
xsl:choose
The xsl:choose ... xsl:when ... xsl:otherwise construct is XSLT’s equivalent of Java’s switch ... case ... default statement. Because each xsl:when has its own test, it behaves more like an if ... else if ... else chain
The syntax is: <xsl:choose> <xsl:when test="some condition"> ... some code ... </xsl:when> <xsl:otherwise> ... some code ... </xsl:otherwise> </xsl:choose>
xsl:choose is often used within an xsl:for-each loop
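A Python if/elif/else chain is a close analog of xsl:choose, since each xsl:when carries its own test and only the first true branch runs. The sketch below uses standard ElementTree and formats each book differently depending on its author. The render_book function, the formatting choices and the third book are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical sample data (the last two books are invented for illustration).
SAMPLE = """<library>
  <book><title>XML</title><author>Gregory Brill</author></book>
  <book><title>XSLT Basics</title><author>Brett Scott</author></book>
  <book><title>Web Services</title><author>Ann Lee</author></book>
</library>"""


def render_book(book):
    title = escape(book.findtext("title", ""))
    author = book.findtext("author")
    # <xsl:choose>: tests are tried in order; only the first true one runs
    if author == "Brett Scott":        # <xsl:when test="author='Brett Scott'">
        return "<li><b>" + title + "</b></li>"
    elif author == "Gregory Brill":    # a second <xsl:when>
        return "<li><i>" + title + "</i></li>"
    else:                              # <xsl:otherwise>
        return "<li>" + title + "</li>"


root = ET.fromstring(SAMPLE)
# The surrounding loop plays the role of <xsl:for-each select="//book">
html = "<ul>" + "".join(render_book(b) for b in root.iter("book")) + "</ul>"
print(html)
```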
xsl:sort
You can place an xsl:sort inside an xsl:for-each
The select attribute of xsl:sort tells what field to sort on. By default the sort is ascending and compares the values as text
Example: <ul> <xsl:for-each select="//book"> <xsl:sort select="author"/> <li> <xsl:value-of select="title"/> by <xsl:value-of select="author"/> </li> </xsl:for-each> </ul>
This example creates a list of titles and authors, sorted by author
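The sorted list can be sketched in Python with sorted() and a key function, which stand in for xsl:sort select="author". This is an analog under stated assumptions: the sample data and function name are hypothetical, and it does not describe the behavior of any particular XSLT processor.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical sample data (the second book is invented for illustration).
SAMPLE = """<library>
  <book><title>XML</title><author>Gregory Brill</author></book>
  <book><title>XSLT Basics</title><author>Brett Scott</author></book>
</library>"""


def sorted_by_author(root):
    # <xsl:sort select="author"/>: ascending by the text of each book's author.
    # Python's sorted() is stable, so books with equal authors keep document
    # order, which XSLT 1.0 also requires. Caveat: Python compares code
    # points, while an XSLT processor may use a language-aware collation.
    books = sorted(root.iter("book"), key=lambda b: b.findtext("author", ""))
    items = []
    for b in books:
        # <li><xsl:value-of select="title"/> by <xsl:value-of select="author"/></li>
        items.append("<li>" + escape(b.findtext("title", "")) + " by "
                     + escape(b.findtext("author", "")) + "</li>")
    return "<ul>" + "".join(items) + "</ul>"


print(sorted_by_author(ET.fromstring(SAMPLE)))
# <ul><li>XSLT Basics by Brett Scott</li><li>XML by Gregory Brill</li></ul>
```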
xsl:text
Used inside templates to indicate that its contents should be output as text
Its contents are pure text, not elements, and white space is not collapsed
<xsl:text>...</xsl:text> helps deal with two common problems: XSL isn’t very careful with whitespace in the document
This doesn’t matter much for HTML, which collapses all whitespace anyway. <xsl:text> gives you much better control over whitespace; it acts like the <pre> element in HTML
Since XML defines only five entities, you cannot readily put other entities (such as &nbsp;) in your XSL
These are &amp; (&), &lt; (<), &gt; (>), &quot; ("), &apos; ('). Others can be inserted using their decimal or hexadecimal number forms (for example &#160; or &#xA0; for a non-breaking space). You may use the following secret formula for entities:
<xsl:text disable-output-escaping="yes">&amp;nbsp;</xsl:text>
• A “yes” value means special characters like “<” should be output as is. “no” indicates that “<” should be output as “&lt;”. Default is “no”
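You can check these entity rules with Python's standard XML parser, which, like the XML parser inside an XSLT processor, knows only the five predefined entities. The sketch shows three things: &nbsp; is rejected, a numeric reference works, and the two disable-output-escaping settings produce different output. Here the escape call stands in for an XSLT serializer; that substitution is an assumption of this illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# 1. &nbsp; is not one of XML's five predefined entities, so the parser
#    rejects it (an XSLT processor reading the stylesheet would too).
try:
    ET.fromstring("<t>&nbsp;</t>")
    nbsp_parsed = True
except ET.ParseError:
    nbsp_parsed = False

# 2. A numeric character reference is fine: &#160; is the no-break space.
numeric = ET.fromstring("<t>&#160;</t>").text

# 3. Written as &amp;nbsp;, the text node's value is the six characters "&nbsp;".
value = ET.fromstring("<t>&amp;nbsp;</t>").text

# disable-output-escaping="no" (default): the & is escaped again on output,
# so a browser would display the literal characters &nbsp;
escaped_output = escape(value)
# disable-output-escaping="yes": the characters are written as-is,
# so an HTML browser sees a real &nbsp; entity
raw_output = value

print(nbsp_parsed, repr(numeric), escaped_output, raw_output)
```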
Creating Tags from XML Data
Suppose the XML contains <name>Dr. Scott's Home Page</name><url>http://www.kfupm.edu/~scott</url>
And you want to turn this into <a href="http://www.kfupm.edu/~scott">Dr. Scott's Home Page</a>
We need additional tools to do this. It doesn’t even help if the XML directly contains <a href="http://www.kfupm.edu/~scott">Dr. Scott's Home Page</a>: with xsl:value-of we still can’t move it to the output, because xsl:value-of outputs only the text, not the tags
The same problem occurs with images in the XML
A reason for the above is that in XML an attribute value may not contain the < character, so you cannot write a tag such as <xsl:value-of select="url"/> inside href="..."
Creating Tags - solution 1
Suppose the XML contains <name>Dr. Scott's Home Page</name> <url>http://www.kfupm.edu/~scott</url>
<xsl:attribute name="..."> adds the named attribute to the enclosing tag
The value of the attribute is the content of this tag
Example: <a> <xsl:attribute name="href"> <xsl:value-of select="url"/> </xsl:attribute> <xsl:value-of select="name"/> </a>
Result: <a href="http://www.kfupm.edu/~scott">Dr. Scott's Home Page</a>
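What xsl:attribute does can be sketched with Python's ElementTree. The sketch builds the <a> element, sets href from the url data and the text from name, and lets the serializer write the quotes and escapes. The <link> wrapper element and the make_anchor name are hypothetical; this is an analog, not XSLT.

```python
import xml.etree.ElementTree as ET

# The slide's data, wrapped in a hypothetical <link> element so it parses
# as a single document.
SOURCE = ("<link><name>Dr. Scott's Home Page</name>"
          "<url>http://www.kfupm.edu/~scott</url></link>")


def make_anchor(ctx):
    a = ET.Element("a")  # the literal result element <a> ... </a>
    # <xsl:attribute name="href"><xsl:value-of select="url"/></xsl:attribute>
    a.set("href", ctx.findtext("url", ""))
    # <xsl:value-of select="name"/> becomes the element's text content
    a.text = ctx.findtext("name", "")
    # The serializer writes quotes and escapes, so no "<" ever has to
    # appear inside an attribute value in the source.
    return ET.tostring(a, encoding="unicode")


print(make_anchor(ET.fromstring(SOURCE)))
# <a href="http://www.kfupm.edu/~scott">Dr. Scott's Home Page</a>
```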
Creating Tags - solution 2
Suppose the XML contains <name>Dr. Scott's Home Page</name> <url>http://www.kfupm.edu/~scott</url>
An attribute value template (AVT) consists of braces { } containing an XPath expression inside the attribute value
The expression inside the braces is evaluated and replaced by its value
Example: <a href="{url}"> <xsl:value-of select="name"/> </a>
Result: <a href="http://www.kfupm.edu/~scott">Dr. Scott's Home Page</a>
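An attribute value template is essentially string interpolation evaluated against the current node. The hypothetical expand_avt helper below sketches the idea in Python with ElementTree. Real AVTs evaluate full XPath and treat doubled braces ({{ and }}) as literal braces; this sketch does neither.

```python
import re
import xml.etree.ElementTree as ET

# The slide's data, wrapped in a hypothetical <link> element.
SOURCE = ("<link><name>Dr. Scott's Home Page</name>"
          "<url>http://www.kfupm.edu/~scott</url></link>")


def expand_avt(template, ctx):
    """Hypothetical helper: replace each {path} in an attribute value with
    the text of the first element matching path under ctx. Only simple
    ElementTree paths are supported, not full XPath."""
    return re.sub(r"\{([^{}]+)\}",
                  lambda m: ctx.findtext(m.group(1), ""),
                  template)


ctx = ET.fromstring(SOURCE)
a = ET.Element("a", href=expand_avt("{url}", ctx))  # <a href="{url}">
a.text = ctx.findtext("name", "")                   # <xsl:value-of select="name"/>
print(ET.tostring(a, encoding="unicode"))
# <a href="http://www.kfupm.edu/~scott">Dr. Scott's Home Page</a>
```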
Modularization
Modularization, breaking up a complex program into simpler parts, is an important programming tool. In programming languages, modularization is often done with functions or methods. In XSL we can do something similar with xsl:apply-templates
For example, suppose we have a DTD for book with parts titlePage, tableOfContents, chapter, and index. We can create separate templates for each of these parts
Template rules are used to control what output is created from what input
…Modularization
A template rule is represented by an <xsl:template> element
The <xsl:template> element has: a match attribute that contains an XPath pattern identifying the input it matches, and a template that is instantiated and output when the pattern is matched
Template skeleton: <xsl:template match="person"> A Person </xsl:template>
The above says that every time a <person> element is seen, the stylesheet processor should emit the text “A Person”
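The idea of a template rule, a pattern plus the output produced when it matches, can be sketched in Python as a lookup table from element names to functions. This is a simplification with hypothetical names and data. It visits every element, whereas an XSLT processor reaches nodes only through xsl:apply-templates and the built-in rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document.
SAMPLE = "<people><person>Ann</person><person>Bob</person></people>"

# One template rule: the key plays the role of match="person",
# the function plays the role of the template content "A Person".
TEMPLATES = {"person": lambda node: "A Person"}


def run_templates(root):
    """Visit every element in document order and collect the output of
    the matching rule, if any. The person's own text ("Ann", "Bob") is not
    output, because the template does not ask for it."""
    out = []
    for node in root.iter():
        rule = TEMPLATES.get(node.tag)
        if rule is not None:
            out.append(rule(node))
    return out


print(run_templates(ET.fromstring(SAMPLE)))  # ['A Person', 'A Person']
```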
Book example
<xsl:template match="/"> <html> <body> <xsl:apply-templates/> </body> </html></xsl:template>
<xsl:template match="tableOfContents"> <h1>Table of Contents</h1> <xsl:apply-templates select="chapterNumber"/> <xsl:apply-templates select="chapterName"/> <xsl:apply-templates select="pageNumber"/></xsl:template>
Etc.
xsl:apply-templates
Without a select attribute, the <xsl:apply-templates> element processes all child nodes of the current node, applying the matching template rule to each
If we add a select attribute, it processes only the nodes the XPath expression selects (typically the children that match)
If we have multiple <xsl:apply-templates> elements with select attributes, the child nodes are processed in the same order as the <xsl:apply-templates> elements
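The ordering rule for several select attributes can be seen in a small Python analog of the tableOfContents template. Each apply-templates call processes its own selection in document order, so all chapter numbers come out before all chapter names. The template bodies and data are hypothetical, and the sketch uses standard ElementTree rather than XSLT.

```python
import xml.etree.ElementTree as ET

# Hypothetical table-of-contents data (element names from the Book example).
SAMPLE = ("<tableOfContents>"
          "<chapterNumber>1</chapterNumber><chapterName>Intro</chapterName>"
          "<chapterNumber>2</chapterNumber><chapterName>XSLT</chapterName>"
          "</tableOfContents>")

# Hypothetical template rules, keyed by element name.
TEMPLATES = {
    "chapterNumber": lambda n: "[" + (n.text or "") + "]",
    "chapterName": lambda n: (n.text or "") + ";",
}


def apply_templates(ctx, select):
    """Analog of <xsl:apply-templates select="..."/>: process the selected
    children in document order, each with its matching rule."""
    return "".join(TEMPLATES[n.tag](n) for n in ctx.findall(select))


toc = ET.fromstring(SAMPLE)
# Two apply-templates elements with select, one after the other:
out = apply_templates(toc, "chapterNumber") + apply_templates(toc, "chapterName")
# One apply-templates with no select: all children in document order
all_children = "".join(TEMPLATES[n.tag](n) for n in toc)
print(out)           # [1][2]Intro;XSLT;
print(all_children)  # [1]Intro;[2]XSLT;
```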
When templates are ignored
Templates aren’t used unless they are applied. Exception: processing always starts with select="/"; if it didn’t, nothing would ever happen
If your templates are ignored, you probably forgot to apply them
If you apply a template to an element that has child elements, templates are not automatically applied to those child elements; the template must itself contain <xsl:apply-templates> to process them
Applying templates to children
<book> <title>XML</title> <author>Gregory Brill</author> </book>
<xsl:template match="/"> <html> <head></head> <body> <b><xsl:value-of select="/book/title"/></b> <xsl:apply-templates select="/book/author"/> </body> </html></xsl:template>
<xsl:template match="/book/author"> by <i><xsl:value-of select="."/></i></xsl:template>
With the <xsl:apply-templates select="/book/author"/> line: XML by Gregory Brill
Without that line: XML
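The difference between the two outputs can be reproduced in Python by modeling each template as a function. The author template runs only if the root template calls it. This is an analog with hypothetical function names, not an XSLT processor.

```python
import xml.etree.ElementTree as ET

SAMPLE = "<book><title>XML</title><author>Gregory Brill</author></book>"


def author_template(author):
    # <xsl:template match="/book/author"> by <i><xsl:value-of select="."/></i>
    return " by <i>" + (author.text or "") + "</i>"


def root_template(book, apply_author=True):
    # ET.fromstring returns the <book> element itself, so /book/title
    # corresponds to book.find("title") here.
    body = "<b>" + book.findtext("title", "") + "</b>"
    if apply_author:
        # <xsl:apply-templates select="/book/author"/>
        body += "".join(author_template(a) for a in book.findall("author"))
    # Without that call, author_template is simply never used.
    return "<html><head></head><body>" + body + "</body></html>"


book = ET.fromstring(SAMPLE)
html_with = root_template(book)
html_without = root_template(book, apply_author=False)
print(html_with)     # ...<b>XML</b> by <i>Gregory Brill</i>...
print(html_without)  # ...<b>XML</b>...
```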
Built-in Templates
XSLT has a couple of built-in templates, which say: when you apply templates to an element, process its child elements; when you apply templates to a text node, give its value
Together, it means that if you apply templates to an element but don't have an explicit template for that element, then its content gets processed and eventually you end up with the text that the element contains.
Here are the built-in template rules for each of the seven XPath node types:
Elements: apply templates to children
Text: copy text to the result tree
Comments: do nothing
Processing instructions (PIs): do nothing
Attributes: copy the value of the attribute to the result tree
Namespaces: do nothing
Root: apply templates to children
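The built-in rules amount to a default recursion: with no explicit template, an element's children are processed and its text is copied. The hypothetical apply_templates function below sketches that in Python with ElementTree. This is why applying no explicit templates at all still yields the document's text.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; " by " is a text node between the two elements.
SAMPLE = "<book><title>XML</title> by <author>Gregory Brill</author></book>"


def apply_templates(node, templates):
    """Use an explicit rule if one matches node.tag; otherwise fall back to
    the built-in rules: process child elements and copy text nodes.
    ElementTree's default parser already drops comments and processing
    instructions, matching their 'do nothing' rules. This sketch ignores
    attributes; in XSLT the attribute rule fires only when templates are
    applied to attributes explicitly (e.g. select="@*")."""
    rule = templates.get(node.tag)
    if rule is not None:
        return rule(node)
    parts = [node.text or ""]               # leading text node: copy it
    for child in node:
        parts.append(apply_templates(child, templates))
        parts.append(child.tail or "")      # text after the child: copy it
    return "".join(parts)


root = ET.fromstring(SAMPLE)
print(apply_templates(root, {}))  # XML by Gregory Brill
print(apply_templates(root, {"author": lambda n: "<i>" + (n.text or "") + "</i>"}))
# XML by <i>Gregory Brill</i>
```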
XSL - On the Client
If your browser supports XML, XSL can be used to transform the document to XHTML in your browser. Even if this works fine, it is not always desirable to include a style sheet reference in an XML file (it will not work in a browser that is not XSL-aware)
A JavaScript Solution
A more versatile solution would be to use JavaScript to do the XML to XHTML transformation
By using JavaScript, we can: do browser-specific testing, and use different style sheets according to browser and user needs
XSL transformation on the client side is bound to be a major part of browsers’ work in the future, as we will see growth in the specialized browser market (Braille, aural browsers, Web printers, handheld devices, etc.)
Transforming XML to XHTML in Your Browser
<html> <body> <script type="text/javascript">
// Load XML (ActiveXObject exists only in older versions of Internet Explorer)
var xml = new ActiveXObject("Microsoft.XMLDOM"); xml.async = false; xml.load("books.xml");
// Load XSL
var xsl = new ActiveXObject("Microsoft.XMLDOM"); xsl.async = false; xsl.load("books.xsl");
// Transform and write the result into the page
document.write(xml.transformNode(xsl));
</script> </body> </html>
XSL - On the Server
Since not all browsers support XML and XSL, one solution is to transform the XML to XHTML on the server
To make XML data available to all kinds of browsers, we have to transform the XML document on the SERVER and send it as pure XHTML to the BROWSER
That's another beauty of XSL! One of the design goals for XSL was to make it possible to transform data from one format to another on a server, returning readable data to all kinds of future browsers
Thoughts on XSL
XSL is a programming language, and not a particularly simple one. Expect to spend considerable time debugging your XSL
These slides have been an introduction to XSL and XSLT; there’s a lot more of it we haven’t covered
As with any programming, it’s a good idea to start simple and build it up incrementally: “Write a little, test a little.” This is especially a good idea for XSLT, because you don’t get a lot of feedback about what went wrong
Try jEdit with the XML plugin: write (or change) a line or two, check for syntax errors, then jump to IE and reload the XML file