XML

Instructor: Charles Moen

CSCI/CINF 4230


XML

Extensible Markup Language

A set of rules that allow you to create your own markup language

Designed for delivering data over the Web in text files that are self-describing and readable both by computer programs and by humans

The XML specification has been maintained by the World Wide Web Consortium (W3C) since 1998

XML (Spainhour, Ray, W3Schools)

Example of an XML File

<?xml version="1.0" encoding="UTF-8" ?>
<sandwiches>
  <sandwich name="Shrimp Poorboy">
    <price>5.99</price>
  </sandwich>
  <sandwich name="Grilled Burger">
    <price>4.99</price>
  </sandwich>
</sandwiches>

The XML declaration, if present, must be on the first line.

XML uses markup tags, like HTML, but developers can invent their own tag names.

As long as the tags follow the XML syntax rules, we can invent whatever tags and attributes are needed to describe our data.
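As a rough illustration of how a program might read these invented tags and attributes, here is a minimal sketch that uses Python's standard xml.etree.ElementTree module. The names SANDWICHES_XML and read_menu are made up for this example and are not part of the lecture.

```python
import xml.etree.ElementTree as ET

# The sandwiches document from the slide (hypothetical constant name).
SANDWICHES_XML = """<?xml version="1.0" encoding="UTF-8"?>
<sandwiches>
  <sandwich name="Shrimp Poorboy">
    <price>5.99</price>
  </sandwich>
  <sandwich name="Grilled Burger">
    <price>4.99</price>
  </sandwich>
</sandwiches>
"""


def read_menu(xml_text):
    """Return a {sandwich name: price} dict built from the document."""
    root = ET.fromstring(xml_text.encode("utf-8"))  # root is <sandwiches>
    return {
        sandwich.get("name"): float(sandwich.findtext("price"))
        for sandwich in root.findall("sandwich")
    }


menu = read_menu(SANDWICHES_XML)
print(menu)
```

The program only works because it "knows" what the sandwich, name, and price names mean; the parser itself attaches no meaning to them.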

XML (Spainhour, Ray, W3Schools)

Problem with HTML


<h1>Beginning ASP.NET 3.5 in C# 2008</h1><h2>Matthew MacDonald</h2>

It’s difficult to get the meaning of this data by looking at the HTML elements.

XML (Yue)

HTML provides the structure of a Web page, but not the semantic meaning of its content.

<book>
  <title>Beginning ASP.NET 3.5 in C# 2008</title>
  <author>Matthew MacDonald</author>
</book>

XML can provide the semantic meaning through its markup tags.


XML is Portable Data

XML files are plain text files that contain markup tags

Any software that can process plain text can read XML
• Hardware independent
• Software independent
• XML can be used to exchange data between incompatible systems

XML-aware applications
• Can process XML data as long as the application “knows” the meaning of the tags
• Meaning of the tags depends on the application

XML (Ding, W3Schools)


XML Technologies

XML

XML Namespaces

DTD (Document Type Definition)
• For describing your markup language

XML Schema
• An XML-based method of describing your markup language

XSL (Extensible Stylesheet Language)
• For displaying and transforming XML documents

DOM (Document Object Model)
• Object library for manipulating an XML document as a tree

XML (Yue)


XML documents must be well-formed

An XML document that conforms to the minimal XML syntax rules is well-formed

• Elements must always have a closing tag (or use the empty-element form, such as <br />)
• Tag names and attribute names are case-sensitive
• Elements must be properly nested
• All attributes must have a value
• All attribute values must be surrounded with quotes or apostrophes
• The XML declaration, if present, is on the first line
• The document has a single root element
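One way to see these rules in action is to hand a parser some deliberately broken documents. The sketch below assumes Python's standard xml.etree.ElementTree parser, which checks well-formedness only (not validity); the helper name is_well_formed and the sample strings are invented for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# One sample per rule; only the first one follows all of the rules.
samples = {
    "ok": '<sandwiches><sandwich name="BLT"/></sandwiches>',
    "missing closing tag": "<sandwiches><sandwich></sandwiches>",
    "case mismatch": "<Sandwich></sandwich>",
    "improper nesting": "<a><b></a></b>",
    "attribute without value": "<sandwich special/>",
    "unquoted attribute value": "<sandwich name=BLT/>",
    "declaration not first": '\n<?xml version="1.0"?><a/>',
    "two root elements": "<a/><b/>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'not well-formed'}")
```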

XML (Spainhour, Ray, W3Schools)

Root Element

<?xml version="1.0" encoding="UTF-8" ?>
<sandwiches>
  <sandwich name="Shrimp Poorboy">
    <price>5.99</price>
  </sandwich>
  <sandwich name="Grilled Burger">
    <price>4.99</price>
  </sandwich>
</sandwiches>

Here, <sandwiches> is the root element.

XML (Spainhour, Ray, W3Schools)

The top-level element
• Only one
• All other elements must be nested within it

In an XHTML document, the root element is <html>


Tag Names

There are no predefined tag names; you must invent your own (or use tags that another developer invented)

Should be descriptive, so that the document can be self-describing

Should be short and concise

Can contain letters, numbers, and other characters

Must start with a letter or underscore, not with a number or a punctuation character such as the dollar sign, caret, percent symbol, or semicolon

Must not start with the letters “xml” (in any mix of upper and lower case), which are reserved

Cannot contain spaces

Should not contain the characters “:” or “.”

XML (Spainhour, Ray, W3Schools)


Element Content

The text between the start tag and end tag

Content can be any of the following:

• Empty, without content
  <br />

• Nested elements
  <sandwich name="Shrimp Poorboy">
    <price>5.99</price>
  </sandwich>

• Character data

• Character entities
  &lt; &gt; &amp; &quot; &apos;

• Processing instructions
  <?xml-stylesheet type="text/xsl" href="simple.xsl"?>

• Comments
  <!-- This is a comment -->

• CDATA sections
  <?xml version="1.0" encoding="UTF-8" ?>
  <sandwiches>
    <sandwich name="BLT">
      <price>5.99</price>
      <ingredients>
        <![CDATA[ Bacon, lettuce, & tomato ]]>
      </ingredients>
    </sandwich>
  </sandwiches>

XML (Spainhour, Ray, W3Schools)

CDATA

Can be inserted anywhere that character data can occur

Begins with the special characters <![CDATA[

All characters within are treated as a literal part of the character data and are not parsed as XML

Ends with the special characters ]]>

XML (Spainhour, Ray, W3Schools)
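The literal treatment of CDATA can be checked with a small experiment. This is a sketch using Python's standard xml.etree.ElementTree parser; the sample strings and the helper name parses are invented here. It compares a CDATA section with the equivalent text written using entity references, and shows that a bare ampersand outside CDATA is rejected.

```python
import xml.etree.ElementTree as ET

# The same text written two ways: inside CDATA, and with entity references.
with_cdata = "<ingredients><![CDATA[Bacon, lettuce, & tomato <fresh>]]></ingredients>"
with_entities = "<ingredients>Bacon, lettuce, &amp; tomato &lt;fresh&gt;</ingredients>"
# Outside CDATA, a bare "&" is a syntax error.
bare_ampersand = "<ingredients>Bacon, lettuce, & tomato</ingredients>"

cdata_text = ET.fromstring(with_cdata).text
entity_text = ET.fromstring(with_entities).text


def parses(xml_text):
    """Return True if the parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(cdata_text)
print("same character data:", cdata_text == entity_text)
print("bare ampersand accepted:", parses(bare_ampersand))
```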

Attributes

<sandwich name="Poorboy"/>

Here, name="Poorboy" is an attribute.

Name-value pair that describes a property of the element

Can be included in the start tag or an empty tag

A particular attribute can appear only once in the same tag
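A short sketch of how these attribute rules look to a program, assuming Python's standard xml.etree.ElementTree parser (the variable names are invented for this example): attributes arrive as name-value pairs, and a tag that repeats an attribute is rejected by the parser.

```python
import xml.etree.ElementTree as ET

# Attributes of an empty tag are exposed as a dict of name-value pairs.
empty_tag = ET.fromstring('<sandwich name="Poorboy"/>')
print(empty_tag.attrib)
print(empty_tag.get("name"))

# The same attribute appearing twice in one tag is a well-formedness error.
try:
    ET.fromstring('<sandwich name="Poorboy" name="BLT"/>')
    duplicate_accepted = True
except ET.ParseError as err:
    duplicate_accepted = False
    print("rejected:", err)
```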

XML (Spainhour, Ray, W3Schools)


Validation

A DTD describes your XML markup language
• Which tags can be used
• What each element can contain

A document can be tested with the DTD, and if it passes then it is valid
• Must be well-formed
• Must be free of mistakes
  ‒ No misspelled tag names
  ‒ No improper nesting
  ‒ No missing elements

Important when used by software that expects a particular document structure; and when separate groups of people need to agree on a common language for data exchange

XML (Spainhour, Ray, W3Schools)


DTD

Defines the structure or grammar of an XML document by describing your markup language

Used to test whether the XML document is valid

Can be internal or external

Can contain the following types of markup declarations

• ELEMENT – the XML elements

• ATTLIST – attributes of the elements

• ENTITY – characters referenced using the “&...;” syntax

• NOTATION – description of the data format

• Processing instructions

• Comments

XML (Yue, Spainhour, Ray, W3Schools)


DTD Example

If we want to maintain a phone list as an XML document, the DTD might look like the following:

XML (Yue, Spainhour, Young, W3Schools)

<!ELEMENT phonelist (person)*>

<!ELEMENT person (name,phonenumber)>

<!ELEMENT name (#PCDATA)>

<!ELEMENT phonenumber (areacode,number)>

<!ELEMENT areacode (#PCDATA)>

<!ELEMENT number (#PCDATA)>

This DTD defines a phone list that contains the name, area code and phone number of each person in the list.
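Python's standard library does not include a DTD validator, so the sketch below is only a hand-written approximation of what a validating parser would check for this particular DTD. Each content model is rewritten by hand as a regular expression over the child tag names; CONTENT_MODELS and conforms are hypothetical names, the sample data is invented, and the sketch ignores text content, attributes, and entities.

```python
import re
import xml.etree.ElementTree as ET

# Each DTD content model rewritten by hand as a regular expression over the
# sequence of child tag names (every name is followed by one space).
CONTENT_MODELS = {
    "phonelist": r"(person )*",          # (person)*
    "person": r"name phonenumber ",      # (name,phonenumber)
    "phonenumber": r"areacode number ",  # (areacode,number)
    "name": r"",                         # (#PCDATA): no child elements
    "areacode": r"",                     # (#PCDATA)
    "number": r"",                       # (#PCDATA)
}


def conforms(element):
    """Recursively check each element's children against CONTENT_MODELS."""
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        return False  # element name is not declared
    children = "".join(child.tag + " " for child in element)
    if re.fullmatch(model, children) is None:
        return False  # wrong children, wrong order, or wrong count
    return all(conforms(child) for child in element)


good = ET.fromstring(
    "<phonelist><person><name>A. Person</name>"
    "<phonenumber><areacode>555</areacode><number>555-0100</number></phonenumber>"
    "</person></phonelist>"
)
# Same elements, but phonenumber comes before name.
bad = ET.fromstring(
    "<phonelist><person>"
    "<phonenumber><areacode>555</areacode><number>555-0100</number></phonenumber>"
    "<name>A. Person</name></person></phonelist>"
)
print(conforms(good), conforms(bad))
```

A real project would more likely rely on a validating parser than on a hand-maintained table like this one; the table is only meant to make the meaning of each declaration concrete.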

Element Declarations

ELEMENT declarations describe the elements, the “building blocks” of an XML document.

The first line declares that a “phonelist” element has element content, and it can contain zero or more “person” child elements.

<!ELEMENT phonelist (person)*>

• <!ELEMENT begins the element declaration
• phonelist is the tag name of this element
• (person)* means the content can be zero or more “person” elements
• > ends the element declaration

These three characters can be used to specify the number of elements:

• * Zero or more
• + One or more
• ? Zero or one

XML (Yue, Spainhour, Young, W3Schools)

The second line declares that a “person” element has element content, and it must contain exactly one of each of the elements “name” and “phonenumber,” in that order.

<!ELEMENT person (name,phonenumber)>

• person is the tag name of this element
• (name,phonenumber) is a sequence: when there are multiple child elements with commas separating the names, the child elements must appear in that specific order

XML (Yue, Spainhour, Young, W3Schools)

The third line declares that the content of the “name” element is simple character data.

<!ELEMENT name (#PCDATA)>

• name is the tag name of this element
• “PCDATA” stands for “parsed character data,” text that will be parsed by the XML parser. Because the text is parsed, tags inside it are treated as markup rather than literal text, and entity references are expanded. The content can also be empty.

XML (Ding,Yue, Young, W3Schools)

What can you say about the remaining three declarations?

<!ELEMENT phonenumber (areacode,number)>

<!ELEMENT areacode (#PCDATA)>

<!ELEMENT number (#PCDATA)>

XML (Yue, Spainhour, Young, W3Schools)


<!ELEMENT phonelist (person)*>

<!ELEMENT person (name,phonenumber)>

<!ELEMENT name (#PCDATA)>

<!ELEMENT phonenumber (areacode,number)>

<!ELEMENT areacode (#PCDATA)>

<!ELEMENT number (#PCDATA)>

Is the following XML document valid, according to this DTD?

<?xml version="1.0" encoding="UTF-8"?>
<phonelist>
  <person>
    <name>Charles Moen</name>
    <phonenumber>
      <areacode>281</areacode>
      <number>283-3848</number>
    </phonenumber>
  </person>
</phonelist>

XML (Yue, Spainhour, Young, W3Schools)

Using an External DTD

Use the DOCTYPE declaration to connect the XML document with an external DTD.

phonelist.dtd

<!ELEMENT phonelist (person)*>
<!ELEMENT person (name,phonenumber)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT phonenumber (areacode,number)>
<!ELEMENT areacode (#PCDATA)>
<!ELEMENT number (#PCDATA)>

phonelist.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE phonelist SYSTEM "phonelist.dtd">
<phonelist>
  <person>
    <name>Charles Moen</name>
    <phonenumber>
      <areacode>281</areacode>
      <number>283-3848</number>
    </phonenumber>
  </person>
</phonelist>

XML (Yue, Spainhour, Young, W3Schools)

<!DOCTYPE phonelist SYSTEM "phonelist.dtd">

• phonelist is the root element
• SYSTEM: either SYSTEM or PUBLIC (if PUBLIC, then it must be followed by both a name and a URI)
• "phonelist.dtd" describes the location of the DTD, and can be relative or fully qualified, such as "http://sce.uhcl.edu/moenc/dtds/phonelist.dtd"

Using an Internal DTD

An internal DTD is placed in the DOCTYPE declaration of the XML document.

phonelist.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE phonelist [
  <!ELEMENT phonelist (person)*>
  <!ELEMENT person (name,phonenumber)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT phonenumber (areacode,number)>
  <!ELEMENT areacode (#PCDATA)>
  <!ELEMENT number (#PCDATA)>
]>
<phonelist>
  <person>
    <name>Charles Moen</name>
    <phonenumber>
      <areacode>281</areacode>
      <number>283-3848</number>
    </phonenumber>
  </person>
</phonelist>

XML (Yue, Spainhour, Young, W3Schools)


More about Element Declarations

ELEMENT content can be specified in several forms.

<!ELEMENT phonelist (listitem)*>

<!ELEMENT listitem (person | department)>

<!ELEMENT department (name,phonenumber)>

<!ELEMENT person (name,phonenumber)>

<!ELEMENT name (#PCDATA)>

<!ELEMENT phonenumber (areacode,number)>

<!ELEMENT areacode (#PCDATA)>

<!ELEMENT number (#PCDATA)>

The “choice” form specifies a series of possible child elements, as in (person | department)

The “sequence” form specifies a required sequence of child elements, as in (name,phonenumber)

<!ELEMENT misc ANY>

The “ANY” keyword means the element can have any legal content, in any order.

<!ELEMENT br EMPTY>

The “EMPTY” keyword means the element must have no content.

XML (Yue, Spainhour, Young, W3Schools)


Attribute-List Declarations

In a valid document, all attributes must be explicitly declared with an “ATTLIST” declaration.

<!ELEMENT phonelist (listitem)*>

<!ELEMENT listitem (person | department)>

<!ELEMENT department (name,phonenumber)>

<!ELEMENT person (name,phonenumber)>

<!ATTLIST person title CDATA #REQUIRED>

<!ELEMENT name (#PCDATA)>

<!ELEMENT phonenumber (areacode,number)>

<!ELEMENT areacode (#PCDATA)>

<!ELEMENT number (#PCDATA)>

Here, the “title” attribute is required, and it must be CDATA. The keyword is written in upper case, and a required attribute cannot also have a default value; to give “title” a default instead, write "Dr" in place of #REQUIRED.

XML (Yue, Spainhour, Young, W3Schools)

<!ATTLIST person title (Dr|Ms|Mr) "Dr">

Here, the “title” attribute is not required; it must be one of the three values that are enumerated; and it defaults to “Dr”.

XML Namespaces


<?xml version="1.0" encoding="UTF-8"?>

<uhcl:courses xmlns:uhcl="http://www.uhcl.edu/ns">

<uhcl:course>

<uhcl:title>Charles Moen</uhcl:title>

<uhcl:rubric>CSCI/CINF</uhcl:rubric>

<uhcl:number>4230</uhcl:number>

</uhcl:course>

</uhcl:courses>

XML (Yue, Spainhour, Young, W3Schools)

We can be sure that there is no conflict with element names by using a namespace.

The namespace must be declared before using it, and the declaration is often in the root element.

The identifier must be unique, and is usually a URL. (The URL does not have to be a valid URL of a Web page.)

The qualified element name consists of the namespace prefix (here “uhcl”), followed by a colon, followed by the local name.
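To illustrate that the namespace identifier, not the prefix, is what names an element, here is a small sketch using Python's standard xml.etree.ElementTree module. The document is a trimmed copy of the slide's example; COURSES_XML, ns, and the lookup prefix "c" are invented for this example.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the courses document (hypothetical constant name).
COURSES_XML = """<?xml version="1.0" encoding="UTF-8"?>
<uhcl:courses xmlns:uhcl="http://www.uhcl.edu/ns">
  <uhcl:course>
    <uhcl:rubric>CSCI/CINF</uhcl:rubric>
    <uhcl:number>4230</uhcl:number>
  </uhcl:course>
</uhcl:courses>
"""

root = ET.fromstring(COURSES_XML.encode("utf-8"))
# ElementTree replaces the prefix with the namespace identifier in braces.
print(root.tag)

# The prefix used for lookups is local to this dict; only the identifier
# has to match the one declared in the document.
ns = {"c": "http://www.uhcl.edu/ns"}
numbers = [
    course.findtext("c:number", namespaces=ns)
    for course in root.findall("c:course", ns)
]
print(numbers)
```

Looking up "c:course" finds the elements written as uhcl:course because both prefixes are bound to the same identifier.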

Just for Fun

XSL

An XSL (Extensible Stylesheet Language) document can be used to transform the data in an XML document to an HTML document, or a document in some other format.

XML

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="simple.xsl"?>
<phonelist>
  <person>
    <name>Charles Moen</name>
    <phonenumber>
      <areacode>281</areacode>
      <number>283-3848</number>
    </phonenumber>
  </person>
</phonelist>

The XSL must be linked to the XML, here with the xml-stylesheet processing instruction.

simple.xsl

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <head>
        <title>Demo XSL</title>
      </head>
      <body>
        <h1>Phone List</h1>
        <table border="1" cellspacing="0" cellpadding="5" width="480">
          <tr><th>Name</th><th>Phone number</th></tr>
          <xsl:apply-templates select="phonelist/person"/>
        </table>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="person">
    <tr>
      <td><xsl:value-of select="@title"/>&#160;<xsl:value-of select="name"/></td>
      <td>
        (<xsl:value-of select="phonenumber/areacode"/>)&#160;<xsl:value-of select="phonenumber/number"/>
      </td>
    </tr>
  </xsl:template>
</xsl:stylesheet>


References

Ding, Wei, “XML” UHCL lecture slides, 2008.

Ray, Erik T. Learning XML. O'Reilly, 2001.

Spainhour, Stephen and Robert Eckstein. Webmaster in a Nutshell, 3rd Edition. O'Reilly, 2002.

W3Schools Online Web Tutorials. “DTD Tutorial”. [Online]. Available: http://www.w3schools.com/dtd/default.asp

W3Schools Online Web Tutorials. “XML Tutorial”. [Online]. Available: http://www.w3schools.com/xml/default.asp

Young, Michael J., XML Step by Step. Microsoft Press, 2000.

Yue, Kwok-Bun, “An Introduction to XML” UHCL lecture notes, 2001.