Introduction to XML

Upload: rhoda-lane

Post on 29-Dec-2015


What is XML?

• Extensible Markup Language: XML 1.0 (1998)
• Easier-to-use subset of SGML (Standard Generalized Markup Language)
• XML is a text-based markup language
• Standard for data interchange on the web
• Set of rules for designing semantic tags
• Meta-markup language to define other languages
• XML 1.0 Specification: http://www.w3.org/TR/REC-xml

XML File Sample

<?xml version="1.0"?>
<dining-room>
  <manufacturer>The Wood Shop</manufacturer>
  <table type="round" wood="maple">
    <price>$199.99</price>
  </table>
  <chair wood="maple">
    <quantity>6</quantity>
    <price>$39.99</price>
  </chair>
</dining-room>
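As a rough illustration of how a program might read this sample, the sketch below parses it with Python's standard-library xml.etree.ElementTree module. The choice of Python and the `summary` dictionary are assumptions made for this example; they are not part of the original slides.

```python
import xml.etree.ElementTree as ET

# The dining-room sample from the slide, held in a string.
SAMPLE = """<?xml version="1.0"?>
<dining-room>
  <manufacturer>The Wood Shop</manufacturer>
  <table type="round" wood="maple">
    <price>$199.99</price>
  </table>
  <chair wood="maple">
    <quantity>6</quantity>
    <price>$39.99</price>
  </chair>
</dining-room>"""

root = ET.fromstring(SAMPLE)   # root is the <dining-room> element
table = root.find("table")     # first <table> child
chair = root.find("chair")     # first <chair> child

# Element content is read with findtext(), attributes with get().
summary = {
    "manufacturer": root.findtext("manufacturer"),
    "table_type": table.get("type"),
    "table_price": table.findtext("price"),
    "chair_quantity": int(chair.findtext("quantity")),
}
print(summary)
```

Element content and attribute values both arrive as strings; converting the quantity to an integer is the application's job.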

HTML Example

<DL>
  <DT>Mambo
  <DD>by Enrique Garcia
</DL>
<UL>
  <LI>Producer: Enrique Garcia
  <LI>Publisher: Sony Music Entertainment
  <LI>Length: 3:46
  <LI>Written: 1991
  <LI>Artist: Azucar Moreno
</UL>

XML Describes Structure and Semantics, Not Format

<SONG>
  <TITLE>Mambo</TITLE>
  <COMPOSER>Enrique Garcia</COMPOSER>
  <PRODUCER>Enrique Garcia</PRODUCER>
  <PUBLISHER>Sony Music Entertainment</PUBLISHER>
  <LENGTH>3:46</LENGTH>
  <YEAR>1991</YEAR>
  <ARTIST>Azucar Moreno</ARTIST>
</SONG>

Self-Describing Data

<?xml version="1.0" encoding="UTF-8"?>
<DOCUMENT>
  <GREETING>Hello from XML</GREETING>
  <MESSAGE>Welcome to Programming XML in Java</MESSAGE>
</DOCUMENT>

Structured and Integrated Data

<?xml version="1.0"?>
<SCHOOL>
  <CLASS type="seminar">
    <CLASS_TITLE>XML In The Real World</CLASS_TITLE>
    <CLASS_NUMBER>6.031</CLASS_NUMBER>
    <SUBJECT>XML</SUBJECT>
    <START_DATE>6/1/2002</START_DATE>
    <STUDENTS>
      <STUDENT status="attending">
        <FIRST_NAME>Edward</FIRST_NAME>
        <LAST_NAME>Samson</LAST_NAME>
      </STUDENT>
      <STUDENT status="withdrawn">
        <FIRST_NAME>Ernestine</FIRST_NAME>
        <LAST_NAME>Johnson</LAST_NAME>
      </STUDENT>
    </STUDENTS>
  </CLASS>
</SCHOOL>

Creating XML Documents

• HTML has about 100 predefined elements
• In XML, you define your own elements
• HTML browsers try to fix bad HTML code
• XML processors do not make any guesses about the structure of the document
• A well-formed XML document is the minimal requirement
• A valid XML document additionally conforms to a DTD or an XML Schema
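The contrast between forgiving HTML browsers and strict XML processors can be sketched with Python's standard-library parser. The helper name `is_well_formed` and the two test strings are assumptions made for this illustration; the parser checks well-formedness only, not validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # The processor does not guess at the intended structure; it rejects the input.
        return False
    return True

# HTML-style list items without end tags: a browser would repair this.
html_like = "<UL><LI>Producer: Enrique Garcia<LI>Length: 3:46</UL>"
# The same data with every element closed.
xml_like = "<UL><LI>Producer: Enrique Garcia</LI><LI>Length: 3:46</LI></UL>"

print(is_well_formed(html_like))   # False
print(is_well_formed(xml_like))    # True
print(is_well_formed("<a/><b/>"))  # False: more than one root element
```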

What is a well-formed XML Doc?

• A textual object is a well-formed XML document if:
  – Taken as a whole, it matches the production labeled document
  – It meets all the well-formedness constraints given in the specification: http://www.w3.org/TR/REC-xml
  – Each of the parsed entities which is referenced directly or indirectly within the document is well-formed

document ::= prolog element Misc*

• Prolog:
  – XML declaration: <?xml version="1.0"?>
  – Comments: <!-- This is a Comment -->
  – Processing instructions:
      <?xml-stylesheet href="JavaXML.html.xsl" type="text/xsl"?>
      <?xml-stylesheet href="greeting.css" type="text/css"?>
• Element:
  – Root element contains more elements
  – Exactly one root element
• Misc:
  – Comments
  – Processing instructions
  – Whitespace

Entities

• Part of an XML Document

• Hold text or binary data

• May refer to other entities

• Parsed entities are character data

• Unparsed entities are binary data

Tags and Elements

• An XML element consists of a start tag and an end tag:
    <document> ... </document>
• Tag names
  – Start with a letter (<document>), an underscore (<_record>) or a colon (avoid using a colon)
  – Subsequent characters may be letters, digits, underscores, hyphens, periods and colons (but no whitespace)
  – XML processors are case sensitive. Different tags: <document>, <DOCUMENT>, <Document>
  – Empty elements have only one tag:
      HTML: <img>, <br>, <hr>
      XHTML: <img/>, <br/>, <hr/>
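The naming rules above can be approximated with a regular expression. This is a minimal sketch that covers ASCII names only, which is an assumption made for brevity (the XML 1.0 specification also admits many non-ASCII letters); the names `NAME_RE` and `is_valid_name` are hypothetical. The second half uses Python's standard parser to show case sensitivity and the two spellings of an empty element.

```python
import re
import xml.etree.ElementTree as ET

# Simplified, ASCII-only name rule: first character is a letter, underscore
# or colon; later characters may also be digits, periods and hyphens.
NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9_.:\-]*")

def is_valid_name(name: str) -> bool:
    """Check a tag or attribute name against the simplified rule above."""
    return NAME_RE.fullmatch(name) is not None

checks = {name: is_valid_name(name)
          for name in ("document", "_record", "2nd", "first name")}
print(checks)

# Case sensitivity: the parser reports three different tag names.
distinct_tags = {ET.fromstring(f"<{name}/>").tag
                 for name in ("document", "DOCUMENT", "Document")}
print(len(distinct_tags))

# An empty-element tag and a start/end pair with nothing between them
# both produce an element with no text and no children.
empty_short = ET.fromstring("<hr/>")
empty_long = ET.fromstring("<hr></hr>")
print(empty_short.text, empty_long.text, len(empty_short), len(empty_long))
```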

Attributes

• Name-value pairs: {STATUS, "Good credit"}
• Specify additional data in start tags: <CUSTOMER STATUS="Good credit">
• Attribute names follow the same rules as tag names
• Attribute values are strings enclosed in quotation marks

Too many attributes make documents hard to read:

<CUSTOMER LAST_NAME="Smith" FIRST_NAME="Sam"
          DATE="October 15, 2001" PURCHASE="Tomatoes"
          PRICE="$1.25" NUMBER="8" />

<CUSTOMER>
  <NAME>
    <LAST_NAME>Smith</LAST_NAME>
    <FIRST_NAME>Sam</FIRST_NAME>
  </NAME>
  <DATE>October 15, 2001</DATE>
  <ORDERS>
    <ITEM>
      <PRODUCT>Tomatoes</PRODUCT>
      <NUMBER>8</NUMBER>
      <PRICE>$1.25</PRICE>
    </ITEM>
  </ORDERS>
</CUSTOMER>

CDATA

• Hold character data that remains unparsed by the XML Processor

• Start a CDATA section: <![CDATA[

• End a CDATA section: ]]>

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/tr/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Using The if Statement In JavaScript</title>
  </head>
  <body>
    <script language="javascript">
      <![CDATA[
        var budget
        budget = 234.77
        if (budget < 0) {
          document.writeln("Uh oh.")
        }
      ]]>
    </script>
    <center>
      <h1>Using The if Statement In JavaScript</h1>
    </center>
  </body>
</html>
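To see why the script above needs a CDATA section, the following sketch feeds a reduced version of it to Python's standard-library parser, once with and once without the CDATA wrapper. The shortened `<script>` documents are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# Inside CDATA the "<" of the comparison is plain character data.
with_cdata = """<script><![CDATA[
if (budget < 0) { document.writeln("Uh oh.") }
]]></script>"""

script = ET.fromstring(with_cdata)
print(script.text)  # the JavaScript source, delivered unchanged

# Without CDATA the same "<" is read as the start of a tag,
# so the document is not well-formed.
without_cdata = "<script>if (budget < 0) { }</script>"
try:
    ET.fromstring(without_cdata)
    raw_ok = True
except ET.ParseError:
    raw_ok = False
print(raw_ok)
```

Escaping the character as `&lt;` would also work; CDATA is mainly a convenience for longer blocks of code.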

Document Type Definition

• Specifies the structure and syntax of an XML document

<!ELEMENT DOCUMENT (CUSTOMER)*>
<!ELEMENT CUSTOMER (NAME,DATE,ORDERS)>
<!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
<!ELEMENT LAST_NAME (#PCDATA)>
<!ELEMENT FIRST_NAME (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT ORDERS (ITEM)*>
<!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)>
<!ELEMENT PRODUCT (#PCDATA)>
<!ELEMENT NUMBER (#PCDATA)>
<!ELEMENT PRICE (#PCDATA)>

<?xml version="1.0"?>
<!DOCTYPE BOOK [
  <!ELEMENT BOOK (P*)>
  <!ELEMENT P (#PCDATA)>
]>
<BOOK>
  <P>chapter 1 - Intro</P>
  <P>chapter 2 - Conclusion</P>
  <P>Index</P>
</BOOK>

Creating Document Type Declarations

• <!DOCTYPE rootname [DTD]>
• <!DOCTYPE rootname SYSTEM URL>
• <!DOCTYPE rootname SYSTEM URL [DTD]>
• <!DOCTYPE rootname PUBLIC identifier URL>
• <!DOCTYPE rootname PUBLIC identifier URL [DTD]>

Element Definition

• <!ELEMENT direction (left, right, top?)>
• <!ELEMENT CHAPTER (INTRODUCTION, (P | QUOTE | NOTE)*, DIV*)>
• <!ELEMENT HR EMPTY>
• <!ELEMENT p (#PCDATA | I)* >
• <!ELEMENT %title; %content; >
• <!ELEMENT DOCUMENT ANY>

Content Model

• ANY
  – Any type of content: elements or PCDATA
      <!ELEMENT DOCUMENT ANY>
• Child element lists
  – Names of elements in parentheses
      <!ELEMENT direction (left, right, top?)>
• #PCDATA (Parsed Character Data)
  – Non-markup text
      <!ELEMENT First_Name (#PCDATA)>

Example 1:
<!ELEMENT PRODUCT (#PCDATA | PRODUCT_ID)*>

<PRODUCT>Tomatoes</PRODUCT>

<PRODUCT>
  <PRODUCT_ID>124829548702121</PRODUCT_ID>
</PRODUCT>

Example 2:
<!ELEMENT p (#PCDATA | b)*>
<!ELEMENT b (#PCDATA)>

<p>This is <b>bold</b> text</p>
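Mixed content such as Example 2 interleaves text with child elements. As one possible illustration, Python's xml.etree.ElementTree exposes the pieces through the `text` and `tail` attributes; the variable names below are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

p = ET.fromstring("<p>This is <b>bold</b> text</p>")
b = p.find("b")

# p.text : character data before the first child element
# b.text : character data inside <b>
# b.tail : character data after </b>, still inside <p>
pieces = [p.text, b.text, b.tail]
print(pieces)

# itertext() walks all character data in document order.
full_text = "".join(p.itertext())
print(full_text)
```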

Entities

• XML's way of referring to a data item
• Text or binary data
• General entity
  – Used in the content of an XML document
  – References start with '&' and end with ';'
• Parameter entity
  – Used in a DTD
  – References start with '%' and end with ';'
• Internal entity: defined in the XML document
• External entity: defined in an external source (file, URI)

<!ENTITY name definition>

Example 1:
<!ELEMENT DATE (#PCDATA)>
<!ENTITY TODAY "February 7, 2001">

<DATE>&TODAY;</DATE>

Example 2:
<!ENTITY NAME "John Punin">
<!ENTITY CNAME "&NAME; Palacios">
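The two entity examples can be tried out by placing the declarations in an internal DTD subset and letting a parser expand the references. In the sketch below, the wrapping `DOCUMENT` root and the `AUTHOR` element are assumptions added so that both entities can be referenced from one document; the parser is Python's standard-library xml.etree.ElementTree, which expands internal general entities but does not validate.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<!DOCTYPE DOCUMENT [
  <!ENTITY TODAY "February 7, 2001">
  <!ENTITY NAME "John Punin">
  <!ENTITY CNAME "&NAME; Palacios">
]>
<DOCUMENT>
  <DATE>&TODAY;</DATE>
  <AUTHOR>&CNAME;</AUTHOR>
</DOCUMENT>"""

root = ET.fromstring(doc)

# Each general entity reference is replaced by its definition.
today = root.findtext("DATE")
# CNAME itself refers to NAME, so the expansion is applied recursively.
author = root.findtext("AUTHOR")
print(today)
print(author)
```

Because entity expansion can be abused, parsers for untrusted input often restrict or disable it.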

Namespaces

• XML namespaces provide a simple method for qualifying element and attribute names used in Extensible Markup Language documents by associating them with namespaces identified by URI references.

• Definition: A namespace (or more precisely, a namespace binding) is declared using a family of reserved attributes. Such an attribute's name must either be xmlns or begin xmlns:. These attributes, like any other XML attributes, may be provided directly or by default.

XML Document with one namespace

• A namespace is declared with an xmlns:prefix attribute
• The prefix stands for the namespace
• The xmlns:prefix attribute is assigned a URI. A Uniform Resource Identifier (URI) is a string of characters which identifies an Internet resource.
• Every tag is prefaced with the prefix name

<?xml version="1.0"?>
<!-- the bk prefix is available throughout the book element -->
<bk:book xmlns:bk='http://www.books.org/books'>
  <bk:title>Programming XML in Java</bk:title>
</bk:book>

Using Namespaces

This XML document carries information in a table:

<h:table xmlns:h="http://www.w3.org/TR/html4/">
  <h:tr>
    <h:td>Apples</h:td>
    <h:td>Bananas</h:td>
  </h:tr>
</h:table>

This XML document carries information about a piece of furniture:

<f:table xmlns:f="http://www.w3schools.com/furniture">
  <f:name>African Coffee Table</f:name>
  <f:width>80</f:width>
  <f:length>120</f:length>
</f:table>

Instead of using only prefixes, we have added an xmlns attribute to the <table> tag to give the prefix a qualified name associated with a namespace.
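A namespace-aware parser tells the two table elements apart by their namespace URIs, not by their prefixes. The following sketch uses Python's standard-library xml.etree.ElementTree, which reports qualified names in the form {URI}localname; the variable names and the prefix map `ns` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

html_table = ET.fromstring(
    '<h:table xmlns:h="http://www.w3.org/TR/html4/">'
    "<h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr>"
    "</h:table>"
)
furniture = ET.fromstring(
    '<f:table xmlns:f="http://www.w3schools.com/furniture">'
    "<f:name>African Coffee Table</f:name>"
    "</f:table>"
)

# Same local name "table", different namespace, hence different element types.
print(html_table.tag)
print(furniture.tag)

# Searches take a prefix-to-URI map; the prefix used here need not match
# the one used in the document, only the URI matters.
ns = {"h": "http://www.w3.org/TR/html4/"}
cells = [td.text for td in html_table.findall("h:tr/h:td", ns)]
print(cells)
```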

XML Schema

• To define a "class" of XML Documents

• "instance document" - XML document that conforms to a particular schema

• An XML alternative to DTDs

A Simple XML Document

Look at this simple XML document called "note.xml":

<?xml version="1.0"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

A DTD File

The following example is a DTD file called "note.dtd" that defines the elements of the XML document above ("note.xml"):

<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>

The first line defines the note element to have four child elements: "to, from, heading, body".

Lines 2-5 define the to, from, heading and body elements to be of type "#PCDATA".
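To make the meaning of this content model concrete, the sketch below re-implements the rule by hand in Python: a note must contain exactly to, from, heading and body, in that order, and none of them may contain child elements. This is not DTD validation (Python's standard library does not validate against DTDs); the function name `matches_note_model` is hypothetical, and a validating parser would normally do this work.

```python
import xml.etree.ElementTree as ET

# Child sequence required by: <!ELEMENT note (to, from, heading, body)>
EXPECTED_CHILDREN = ["to", "from", "heading", "body"]

def matches_note_model(xml_text: str) -> bool:
    """Hand-rolled check of the note content model; not a real DTD validator."""
    root = ET.fromstring(xml_text)
    if root.tag != "note":
        return False
    children = list(root)
    # The sequence (to, from, heading, body) fixes both names and order.
    if [child.tag for child in children] != EXPECTED_CHILDREN:
        return False
    # (#PCDATA) children may hold text but no child elements.
    return all(len(child) == 0 for child in children)

good = ("<note><to>Tove</to><from>Jani</from>"
        "<heading>Reminder</heading><body>Don't forget me this weekend!</body></note>")
reordered = ("<note><from>Jani</from><to>Tove</to>"
             "<heading>Reminder</heading><body>Don't forget me this weekend!</body></note>")

print(matches_note_model(good))       # True
print(matches_note_model(reordered))  # False
```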

An XML Schema

The following example is an XML Schema file called "note.xsd" that defines the elements of the XML document above ("note.xml"):

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.w3schools.com"
    xmlns="http://www.w3schools.com"
    elementFormDefault="qualified">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

The note element is a complex type because it contains other elements. The other elements (to, from, heading, body) are simple types because they do not contain other elements.