Tutorial 3: XML
Creating a Valid XML Document
Creating a Valid Document
You validate documents to make certain necessary elements are never omitted.
For example, each customer order should include a customer name, address, and phone number.
A document is validated to prevent errors in its content or structure.
An XML document can be validated using either DTDs (Document Type Definitions) or schemas.
Customer orders table
<customers>
   <customer custID="cust201" custType="home">
      <name title="Mr.">David Lynn</name>
      <address>
         <![CDATA[
         211 Fox Street
         Greenville, NH 80021
         ]]>
      </address>
      <phone>(315) 555-1812</phone>
      <email>[email protected]</email>
      <orders>
         <order orderID="or10311" orderBy="cust201">
            <orderDate>8/1/2008</orderDate>
            <items>
               <item itemPrice="599.95">DCT5Z</item>
               <item itemPrice="199.95">SM128</item>
               <item itemPrice="29.95" itemQty="2">RCL</item>
            </items>
         </order>
      </orders>
   </customer>
   . . .
</customers>
The structure of the orders.xml document
Writing the Document Type Declaration
<!DOCTYPE customers
[
   DTD statements are inserted here
]>
<customers>
The root element of the document (customers) must match the root element listed in the DOCTYPE declaration.
Declaring a DTD
A DTD can be used to:
– ensure all required elements are present
– prevent undefined elements from being used
– enforce a specific data structure
– specify the use of attributes and define their possible values
– define default values for attributes
– describe how the parser should access non-XML or non-textual content
A document type definition is a collection of rules or declarations that define the content and structure of the document.
A document type declaration attaches those rules to the document’s content.
You create a DTD by first entering a document type declaration into your XML document.
While there can only be one DTD per XML document, it can be divided into two parts: an internal subset and an external subset.
An internal subset consists of declarations placed in the same file as the document content.
An external subset is located in a separate file.
To declare an internal DTD subset, use:
<!DOCTYPE root
[
   declarations
]>
where root is the name of the document’s root element, and declarations are the statements that comprise the DTD.
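As a minimal sketch of the internal subset form, the document below declares a single element inside the DOCTYPE. The element name note is a hypothetical placeholder and is not part of orders.xml.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE note [
  <!-- Internal subset: the declarations sit between [ and ] -->
  <!ELEMENT note (#PCDATA)>
]>
<!-- The root element name matches the name given in the DOCTYPE -->
<note>Validated against the declarations above.</note>
```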
To declare an external DTD subset with a system or public location, use:
<!DOCTYPE root SYSTEM "uri">
or
<!DOCTYPE root PUBLIC "id" "uri">
where id is a text string that tells an application how to locate the external subset, and uri is the location and filename of the external subset.
Unless your application requires a public identifier, you should use the SYSTEM location form.
A DOCTYPE declaration can indicate both an external and an internal subset. The syntax is:
<!DOCTYPE root SYSTEM "uri" [ declarations ]>
or
<!DOCTYPE root PUBLIC "id" "uri" [ declarations ]>
The real power of XML comes from an external DTD that can be shared among many documents.
If a document contains both an internal and an external subset, the internal subset takes precedence over the external subset if there is a conflict between the two.
This way, the external subset would define basic rules for all the documents, and the internal subset would define those rules specific to each document.
Combining an External and an Internal DTD
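The precedence rule can be sketched with an entity that is declared in both subsets. The file name shared.dtd, the report element, and the region entity are all hypothetical names chosen for illustration. Because the internal subset is read first and the first declaration of an entity is the binding one, a parser that loads the external subset should resolve &region; to the internal value.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!--
  Assumed contents of the shared external subset, shared.dtd:

    <!ELEMENT report (#PCDATA)>
    <!ENTITY region "All regions">
-->
<!DOCTYPE report SYSTEM "shared.dtd" [
  <!-- Internal subset: this declaration of region overrides the external one -->
  <!ENTITY region "Northeast">
]>
<report>Sales for &region;</report>
```

Note that this override behavior applies to entity and attribute-list declarations; declaring the same element twice is a validity error rather than an override.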
Declaring Document Elements
In a valid document, every element must be declared in the DTD.
An element (type) declaration specifies the name of the element and indicates what kind of content the element can contain. The syntax is:
<!ELEMENT element content-model>
where element is the name of the element and content-model specifies what type of content the element contains.
The element name is case sensitive.
Five different types of element content for content-model
– ANY - No restrictions on the element’s content.
– EMPTY - The element cannot store any content.
– #PCDATA - The element can only contain parsed character data.
– Elements - The element can only contain child elements.
– Mixed - The element contains both parsed character data and child elements.
ANY Content: The declared element can store any type of content
The syntax is:
<!ELEMENT element ANY>
Example:
<!ELEMENT products ANY>
Any of the following would satisfy the above declaration:
– <products>SLR100 Digital Camera</products>
– <products>
     <name>SLR100</name>
     <type>Digital Camera</type>
  </products>
EMPTY content: This is reserved for elements that store no content
The syntax is:
<!ELEMENT element EMPTY>
Example:
<!ELEMENT img EMPTY>
The following would satisfy the above declaration:
– <img />
#PCDATA Content: can store parsed character data
The syntax is:
<!ELEMENT element (#PCDATA)>
For example,
<!ELEMENT name (#PCDATA)>
would permit the following element in an XML document:
– <name>Lea Ziegler</name>
An element declared as #PCDATA does not allow for child elements.
Working with Child Elements
The syntax is:
<!ELEMENT element (children)>
– where element is the parent element and children is a listing of its child elements.
The declaration
<!ELEMENT customer (phone)>
indicates that the following would be invalid, because the customer element may contain only a single phone element:
<customer>
   <name>Lea Ziegler</name>
   <phone>555-2819</phone>
</customer>
To declare the order of child elements, use:
<!ELEMENT element (child1, child2, …)>
– where child1, child2, … is the order in which the child elements must appear within the parent element.
Thus,
<!ELEMENT customer (name, phone, email)>
indicates that the customer element must contain three child elements named name, phone, and email, in that order.
To allow for a choice of child elements, use:
<!ELEMENT element (child1 | child2 | …)>
– where child1, child2, etc. are the possible child elements of the parent element.
<!ELEMENT customer (name | company)>
– allows the customer element to contain either the name element or the company element.
<!ELEMENT customer ((name | company), phone, email)>
– requires the customer element to contain either a name or a company element, followed by a phone element and an email element.
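A short sketch of a choice nested inside a sequence: the document below uses the company branch of the choice. The element values are invented for illustration, and all child elements are assumed to hold parsed character data.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE customer [
  <!-- Either name or company, then phone, then email -->
  <!ELEMENT customer ((name | company), phone, email)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT company (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
]>
<customer>
  <company>Hypothetical Imports</company>
  <phone>555-0100</phone>
  <email>orders@example.com</email>
</customer>
```

Supplying both name and company, or placing email before phone, would make this document invalid.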
Modifying Symbols
A modifying symbol specifies the number of occurrences of each element:
– ? allows zero or one of the item.
– + allows one or more of the item.
– * allows zero or more of the item.
Modifying symbols can be applied within sequences or choices. They can also modify entire element sequences or choices by placing the character immediately following the closing parenthesis of the sequence or choice.
<!ELEMENT customers (customer+)>
indicates that the customers element must contain at least one element named customer.
<!ELEMENT order (orderDate, items)+>
indicates that the child element sequence (orderDate, items) can be repeated one or more times within each order element.
<!ELEMENT customer (name, address, phone, email?, orders)>
allows the customer element to contain zero or one email elements.
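Putting the modifying symbols together, the following is one possible set of element declarations for a document shaped like orders.xml. The content models for orders, order, and items are assumptions, and attributes are left out so that the example stays valid without any attribute-list declarations.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE customers [
  <!ELEMENT customers (customer+)>
  <!ELEMENT customer (name, address, phone, email?, orders)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT address (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
  <!-- Assumed: at least one order per customer, at least one item per order -->
  <!ELEMENT orders (order+)>
  <!ELEMENT order (orderDate, items)>
  <!ELEMENT orderDate (#PCDATA)>
  <!ELEMENT items (item+)>
  <!ELEMENT item (#PCDATA)>
]>
<customers>
  <customer>
    <name>David Lynn</name>
    <address>211 Fox Street, Greenville, NH 80021</address>
    <phone>(315) 555-1812</phone>
    <!-- email is omitted here, which email? permits -->
    <orders>
      <order>
        <orderDate>8/1/2008</orderDate>
        <items>
          <item>DCT5Z</item>
        </items>
      </order>
    </orders>
  </customer>
</customers>
```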
Working with Mixed Content
Mixed content elements contain both parsed character data and child elements. The syntax is:
<!ELEMENT element (#PCDATA | child1 | child2 | … )*>
The parent element can contain character data or any number of the specified child elements, or it can contain no content at all.
It is better not to work with mixed content if you want a tightly structured document.
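For illustration, a mixed content declaration might look like the sketch below. The element names para, emph, and code are hypothetical and do not appear in orders.xml.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE para [
  <!-- Text may be interleaved with any number of emph and code elements -->
  <!ELEMENT para (#PCDATA | emph | code)*>
  <!ELEMENT emph (#PCDATA)>
  <!ELEMENT code (#PCDATA)>
]>
<para>Use <code>customer+</code> when <emph>at least one</emph> customer is required.</para>
```

The DTD cannot constrain the order or number of the child elements here, which is why mixed content suits loosely structured text rather than data records.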
Declaring Element Attributes
For a document to be valid, all the attributes associated with elements must also be declared. To enforce attribute properties, you must add an attribute-list declaration to the document’s DTD.
Element    Attributes   Required?   Default Value(s)
customer   custID       Yes         None
           custType     No          “home” or “business”
name       title        No          “Mr.”, “Mrs.”, “Ms.”
order      orderID      Yes         None
           orderBy      Yes         None
item       itemPrice    Yes         None
           itemQty      No          “1”
Attributes used in orders.xml
The attribute-list declaration:
– lists the names of all attributes associated with a specific element
– specifies the data type of the attribute
– indicates whether the attribute is required or optional
– provides a default value for the attribute, if necessary
The syntax to declare a list of attributes is:
<!ATTLIST element attribute1 type1 default1
                  attribute2 type2 default2
                  attribute3 type3 default3 … >
– where element is the name of the element associated with the attributes, attribute is the name of an attribute, type is the attribute’s data type, and default indicates whether the attribute is required and whether it has a default value.
Attribute-list declarations can be placed anywhere within the document type declaration, although they are easier to work with if they are located adjacent to the declaration for the element with which they are associated.
Working with Attribute Types
Attribute values can consist only of character data, but you can control the format of those characters. Three general categories of attribute values are:
– CDATA can contain any character except those reserved by XML
– Enumerated types are attributes that are limited to a set of possible values
– Tokenized types are text strings that follow certain rules for format and content
CDATA
The syntax is:
<!ATTLIST element attribute CDATA default>
Examples:
<!ATTLIST item itemPrice CDATA . . . >
<!ATTLIST item itemQty CDATA . . . >
Any of the following attribute values are allowed:
<item itemPrice="29.95"> . . . </item>
<item itemPrice="$29.95"> . . . </item>
<item itemPrice="£29.95"> . . . </item>
Enumerated Types
The general form for an enumerated type is:
<!ATTLIST element attribute (value1 | value2 | value3 | …) default>
where value1, value2, … are the allowed values.
Under the declaration below:
<!ATTLIST customer custType (home | business) . . . >
any custType attribute whose value is not “home” or “business” causes parsers to reject the document as invalid.
Another type of enumerated attribute is notation. It associates the value of the attribute with a <!NOTATION> declaration located elsewhere in the DTD. The notation provides information to the XML parser about how to handle non-XML data.
Tokenized Types are character strings that follow certain rules for format and content
To declare an attribute as a tokenized type, use:
attribute token
DTDs support seven tokens: ID, IDREF, IDREFS, NMTOKEN, NMTOKENS, ENTITY, ENTITIES
An ID is used when an attribute value must be unique within a document. For example:
<!ATTLIST customer custID ID . . . >
– This ensures each customer will have a unique ID.
IDREF token
An attribute declared with the IDREF token must have a value equal to the value of an ID attribute located elsewhere in the same document. This enables an XML document to contain cross-references between one element and another.
<!ATTLIST element attribute IDREF default>
For example:
<!ATTLIST order orderBy IDREF . . . >
Attribute Defaults
There are four possible defaults:
– #REQUIRED: The attribute must appear with every occurrence of the element.
– #IMPLIED: The attribute is optional.
– An optional default value: A validating XML parser will supply the default value if one is not specified.
– #FIXED: The attribute is optional, but if one is specified, it must match the default.
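The attribute types and defaults can be combined as in the sketch below, which follows the table of attributes used in orders.xml. The element content models are simplified assumptions made to keep the example short, and treating custType and title as #IMPLIED enumerations is one reading of that table, not the only one.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE customers [
  <!-- Simplified content models (assumed for this sketch) -->
  <!ELEMENT customers (customer+)>
  <!ELEMENT customer (name, orders)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT orders (order+)>
  <!ELEMENT order (item+)>
  <!ELEMENT item (#PCDATA)>

  <!ATTLIST customer custID    ID                 #REQUIRED
                     custType  (home | business)  #IMPLIED>
  <!ATTLIST name     title     (Mr. | Mrs. | Ms.) #IMPLIED>
  <!-- orderBy must match the ID value of some element in the document -->
  <!ATTLIST order    orderID   ID                 #REQUIRED
                     orderBy   IDREF              #REQUIRED>
  <!-- itemQty falls back to "1" when it is not specified -->
  <!ATTLIST item     itemPrice CDATA              #REQUIRED
                     itemQty   CDATA              "1">
]>
<customers>
  <customer custID="cust201" custType="home">
    <name title="Mr.">David Lynn</name>
    <orders>
      <order orderID="or10311" orderBy="cust201">
        <item itemPrice="599.95">DCT5Z</item>
        <item itemPrice="29.95" itemQty="2">RCL</item>
      </order>
    </orders>
  </customer>
</customers>
```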
Validating a Document with XMLSpy
XMLSpy is an XML development environment created by Altova, which is used for designing and editing professional applications involving XML, XML Schema, and other XML-based technologies.
Install and use the XMLSpy Home Edition, a free application which can be downloaded from the Altova Web site at http://www.altova.com/
Introducing Entities
Entities are storage units for a document’s content. The most fundamental entity is the XML document itself, which is known as the document entity. Entities can also refer to:
– a text string
– a DTD
– an element or attribute declaration
– an external file containing character or binary data
Working with Entities
Entities can be declared in a DTD. How to declare an entity depends on how it is classified. There are three factors involved in classifying entities:
– the content of the entity
– how the entity is constructed
– where the definition of the entity is located
General Parsed Entities
To create an internal parsed entity, use:
<!ENTITY entity "value">
– where entity is the name assigned to the entity and value is the entity’s value.
For example, to store the product description for the Tapan digital camera, use:
<!ENTITY DCT5Z "Tapan Digital Camera 5 Mpx - zoom">
or
<!ENTITY DCT5Z "<desc>Tapan Digital Camera 5 Mpx - zoom</desc>">
General Parsed Internal Entities
After an entity is declared, it can be referenced anywhere within the document. The reference begins with an ampersand and ends with a semicolon. The syntax is:
&entity;
For example,
<item>&DCT5Z;</item>
is interpreted as
<item>Tapan Digital Camera 5 Mpx - zoom</item>
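A complete, minimal document showing an internal parsed entity being declared and then referenced might look like this. Using item as the root element is an assumption made to keep the sketch short. Note that the reference ends with a semicolon.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE item [
  <!ELEMENT item (#PCDATA)>
  <!-- Internal general parsed entity holding the product description -->
  <!ENTITY DCT5Z "Tapan Digital Camera 5 Mpx - zoom">
]>
<!-- The parser replaces the reference with the entity's value -->
<item>&DCT5Z;</item>
```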
General Parsed External Entities
For longer text strings, it is preferable to place the content in an external file. To create an external parsed entity, use:
<!ENTITY entity SYSTEM "uri">
For example, in the declaration:
<!ENTITY DCT5Z SYSTEM "description.xml">
an entity named “DCT5Z” gets its value from the description.xml file.
<!ENTITY DCT5Z "Tapan Digital Camera 5 Mpx - zoom">
<!ENTITY SM128 "SmartMedia 128MB Card">
<!ENTITY RCL "Rechargeable Lithium Ion Battery">
<!ENTITY BCE4L "Battery Charger 4pt Lithium">
<!ENTITY WBC500 "WebNow Webcam 500">
<!ENTITY RCA "Rechargeable Alkaline Battery">
<!ENTITY SCL4C "Linton Flatbed Scanner 4C">
Declare parsed entities in the codes.dtd file for the product codes in the orders.xml document. In each declaration, the entity name comes first, followed by the entity value.
Parameter Entities
Parameter entities are used to store the content of a DTD. For internal parameter entities, the syntax is:
<!ENTITY % entity "value">
For external parameter entities, the syntax is:
<!ENTITY % entity SYSTEM "uri">
Once a parameter entity has been declared, you can add a reference to it within the DTD using:
%entity;
Using Parameter Entities to Combine Multiple DTDs
<!DOCTYPE customers
[
   . . .
   <!ENTITY % itemCodes SYSTEM "codes.dtd">
   %itemCodes;
]>
<customers>
   . . .
   <orders>
      <order orderID="or10311" orderBy="cust201">
         <orderDate>8/1/2008</orderDate>
         <items>
            <item itemPrice="599.95">&DCT5Z;</item>
            <item itemPrice="199.95">&SM128;</item>
            <item itemPrice="29.95" itemQty="2">&RCL;</item>
         </items>
      </order>
Add a parameter entity to the DTD within the orders.xml file to load the contents of the codes.dtd file
With the entity references resolved and the default attribute values supplied, the order is read as:
<orders>
   <order orderID="or10311" orderBy="cust201">
      <orderDate>8/1/2008</orderDate>
      <items>
         <item itemPrice="599.95" itemQty="1">Tapan Digital Camera 5 Mpx - zoom</item>
         <item itemPrice="199.95" itemQty="1">SmartMedia 128MB Card</item>
         <item itemPrice="29.95" itemQty="2">Rechargeable Lithium Ion Battery</item>
      </items>
   </order>
Parameter Entities
Parameter entity references can only be placed where a declaration would normally occur, such as in an internal or external DTD subset.
Parameter entities used with an internal DTD do not offer any time or effort savings.
However, an external parameter entity can allow XML to use more than one DTD per document by combining declarations from multiple DTDs.
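As a sketch of combining declarations from more than one DTD, the internal subset below loads two external files through parameter entities. The file contacts.dtd, the support entity, and the notice element are hypothetical; codes.dtd is assumed to hold the product code entities listed earlier.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!--
  Assumed external files:
    codes.dtd    : the product code entities, including DCT5Z
    contacts.dtd : hypothetical, containing
                   <!ENTITY support "support@example.com">
-->
<!DOCTYPE notice [
  <!-- Declare one parameter entity per external DTD -->
  <!ENTITY % itemCodes SYSTEM "codes.dtd">
  <!ENTITY % contacts SYSTEM "contacts.dtd">
  <!-- Referencing them inserts each file's declarations at this point -->
  %itemCodes;
  %contacts;
  <!ELEMENT notice (#PCDATA)>
]>
<notice>Questions about the &DCT5Z;? Write to &support;</notice>
```

A parser only resolves these references if it is configured to load external entities, so the behavior depends on the tool used.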
Unparsed Entities
You need to create an unparsed entity in order to reference binary data such as images or video clips, or character data that is not well formed.
The declaration of an unparsed entity includes instructions for how the entity should be treated.
A notation is declared that identifies a resource to handle the unparsed data.
For example, to create a notation named “audio” that points to an application recorder.exe:
<!NOTATION audio SYSTEM "recorder.exe">
Once the notation has been declared, you then declare an unparsed entity that instructs the XML parser to associate the data with the notation.
To take unparsed data in an audio file and assign it to an unparsed entity named “theme”, use:
<!ENTITY theme SYSTEM "overture.wav" NDATA audio>
Here, the notation is the audio notation that points to the recorder.exe file. This declaration does not tell the recorder.exe application to run the file but simply identifies for the XML parser what resource is able to handle the unparsed data.
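An unparsed entity is not referenced with &theme; in the document content. Instead, its name is given as the value of an attribute declared with the ENTITY token. The sketch below shows one way this could look; the album and track elements and the source attribute are hypothetical names.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE album [
  <!ELEMENT album (track)>
  <!ELEMENT track EMPTY>
  <!-- The attribute value must be the name of a declared unparsed entity -->
  <!ATTLIST track source ENTITY #REQUIRED>
  <!NOTATION audio SYSTEM "recorder.exe">
  <!ENTITY theme SYSTEM "overture.wav" NDATA audio>
]>
<album>
  <track source="theme"/>
</album>
```

What happens with overture.wav after parsing is up to the application that receives the entity and notation information.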