XML Study-Session: Part II Validating XML Documents

Post on 19-Dec-2015

XML Study-Session: Part II

Validating XML Documents

Objectives:

By completing this study-session, you should be able to:

Validate XML documents against a DTD. Understand basic DTD syntax. Create simple DTDs of your own.

What is a DTD?

Document Type Definition:

A standard originally developed for SGML.

Provides a description of the XML document’s structure, and serves as a grammar to specify which tags and attributes are valid in an XML document and in what context they are valid.

E.g. The following is an example DTD statement:

<!ELEMENT person (name, e-mail*)>
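To make the statement above concrete, here is a minimal sketch of a document that satisfies it. The declarations for name and e-mail and the sample values are assumptions added for illustration; the slide gives only the person declaration.

```xml
<?xml version="1.0"?>
<!DOCTYPE person [
  <!-- From the slide: one name, followed by zero or more e-mail elements -->
  <!ELEMENT person (name, e-mail*)>
  <!-- Assumed for illustration: both children hold plain text -->
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT e-mail (#PCDATA)>
]>
<person>
  <name>Ada Example</name>
  <e-mail>ada@example.org</e-mail>
  <e-mail>ada@example.com</e-mail>
</person>
```

Removing both e-mail elements would still leave the document valid; removing name would not.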

Why use a DTD?

A DTD gives applications a shared specification: documents can be validated against it, and an application can construct XML that conforms to it. DTDs also offer:

Self-documentation

Portability

Defaults for attributes

Entity declarations

Using a DTD in an XML document

An XML document may do any of the following:

Refer to a DTD, using its URI.

Include a DTD inline as part of the XML document.

Omit a DTD altogether. Without a DTD, an XML document can be checked for well-formedness, but not for validity.

The DTD used by the XML document may be internal, external, or a combination of both. An external DTD is stored as a plain-text file, conventionally with a .dtd extension.
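The difference between well-formedness and validity can be sketched with a small document. The note, to and body element names are hypothetical and not part of the original slides.

```xml
<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<!-- Well-formed: every tag is closed and properly nested.       -->
<!-- Not valid: the DTD requires a 'to' element before 'body'.   -->
<note>
  <body>Meeting at noon.</body>
</note>
```

A non-validating parser would accept this document, while a validating parser would be expected to report the missing to element.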

Example: Using a DTD inline

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE Book [
<!ELEMENT Book (Title, Author+, Summary*, Note?)>
<!ATTLIST Book
    ISBN CDATA #REQUIRED
    section (fiction|nonfiction) 'fiction'>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Author (#PCDATA)>
<!ELEMENT Summary (#PCDATA)>
<!ENTITY Description 'A great American novel.'>
]>
<Book ISBN='1234'>
<Title> To Kill a Mockingbird </Title>
<Author> Harper Lee </Author>
<Summary> &Description; </Summary>
</Book>

Doctype declaration

The Document Type (Doctype) declaration indicates the DTD used for the document. It may take any of the following forms:

<!DOCTYPE rootname [DTD]>

<!DOCTYPE rootname SYSTEM "URL">

<!DOCTYPE rootname SYSTEM "URL" [DTD]>

<!DOCTYPE rootname PUBLIC "identifier" "URL">

<!DOCTYPE rootname PUBLIC "identifier" "URL" [DTD]>
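As a rough illustration, three of these forms might be filled in as follows. These are alternatives rather than one document, since a document carries at most one Doctype declaration. The Book root and the booklist.dtd file name are taken from later slides; pairing the XHTML public identifier with an html root is an assumption for illustration.

```xml
<!-- Internal subset only; the root element must be named Book -->
<!DOCTYPE Book [ <!ELEMENT Book (#PCDATA)> ]>

<!-- External DTD located by a system identifier (here a relative URI) -->
<!DOCTYPE Book SYSTEM "booklist.dtd">

<!-- External DTD named by a public identifier, with a system identifier as fallback -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
```

Note that the system and public identifiers are always quoted.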

Example: External DTD

The following is an example of an XML document that uses an external DTD:

<?xml version='1.0' standalone='no'?>
<!DOCTYPE Book SYSTEM 'booklist.dtd'>
<Book ISBN='4576'>
<Title> Moby Dick </Title>
<Author> Herman Melville </Author>
</Book>

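Because 'booklist.dtd' is a relative URI, it is resolved against the location of the XML document. In this example the external DTD must therefore be located in the same directory as the XML document.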
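The slides do not show the contents of booklist.dtd. A plausible sketch, assuming it carries the same declarations as the earlier inline example plus a declaration for Note, could look like this. An external DTD contains only declarations, with no DOCTYPE wrapper.

```xml
<!-- booklist.dtd: hypothetical contents, modelled on the inline DTD example -->
<!ELEMENT Book (Title, Author+, Summary*, Note?)>
<!ATTLIST Book
    ISBN    CDATA                 #REQUIRED
    section (fiction|nonfiction)  "fiction">
<!ELEMENT Title   (#PCDATA)>
<!ELEMENT Author  (#PCDATA)>
<!ELEMENT Summary (#PCDATA)>
<!-- Assumed: Note is referenced by Book but never declared in the slides -->
<!ELEMENT Note    (#PCDATA)>
```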

Example: Using DTDs with URLs

The following is an example of an XML document that references an external DTD with a URL:

<?xml version='1.0' standalone='no'?>
<!DOCTYPE Book SYSTEM 'http://www.somewebsite.com/booklist.dtd'>
<Book ISBN='4576'>
<Title> Moby Dick </Title>
<Author> Herman Melville </Author>
</Book>

Specifying Elements

In the DTD, this is done with the notation:

<!ELEMENT elemName elemDefinitionOrType>

where elemName is the actual element name, and elemDefinitionOrType indicates whether the content of the element is pure data or a compound type of data and other elements.

Some Element Types

The element type keyword ANY allows the element to contain textual data, nested elements, or any legal XML combination of the two.

The element type keyword #PCDATA (parsed character data), written in parentheses as (#PCDATA), indicates textual data that the parser processes normally, expanding any entity references it contains.

The element type keyword EMPTY indicates that the element is always empty.
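The three element types can be seen side by side in a short sketch. The page, heading, br and notes element names are hypothetical, chosen only to illustrate each keyword.

```xml
<?xml version="1.0"?>
<!DOCTYPE page [
  <!ELEMENT page (heading, br, notes)>
  <!ELEMENT heading (#PCDATA)>  <!-- text only -->
  <!ELEMENT br EMPTY>           <!-- never has content -->
  <!ELEMENT notes ANY>          <!-- text and any declared elements, in any order -->
]>
<page>
  <heading>Element types</heading>
  <br/>
  <notes>Free text with a nested <heading>heading</heading> is allowed here.</notes>
</page>
```

Even under ANY, every nested element must itself be declared somewhere in the DTD for the document to be valid.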

Nesting elements

To define the allowed nestings within a DTD, the following notation is used:

<!ELEMENT elemName (nestedElem, nestedElem, …)>

where the order of elements is enforced as a validity constraint within an XML document.

By default, an element must appear exactly once when it is listed without a recurrence operator in the DTD.
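A small sketch of the ordering constraint, using hypothetical address, street, city and country elements:

```xml
<?xml version="1.0"?>
<!DOCTYPE address [
  <!-- Children must appear in exactly this order, once each -->
  <!ELEMENT address (street, city, country)>
  <!ELEMENT street (#PCDATA)>
  <!ELEMENT city (#PCDATA)>
  <!ELEMENT country (#PCDATA)>
]>
<address>
  <street>1 Example Road</street>
  <city>Springfield</city>
  <country>Exampleland</country>
</address>
```

Placing city before street, or repeating street, would leave the document well-formed but no longer valid.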

Recurrence Operators:

Recurrence operators can be used to indicate how many times an element must appear in an XML document:

Operator     Description
[Default]    Must appear exactly one time.
?            May appear once or not at all (0..1 times).
+            Must appear at least once (1..N times).
*            May appear any number of times (0..N times).
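The operators can be traced through the Book content model used elsewhere in these slides. This is a sketch; the Note declaration and the sample values are assumptions added for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE Book [
  <!ELEMENT Book (Title, Author+, Summary*, Note?)>
  <!ELEMENT Title (#PCDATA)>
  <!ELEMENT Author (#PCDATA)>
  <!ELEMENT Summary (#PCDATA)>
  <!ELEMENT Note (#PCDATA)>
]>
<Book>
  <Title>Good Omens</Title>          <!-- no operator: exactly one -->
  <Author>Terry Pratchett</Author>   <!-- +: one or more -->
  <Author>Neil Gaiman</Author>
  <!-- *: Summary is omitted entirely, which is allowed -->
  <Note>Signed copy</Note>           <!-- ?: zero or one -->
</Book>
```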

Grouping elements

Often, recurrence occurs for a block or group of elements rather than with a single element.

To signify a group, enclose a set of elements within parentheses. Nested parentheses are acceptable.

In this way, a recurrence operator can then be applied to the group.

E.g.

<!ELEMENT groupingExample ((group1Elem1, group1Elem2)+, (group2Elem1, group2Elem2)?)+>
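One instance that this declaration should accept is sketched below. Declaring the four child elements as EMPTY is an assumption made to keep the example short.

```xml
<?xml version="1.0"?>
<!DOCTYPE groupingExample [
  <!ELEMENT groupingExample ((group1Elem1, group1Elem2)+, (group2Elem1, group2Elem2)?)+>
  <!ELEMENT group1Elem1 EMPTY>
  <!ELEMENT group1Elem2 EMPTY>
  <!ELEMENT group2Elem1 EMPTY>
  <!ELEMENT group2Elem2 EMPTY>
]>
<groupingExample>
  <!-- First pass of the outer group: group 1 twice, then the optional group 2 -->
  <group1Elem1/><group1Elem2/>
  <group1Elem1/><group1Elem2/>
  <group2Elem1/><group2Elem2/>
  <!-- Second pass of the outer group: group 1 once, group 2 omitted -->
  <group1Elem1/><group1Elem2/>
</groupingExample>
```

Each group must appear whole: a group1Elem1 without its following group1Elem2 would be invalid.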

Either Or

In the DTD, an “OR” operator is signified by using |. This allows one thing or the other to occur, and can be used in conjunction with groupings.

E.g.

<!ELEMENT aggregateElement (#PCDATA|Element1|Element2)*>
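The declaration above describes mixed content, where text and the listed elements may be interleaved freely. A sketch of a matching instance follows; the declarations for Element1 and Element2 are assumptions added for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE aggregateElement [
  <!-- In mixed content, #PCDATA comes first and the group ends with * -->
  <!ELEMENT aggregateElement (#PCDATA|Element1|Element2)*>
  <!ELEMENT Element1 (#PCDATA)>
  <!ELEMENT Element2 EMPTY>
]>
<aggregateElement>Some text, then <Element1>an element</Element1>,
more text, <Element2/> and so on in any order.</aggregateElement>
```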

Defining Attributes

Attribute definitions are in the following form:

<!ATTLIST enclosingElement attributeName attributeType attributeModifier …>

The attributeType keyword CDATA allows an attribute to take any character string as its value, so it can carry a comment or additional information about an element.

Another attribute type is an enumeration, where any of the specified values may be used, but any other value for the attribute results in an invalid document.

E.g.

<!ATTLIST elementName attributeName (value1|value2) attributeModifier …>

Attribute Modifiers

We can indicate in the attribute definition whether the attribute is required within an element.

The three modifier keywords are: #IMPLIED, #REQUIRED, and #FIXED.

An implied attribute may be given a value, or left unspecified.

A required attribute must be given a value.

A fixed attribute has a specified value that can never change. The notation for this is:

<!ATTLIST elementName attributeName attributeType #FIXED "fixedValue">

In place of a modifier keyword, a quoted default value may be given; it applies whenever the attribute is omitted.
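The modifiers can be compared in a single attribute list. This is a sketch: ISBN and section come from the earlier Book example, while the edition and language attributes are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE Book [
  <!ELEMENT Book (#PCDATA)>
  <!ATTLIST Book
    ISBN     CDATA                 #REQUIRED
    edition  CDATA                 #IMPLIED
    language CDATA                 #FIXED "en"
    section  (fiction|nonfiction)  "fiction">
]>
<!-- ISBN must be present; edition may be left out;                 -->
<!-- language, if written, must be "en"; section defaults to fiction -->
<Book ISBN="1234">To Kill a Mockingbird</Book>
```

Writing language="fr" or section="poetry" on this element would make the document invalid.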

Parameter Entities in DTDs

Parameter entities are entities that can only be used in the DTD. A simple internal parameter entity has the format:

<!ENTITY % name "definition">

E.g.

<?xml version='1.0' standalone='yes'?>
<!DOCTYPE Book [
<!ENTITY % sum "<!ELEMENT Summary (#PCDATA)>">
<!ELEMENT Book (Title, Author+, Summary*, Note?)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Author (#PCDATA)>
%sum;
]>
…

Parameter Entities in DTDs (contd.)

External parameter entities can be declared using the following:

<!ENTITY % name SYSTEM "URI">

or

<!ENTITY % name PUBLIC "identifier" "URI">

E.g. The following 'orders.dtd' file could be created:

<!ENTITY % record "(Name, Date, Orders)">

<!ELEMENT Store (Customer|Buyer|Supplier)*>

<!ELEMENT Customer %record;>

<!ELEMENT Buyer %record;>

<!ELEMENT Supplier %record;>

<!ELEMENT Name (#PCDATA)>

<!ELEMENT Date (#PCDATA)>

<!ELEMENT Orders (Product|Price)>

<!ELEMENT Product (#PCDATA)>

<!ELEMENT Price (#PCDATA)>

<!ENTITY % XHTML1-t.dtd PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

%XHTML1-t.dtd;

Using INCLUDE and IGNORE

We can customize an external DTD using INCLUDE and IGNORE conditional sections (they are not allowed in a DTD written inline), which have the following syntax:

<![INCLUDE [DTD sections]]>

<![IGNORE [DTD sections]]>

E.g. In the ‘orders.dtd’ file, add the following lines:

<!ENTITY % includer "INCLUDE">

…(same as before)…

<![%includer;[

<!ELEMENT Product_ID (#PCDATA)>

<!ELEMENT Ship_Date (#PCDATA)>

<!ELEMENT Tax (#PCDATA)>

]]>
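One way a document could switch this section off, sketched under the assumption that orders.dtd sits next to the document: the internal subset is read before the external DTD, and the first declaration of an entity is the one that binds, so redeclaring includer here overrides the "INCLUDE" value in orders.dtd.

```xml
<?xml version="1.0" standalone="no"?>
<!DOCTYPE Store SYSTEM "orders.dtd" [
  <!-- Read before orders.dtd, so this value wins over "INCLUDE" -->
  <!ENTITY % includer "IGNORE">
]>
<!-- Product_ID, Ship_Date and Tax are now undeclared for this document -->
<Store/>
```

Leaving the internal subset out restores the default and the three extra element declarations become available again.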

Example: Using the XHTML 1.1 DTD

The XHTML 1.1 DTD is a DTD driver which includes various XHTML 1.1 modules (i.e. DTD sections) using parameter entities.

E.g.

<!-- Tables Module ............................... -->
<!ENTITY % xhtml-table.module "INCLUDE">
<![%xhtml-table.module;[
<!ENTITY % xhtml-table.mod PUBLIC "-//W3C//ELEMENTS XHTML 1.1 Tables 1.0//EN" "xhtml11-table-1.mod">
%xhtml-table.mod;
]]>

The above allows us to customize the XHTML 1.1 DTD to include/exclude support for tables.

Next session:

Parsing XML Documents

Parsing techniques

Writing your own XML applications