1
XML (3): Extensible Markup Language
Acknowledgements and copyrights: these slides are the result of combining notes and slides with contributions from: Michael Kifer, Arthur Bernstein, Philip Lewis, Hanspeter Mössenböck, Wolfgang Beer, Dietrich Birngruber, Albrecht Wöß, Mark Sapossnek, Bill Andreopoulos, Divakaran Liginlal, Anestis Toptsis, Addison Wesley, Microsoft AA.
They serve for teaching purposes only and only for the students that are registered in CSE4413 and should not be published as a book or in any form of commercial product, unless written permission is
obtained from each of the above listed names and/or organizations.
2
Document Type Definition (DTD)
• A DTD is a grammar specification for an XML document
• DTDs are optional – they don’t need to be specified
• If specified, a DTD can be part of the document (at the top), or it can be given as a URL
• A document that conforms (i.e., parses) w.r.t. its DTD is said to be valid
3
XML Data + DTD

DTD:
  <!ELEMENT a (b+, c?) >
  <!ELEMENT b (#PCDATA) >
  <!ELEMENT c (#PCDATA) >

<!-- XML Data -->
<a>
  <b> Some </b>
  <c> 100 </c>
  <c> 101 </c>
</a>
Not valid! (c? allows at most one c)

<!-- XML Data -->
<a>
  <b> Some </b>
  <b> Thing </b>
</a>
Valid (b+ allows one or more b; c is optional)
4
What is a DTD ?
• Document Type Definition (DTD)
• Defines the syntax, grammar and semantics
• Defines the document structure
  – What Elements, Attributes, Entities, etc. are permitted?
  – How are the document elements related and structured?
• Referenced by or defined in XML documents, but it’s not XML!
• Enables validation of XML documents using an XML parser
• Can be referenced by more than one XML document
• DTDs may reference other DTDs
5
Schemas: DTD Example
• DTD schema:
  <!DOCTYPE BOOK [
    <!ELEMENT BOOK (TITLE+, AUTHOR) >
    <!ELEMENT TITLE (#PCDATA) >
    <!ELEMENT AUTHOR (#PCDATA) >
  ]>
• XML document (that conforms to the DTD above):
  <BOOK>
    <TITLE>All About XML</TITLE>
    <AUTHOR>Joe Developer</AUTHOR>
  </BOOK>
6
DTD By Diagram
[Diagram: element tree with the nodes CustomerOrder, Customer, Person, FName, LName, Address (repeated), Orders (repeated), OrderNo and ProductNo (repeated).]
7
DTD By Example
http://www.myco.com/dtd/order.dtd
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE CustomerOrder [
  <!ELEMENT CustomerOrder (Customer, Orders*) >
  <!ELEMENT Customer (Person, Address+) >
  <!ELEMENT Person (FName, LName) >
  <!ELEMENT FName (#PCDATA) >
  <!ELEMENT LName (#PCDATA) >
  <!ELEMENT Address (#PCDATA) >
  <!ATTLIST Address
      AddrType ( billing | shipping | home ) "shipping" >
  <!ELEMENT Orders (OrderNo, ProductNo+) >
  <!ELEMENT OrderNo (#PCDATA) >
  <!ELEMENT ProductNo (#PCDATA) >
]>
Annotations:
• Customer (no suffix): exactly 1 time
• Orders*: 0 or more times
• Address+: 1 or more times
• #PCDATA: Parsed Character Data
• | : or (choice)
• "shipping": default value
8
DTD syntax rules
• , (comma) indicates strict sequence. For example,
    <!ELEMENT CustomerOrder (Customer, Orders) >
• | (pipe) indicates a choice. For example,
    AddrType ( billing | shipping | home )
• Cardinality: how many instances are allowed
  – ? Optional (may or may not appear)
  – * Zero or more
  – + One or more
    <!ELEMENT Category (subcategory+ ) >
• #PCDATA stands for parsed character data
• Mixed content can be indicated using #PCDATA. For example,
    <!ELEMENT Idea (#PCDATA | product | service)* >
  E.g.
    <Idea>
      <product> … </product>
      Some descriptive text included as pcdata
      <service> …. </service>
      <product> …. </product>
    </Idea>
9
DTDs (cont’d)
• A DTD can be specified as part of an XML document:
    <?xml version="1.0" ?>
    <!DOCTYPE Report [
      … … …
    ]>
    <Report> … … … </Report>
• A DTD can be specified as a standalone file and referenced from an XML document with the SYSTEM keyword:
    <?xml version="1.0" ?>
    <!DOCTYPE Report SYSTEM "http://foo.org/Report.dtd">
    <Report> … … … </Report>
10
DTD Example
<!DOCTYPE Report [
  <!ELEMENT Report (Students, Classes, Courses)>
  <!ELEMENT Students (Student*)>
  <!ELEMENT Classes (Class*)>
  <!ELEMENT Courses (Course*)>
  <!ELEMENT Student (Name, Status, CrsTaken*)>
  <!ELEMENT Name (First,Last)>
  <!ELEMENT First (#PCDATA)>
  … … …
  <!ELEMENT CrsTaken EMPTY>
  <!ELEMENT Class (CrsCode,Semester,ClassRoster)>
  <!ELEMENT Course (CrsName)>
  … … …
  <!ATTLIST Report Date CDATA #IMPLIED>
  <!ATTLIST Student StudId ID #REQUIRED>
  <!ATTLIST Course CrsCode ID #REQUIRED>
  <!ATTLIST CrsTaken CrsCode IDREF #REQUIRED>
  <!ATTLIST ClassRoster Members IDREFS #IMPLIED>
]>
Annotations:
• Student*: zero or more
• (#PCDATA): has text content (PCDATA stands for Parsed Character Data)
• EMPTY: empty element, no content, neither text nor child elements. Like <BR/> in HTML (but it may have attributes).
• CrsCode: same attribute in different elements (Course and CrsTaken)
Exercise: Use the above DTD to write a conforming XML document of your choice.
DTD – ELEMENT content specifications
<!ELEMENT Tagname content_specification>
• Content_Specification:
  – EMPTY: Neither text nor child elements associated
    • like <BR/> or <Link url='http://abc.d'/> in HTML.
  – ANY: Content that does not violate well-formed syntax
  – MIXED: Mix of elements, #PCDATA, or text
12
DTD – Attribute Declarations
<!ATTLIST elementname attrname Type DEFAULT>
Values of DEFAULT and their meaning:
• #REQUIRED: The attribute is always required in an instance of this element.
• #IMPLIED: The attribute is optional in an instance of this element.
• #FIXED default value: The attribute has a fixed value. If the attribute is absent, the default value is assumed by the parser; if it is present, it must have exactly that value.
• default value only: The default value of the attribute. If the attribute is absent, the default value is assumed by the parser.
Divakaran Liginlal
13
Examples: DTD – Attribute Declaration
<!ATTLIST product Name CDATA #REQUIRED>
• A product must have a Name, and it is a character data string (CDATA is the type of the attribute).
<!ATTLIST product Life CDATA #IMPLIED>
• A product may have a specified Life.
<!ATTLIST product Life CDATA #FIXED "Not Known">
• By default the product’s Life = 'Not Known' and is constant. If the attribute Life is absent, it is still assumed to have this value.
<!ATTLIST product Life CDATA "Not Known">
• By default the product’s Life = 'Not Known'. If a value is specified, the specified value is used.
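The four kinds of default can be seen side by side in one small document. This is only a sketch: the catalog element and the attributes Unit and Status are hypothetical names introduced for illustration, since a single attribute such as Life can only be declared once per element.

```xml
<?xml version="1.0"?>
<!DOCTYPE catalog [
  <!ELEMENT catalog (product+)>
  <!ELEMENT product EMPTY>
  <!-- Unit and Status are hypothetical attributes, not from the slides -->
  <!ATTLIST product
      Name   CDATA #REQUIRED
      Life   CDATA #IMPLIED
      Unit   CDATA #FIXED "kg"
      Status CDATA "active">
]>
<catalog>
  <!-- Valid: Life is simply absent, Unit is taken as "kg", Status as "active" -->
  <product Name="Bug Killer"/>
  <!-- Valid: explicit values replace the plain default of Status -->
  <product Name="Ant Trap" Life="2 years" Status="retired"/>
  <!-- Would not be valid: Name is missing and Unit differs from its fixed value -->
  <!-- <product Unit="lb"/> -->
</catalog>
```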
14
Attribute Types – CDATA & ID
<!ATTLIST elementname attrname Type DEFAULT>
Type and meaning:
• CDATA: Character string. CDATA indicates that an attribute contains a simple character string of text.
  E.g.:
    <!ATTLIST Idea name CDATA #REQUIRED>
    <Idea name='Bug Killer'>
    <!ATTLIST education school CDATA #REQUIRED>
    <education school="York University">
• ID: Unique name (identifier) inside the document. Only one attribute of type ID can be assigned to a given element type. The value of the attribute (i.e., the ID) must be unique throughout the same XML document, i.e., it uniquely identifies an element in the XML document.
  Format of an ID value = (Letter | '_' | ':') (NameChar)*
  E.g.:
    <!ATTLIST Idea id ID #REQUIRED>
    <!ATTLIST school id ID #REQUIRED>
    <Idea id='L1013'>
    <!ATTLIST Course CrsCode ID #REQUIRED>
    <Course CrsCode="CSE4413">
  Because an ID value must start with a letter, '_' or ':', a purely numeric value such as "4413" would be rejected by a validating parser.
15
Attribute Types – IDREF
<!ATTLIST elementname attrname Type DEFAULT>
Type and meaning:
• IDREF: Indirect reference to an ID type. Reference to an element with an ID attribute having the same value (but not necessarily the same name!) as this IDREF attribute, i.e., the value of the IDREF attribute must match the value of an ID attribute elsewhere in the same XML document.
  E.g.:
    <!ATTLIST CrsTaken CrsCode IDREF #REQUIRED>
    <CrsTaken CrsCode='CSE4413'>
  CrsCode refers to an element (Course) that has an attribute CrsCode with value 'CSE4413'.
16
Attribute Types – IDREFS
<!ATTLIST elementname attrname Type DEFAULT>
Type and meaning:
• IDREFS: Series of IDREFs delimited by whitespace.
  E.g.:
    <!ATTLIST ClassRoster Members IDREFS #IMPLIED>
    <ClassRoster Members="cs123456 cs234567 cs345678">
  These are the ids of students who are registered in some class. Ideally, these students should be listed with elements declared as <!ATTLIST Student StudId ID #REQUIRED>, e.g.,
    <Student StudId="cs123456">
    <Student StudId="cs234567">
    <Student StudId="cs345678">
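The following sketch puts ID, IDREF and IDREFS together in one document that a validating parser should accept. The content models are deliberately simplified and are an assumption made for illustration; they are not the Report DTD from the earlier slide.

```xml
<?xml version="1.0"?>
<!DOCTYPE Report [
  <!-- Simplified, hypothetical content models -->
  <!ELEMENT Report (Student+, Course+, ClassRoster)>
  <!ELEMENT Student (CrsTaken*)>
  <!ELEMENT CrsTaken EMPTY>
  <!ELEMENT Course EMPTY>
  <!ELEMENT ClassRoster EMPTY>
  <!ATTLIST Student StudId ID #REQUIRED>
  <!ATTLIST Course CrsCode ID #REQUIRED>
  <!ATTLIST CrsTaken CrsCode IDREF #REQUIRED>
  <!ATTLIST ClassRoster Members IDREFS #IMPLIED>
]>
<Report>
  <Student StudId="cs123456">
    <!-- IDREF: must match some ID value in this document -->
    <CrsTaken CrsCode="CSE4413"/>
  </Student>
  <Student StudId="cs234567"/>
  <Course CrsCode="CSE4413"/>
  <!-- IDREFS: whitespace-separated list, each entry must match an ID -->
  <ClassRoster Members="cs123456 cs234567"/>
</Report>
```

Note that a DTD cannot say which element type an IDREF must point to: <CrsTaken CrsCode="cs123456"/> would also validate, even though cs123456 identifies a Student rather than a Course.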
17
Limitations of DTDs
• DTDs do not support namespaces. All element names are global: you can’t have one Name type for people and another for companies. The two declarations
    <!ELEMENT Name (Last, First)>
    <!ELEMENT Name (#PCDATA)>
  cannot both be in the same DTD.
• Very limited assortment of data types (just strings)
• Cannot express unordered contents conveniently. For example,
    <!ELEMENT Report (Students, Classes, Courses)>
  determines that Students, Classes, Courses should appear in the order specified and not in any other order.
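To see why unordered content is inconvenient, here is one way it could be written in a DTD: every permitted order has to be spelled out as a choice of sequences. This is an illustrative sketch, not a recommended practice; with three children there are already 6 orders to cover, and the number grows factorially.

```xml
<!-- Students, Classes and Courses, each exactly once, in any order -->
<!ELEMENT Report ( (Students, ((Classes, Courses)  | (Courses, Classes)))
                 | (Classes,  ((Students, Courses) | (Courses, Students)))
                 | (Courses,  ((Students, Classes) | (Classes, Students))) )>
```

The alternatives are grouped by their first element so that a parser can always tell from the next child which branch it is in.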
18
DTD validation
• Once you have a DTD, you can create an XML document from that DTD.
• Then you (may) want to validate the document against the DTD.
• To do so you can write a program that parses the document and tries to match it against the DTD (difficult!), or
• you can use a DTD validation tool.
19
DTD validation – tools
• XSV validator (W3C):
  – Free
  – http://www.w3.org/2001/03/webdata/xsv
• Brown University’s STG (Scholarly Technology Group) validator.
  – Free
  – http://www.stg.brown.edu/service/xmlvalid/
• XMLStarlet
  – Free
  – http://xmlstar.sourceforge.net/
• Search the web for more tools (there are many)…
20
XML Schema (XSD)
• http://www.w3.org/2001/XMLSchema
• Introduced to rectify some of the problems with DTDs
• Advantages:
  – Integrated with namespaces
  – Many built-in types
  – User-defined types
  – Powerful key and referential constraints
  – The schema itself is an XML document.
• Disadvantages:
  – Much more complex than DTDs
21
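XML Documents + XML Schema

<!-- Some XML Schema -->
<element name="a">
  <complexType>
    <sequence>
      <element name="b" type="string" minOccurs="1" maxOccurs="unbounded"/>
      <element name="c" type="integer" minOccurs="0" maxOccurs="1"/>
    </sequence>
  </complexType>
</element>
• minOccurs and maxOccurs both default to 1, so b needs maxOccurs="unbounded" and c needs minOccurs="0" for this schema to accept the same documents as the DTD content model (b+, c?).

<!-- XML Data -->
<a>
  <b> Some </b>
  <c> 100 </c>
  <c> 101 </c>
</a>
Not valid! (c may occur at most once)

<!-- XML Data -->
<a>
  <b> Some </b>
  <b> Thing </b>
</a>
Valid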
22
Example
• XML:
    <Age>20009</Age>
• XML Schema:
    <element name="Age" type="integer"/>
• DTD:
    <!ELEMENT Age (#PCDATA)>
23
Motivation for XML Schemas
• Datatype capability
  – For example, you can define a <price> element to hold an integer with a range of 0 to 12,000
• Datatypes compatible with those in databases
  – XML Schema supports a relatively large number of datatypes
• You can create your own datatypes
  – Example: "This is a new type based on the string type and elements of this type must follow this pattern: ddd-dddd, where 'd' represents a digit".
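Both motivating examples can be written as simple types that restrict a built-in type. A minimal sketch follows; the type names PriceType and PhoneType and the element name phone are hypothetical, chosen only for illustration.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- An integer restricted to the range 0 to 12,000 -->
  <xsd:simpleType name="PriceType">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="0"/>
      <xsd:maxInclusive value="12000"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- A string that must follow the pattern ddd-dddd, d being a digit -->
  <xsd:simpleType name="PhoneType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{3}-\d{4}"/>
    </xsd:restriction>
  </xsd:simpleType>

  <xsd:element name="price" type="PriceType"/>
  <xsd:element name="phone" type="PhoneType"/>
</xsd:schema>
```

Under this schema, <price>11999</price> and <phone>555-1234</phone> would be accepted, while <price>12001</price> and <phone>5551234</phone> would be rejected.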
24
Types
• Primitive (built-in) types: decimal, integer, boolean, string, etc. (defined in the XMLSchema namespace, http://www.w3.org/TR/xmlschema-2/#built-in-datatypes)
  – string – string type
  – boolean – boolean type
  – integer, decimal, float, double – number types
  – time, date, month, year, century, etc. – date and time types (in the final Recommendation, month and year are named gMonth and gYear, and century was dropped)
• All of the above are used as xsd:type, e.g., xsd:integer. E.g.:
    <xsd:element name="name" type="xsd:string"/>
    <xsd:attribute name="retired" type="xsd:boolean"/>
25
E.g.
    <name>Clive Lloyd</name>
    <birthday>1950-03-29</birthday>
XML Schema definition
    <xsd:element name="name" type="xsd:string"/>
    <xsd:element name="birthday" type="xsd:date"/>
• name="…": the name of this element
• type="…": the type of this element
26
Custom types (user defined)
• XSD allows you to create your own custom simple datatypes.
• Example:
    <xsd:simpleType name="TenToTwentyType">
      <xsd:restriction base="xsd:integer">
        <xsd:minInclusive value="10"/>
        <xsd:maxInclusive value="20"/>
      </xsd:restriction>
    </xsd:simpleType>
• Besides minInclusive and maxInclusive, you can also use minExclusive and maxExclusive, and minLength and maxLength (for strings). These are usually called "facets" in XSD. There are also the facets precision and scale (named totalDigits and fractionDigits in the final XML Schema Recommendation), which control how many digits are allowed in decimal numbers.
• restriction is used to restrict the range of values.
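The other facets mentioned above follow the same restriction pattern. The sketch below uses hypothetical type names, and it assumes the facet names of the final XML Schema Recommendation, where the digit-count facets are called totalDigits and fractionDigits.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- String of 1 to 30 characters -->
  <xsd:simpleType name="ShortNameType">
    <xsd:restriction base="xsd:string">
      <xsd:minLength value="1"/>
      <xsd:maxLength value="30"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Exclusive bounds: accepts 11 through 19, but not 10 or 20 -->
  <xsd:simpleType name="BetweenTenAndTwentyType">
    <xsd:restriction base="xsd:integer">
      <xsd:minExclusive value="10"/>
      <xsd:maxExclusive value="20"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Decimal with at most 7 digits in total, at most 2 after the point -->
  <xsd:simpleType name="AmountType">
    <xsd:restriction base="xsd:decimal">
      <xsd:totalDigits value="7"/>
      <xsd:fractionDigits value="2"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```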