XML Validation I: DTDs
Robin Burke
ECT 360
Winter 2004
Outline
History
Grammars / Regular expressions
DTDs
• elements
• attributes
• entities
Declarations
Validation
Why bother?
The idea
Language consists of terminals
• a, b, c
Set of productions beginning with non-terminals
• A, B, C
rules specifying how to generate sequences of terminals
Example
A → aB
A → aBA
B → b
generates strings
• ab, abab, ababab, etc.
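The productions above can be expanded mechanically. A minimal Python sketch, assuming upper-case letters are non-terminals and lower-case letters are terminals; the `generate` helper and its length cap are illustrative and not part of the lecture:

```python
from collections import deque

# Productions from the slide: A -> aB, A -> aBA, B -> b.
# Upper-case letters are non-terminals, lower-case letters are terminals.
PRODUCTIONS = {
    "A": ["aB", "aBA"],
    "B": ["b"],
}


def generate(start="A", max_len=6):
    """Breadth-first expansion of the leftmost non-terminal.

    Returns every all-terminal string of at most max_len characters.
    """
    results = []
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Find the leftmost non-terminal, if any.
        idx = next((i for i, ch in enumerate(form) if ch.isupper()), None)
        if idx is None:
            results.append(form)
            continue
        for rhs in PRODUCTIONS[form[idx]]:
            expanded = form[:idx] + rhs + form[idx + 1:]
            # No production shrinks a form, so over-long forms can be dropped.
            if len(expanded) <= max_len:
                queue.append(expanded)
    return sorted(set(results), key=len)


print(generate())
```

Raising `max_len` yields longer members of the same language.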
Grammar
Can be used to efficiently parse a language
• basis of all modern programming language parsing since Algol-60
• Java Language Specification is completely in EBNF grammar
Grammar
XML
• grammar-based syntax
• adheres to EBNF
SGML
• SGML had a more complex language definition syntax
• HTML is defined the SGML way
Regular expressions
Language for expressing patterns
Basic components
• pattern elements
• optional element = ?
• repetition (1 or more) = +
• repetition (0 or more) = *
• choice = |
• grouping = ( )
• sequence = ,
Examples
(a, b)*
• all strings "ab" "abab" etc. (and the empty string)
(a | b | c)+, q, (b, c)*
• aaq, bbq, bq, etc.
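These patterns can be tried out with Python's `re` module. A rough sketch, assuming every pattern element is a single letter; the `to_regex` helper is hypothetical and only removes the `,` sequence operator, which Python expresses by writing elements next to each other:

```python
import re


def to_regex(model):
    """Translate a slide-style pattern over single-letter symbols to a Python regex.

    The operators ? + * | ( ) mean the same in both notations, so only
    spaces and the "," sequence operator need to be removed.
    """
    body = model.replace(" ", "").replace(",", "")
    return re.compile(body)


first = to_regex("(a, b)*")
second = to_regex("(a | b | c)+, q, (b, c)*")

for pattern, text in [(first, "abab"), (first, "aba"),
                      (second, "aaq"), (second, "bbqbc"), (second, "cccc")]:
    # fullmatch requires the whole string to fit the pattern.
    print(pattern.pattern, text, bool(pattern.fullmatch(text)))
```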
Note
Regular expressions are different in different applications
• Perl
• Javascript
• XML Schemas
DTDs only support ? + * | , ( )
EBNF
EBNF is a more compact version of BNF
it uses regular expressions to simplify grammar expression
• A → aB
• A → aBA
turns into
• A → aB(A)?
only one production per non-terminal allowed
DTDs
Use EBNF to specify structure of XML documents
Plus
• attributes
• entities
Syntax
• holdover from SGML
• Ugly
DTD Syntax
<!ELEMENT element-name content_model>
Content model contains the RHS of the production rule
Example
<!ELEMENT name (firstName, lastName)>
DTD Syntax cont'd
Not XML
• <! begins a declaration
• No "content"
• Empty elements not indicated with />
Simple content models
Content can be any text
• #PCDATA
Content can be anything at all (useful for debugging)
• ANY
Element has no content
• EMPTY
Example
<grades>
  <grade>
    <student>Jane Doe</student>
    <assigned-grade>A</assigned-grade>
  </grade>
  <grade>
    <student>John Doe</student>
    <assigned-grade>A-</assigned-grade>
  </grade>
</grades>
Example
<grades>
  <grade>
    <student>Jane Doe</student>
    <assigned-grade>A</assigned-grade>
  </grade>
  <grade>
    <student>John Doe</student>
    <assigned-grade>A-</assigned-grade>
  </grade>
  <grade>
    <student>Wayne Doe</student>
    <assigned-grade>I</assigned-grade>
    <reason>Alien abduction</reason>
  </grade>
</grades>
DTD?
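One possible answer, sketched in Python. The DTD in the comment is an assumption and not the lecture's official solution. The standard library does not validate against DTDs, so the hypothetical `is_valid` helper only imitates a validator by matching each element's child sequence against its content model written as a regular expression:

```python
import re
import xml.etree.ElementTree as ET

# A candidate DTD (an assumption):
#   <!ELEMENT grades (grade+)>
#   <!ELEMENT grade (student, assigned-grade, reason?)>
#   <!ELEMENT student (#PCDATA)>
#   <!ELEMENT assigned-grade (#PCDATA)>
#   <!ELEMENT reason (#PCDATA)>
#
# The same content models as regexes over "name " tokens.
# None marks a #PCDATA element, which may not contain child elements.
CONTENT_MODELS = {
    "grades": r"(grade )+",
    "grade": r"student assigned-grade (reason )?",
    "student": None,
    "assigned-grade": None,
    "reason": None,
}


def is_valid(elem):
    """Toy check of an element tree against CONTENT_MODELS (not a real validator)."""
    if elem.tag not in CONTENT_MODELS:
        return False
    children = "".join(child.tag + " " for child in elem)
    model = CONTENT_MODELS[elem.tag]
    if model is None:
        ok = children == ""
    else:
        ok = re.fullmatch(model, children) is not None
    return ok and all(is_valid(child) for child in elem)


DOC = """<grades>
  <grade><student>Jane Doe</student><assigned-grade>A</assigned-grade></grade>
  <grade><student>Wayne Doe</student><assigned-grade>I</assigned-grade>
    <reason>Alien abduction</reason></grade>
</grades>"""

print(is_valid(ET.fromstring(DOC)))
```

The `reason?` in the `grade` model is what lets the third grade carry an extra element while the first two stay valid.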
Mixed content
Legal to have a content model with text and element data
<story category="national" byline="Karen Wheatley">
  <headline>President Meets with Congress</headline>
  The President met with Congressional leaders today in an effort to jump-start faltering budget negotiations. Sources described the mood of the meeting as "cordial".
  <full_text ref="news801" />
  <image src="img2071.jpg" />
  <image src="img2072.jpg" />
  <image src="img2073.jpg" />
</story>
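Mixed content, cont'd
<!ELEMENT story (#PCDATA | headline | full_text | image)*>
• DTD mixed content must list #PCDATA first, separate the alternatives with |, and end with *
• so the order and number of child elements cannot be constrained, e.g. not (headline, #PCDATA, full_text, image*)
Mixed content
• makes handling XML complex
• necessary for many applications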
Recursion
Unlike grammars
• recursive formulation ≠ repetition
Difference between
<!ELEMENT students (student+)>
<!ELEMENT students (student, students?)>
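The difference shows up in the shape of the documents each declaration accepts. A small Python sketch, assuming for brevity that `student` is an empty element; the `depth` helper is illustrative:

```python
import xml.etree.ElementTree as ET

# Fits <!ELEMENT students (student+)>: all students are siblings.
FLAT = "<students><student/><student/><student/></students>"

# Fits <!ELEMENT students (student, students?)>: each further student
# sits one level deeper, inside a nested <students> element.
NESTED = (
    "<students><student/>"
    "<students><student/>"
    "<students><student/></students>"
    "</students></students>"
)


def depth(elem):
    """Number of element levels from elem down to its deepest descendant."""
    return 1 + max((depth(child) for child in elem), default=0)


print(depth(ET.fromstring(FLAT)), depth(ET.fromstring(NESTED)))
```

With repetition the tree stays two levels deep however many students there are; with recursion it grows by one level per student.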
Restriction
The grammar cannot be ambiguous
• A → (a, b) | (a, c)
• this makes the parser implementation difficult
Usually easy to make non-ambiguous
• A → a, (b | c)
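Factoring out the shared first element does not change the language. A quick Python check, with the two content models hand-translated to regular expressions over single letters (an illustrative assumption):

```python
import re
from itertools import product

AMBIGUOUS = re.compile(r"ab|ac")    # (a, b) | (a, c)
FACTORED = re.compile(r"a(?:b|c)")  # a, (b | c)

# Every string over {a, b, c} of length 0 to 3.
candidates = ["".join(p) for n in range(4) for p in product("abc", repeat=n)]

same = all(
    bool(AMBIGUOUS.fullmatch(s)) == bool(FACTORED.fullmatch(s))
    for s in candidates
)
accepted = [s for s in candidates if FACTORED.fullmatch(s)]
print(same, accepted)
```

In the first form a parser that has just read `a` cannot tell which branch it is in; in the factored form there is only one place it can be.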
Attribute lists
Declared separately from elements
• can be anywhere in the DTD
Specification includes
• name of the element
• name of the attribute
• attribute type
• default
Attribute types
Character data
• CDATA
• different from XML CDATA section!
Enumerated
• (yes|no)
ID
• must be unique in the document
IDREF
• must refer to an id in the document
NMTOKEN
• a restriction of CDATA to single "word"
Also
• IDREFS and NMTOKENS
Default declaration
#REQUIRED
#IMPLIED
• means optional
Value
• this becomes the default
#FIXED
• value provided
Examples
<!ATTLIST img
src CDATA #REQUIRED
alt CDATA #REQUIRED
align (left|right|center) "left"
id ID #IMPLIED
>
<!ATTLIST timestamp
time-zone NMTOKEN #IMPLIED>
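Even a non-validating parser applies attribute defaults declared in an internal DTD subset. A minimal Python sketch using the standard `xml.parsers.expat` module; the `<!ELEMENT img EMPTY>` declaration and the sample attribute values are assumptions added to make the document complete:

```python
import xml.parsers.expat

# The img ATTLIST from the slide, placed in an internal DTD subset.
# The ELEMENT declaration and the src/alt values are assumptions.
DOC = """<!DOCTYPE img [
  <!ELEMENT img EMPTY>
  <!ATTLIST img
    src   CDATA               #REQUIRED
    alt   CDATA               #REQUIRED
    align (left|right|center) "left"
    id    ID                  #IMPLIED
  >
]>
<img src="logo.png" alt="Logo"/>"""

seen = {}


def start_element(name, attrs):
    # attrs holds the attributes written in the document plus any
    # defaults taken from ATTLIST declarations.
    seen[name] = attrs


parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = start_element
parser.Parse(DOC, True)

print(seen["img"])
```

The `align` attribute appears with its default even though the document omits it, while the `#IMPLIED` attribute `id` is simply absent. Expat does not validate, so it would not complain about a missing `#REQUIRED` attribute.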
Entities
Like macros
• content to be inserted
• indicated with &name;
Predefined general entities
• &amp; &lt;
• essential part of XML
User-defined general entities
• &disclaimer;
Entities, cont'd
Parameter entities
• can also be used to simplify DTD creation
• or to combine DTDs
• indicated with a %
More on this next week
Defining general entities
<!ENTITY name content>
Example
<!ENTITY disclaimer "This is a work of fiction. Any resemblance to persons living or dead is unintentional.">
In-class exercise
Business cards
Next week
More DTDs
• Entities
• Modularization and parameterization
• pg. 129-148
Lab