XML Validation I: DTDs


Robin Burke

ECT 360

Winter 2004

Outline

• History: grammars / regular expressions

• DTDs: elements, attributes, entities

• Declarations

• Validation: why bother?

The idea

A grammar for a language consists of

• terminals: a, b, c

• a set of productions beginning with non-terminals (A, B, C): rules specifying how to generate sequences of terminals

Example

A → aB
A → aBA
B → b

generates the strings ab, abab, ababab, etc.
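To make the idea concrete, here is a small Python sketch (not from the slides) that expands the example productions by leftmost derivation and collects every terminal string up to a chosen length. The dictionary encoding of the grammar and the `generate` function are illustrative assumptions.

```python
# Productions from the slide: A -> aB | aBA, B -> b.
# Keys of GRAMMAR are non-terminals; any other symbol is a terminal.
GRAMMAR = {
    "A": [["a", "B"], ["a", "B", "A"]],
    "B": [["b"]],
}


def generate(symbols, max_len):
    """Yield terminal strings derivable from `symbols` with length <= max_len."""
    for i, sym in enumerate(symbols):
        if sym in GRAMMAR:
            # Expand the leftmost non-terminal with each of its productions.
            for rhs in GRAMMAR[sym]:
                expanded = symbols[:i] + rhs + symbols[i + 1:]
                # Every symbol yields at least one terminal, so a sentential form
                # longer than max_len can never shrink back; prune it.
                if len(expanded) <= max_len:
                    yield from generate(expanded, max_len)
            return
    # No non-terminals left: this is a finished terminal string.
    yield "".join(symbols)


print(sorted(set(generate(["A"], 6)), key=len))  # ['ab', 'abab', 'ababab']
```

The same string can be reached along several derivations, which is why the results are collected into a set.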

Grammar

A grammar can be used to parse a language efficiently; grammars have been the basis of all modern programming-language parsing since Algol 60. The syntax in the Java Language Specification is given completely as an EBNF grammar.

Grammar, cont'd

XML: grammar-based syntax that adheres to EBNF

SGML: SGML had a more complex language-definition syntax; HTML is defined the SGML way

Regular expressions

A language for expressing patterns. Basic components:

• pattern elements
• optional element = ?
• repetition (1 or more) = +
• repetition (0 or more) = *
• choice = |
• grouping = ( )
• sequence = ,

Examples

(a, b)*: all strings of the form "ab", "abab", etc. (including the empty string)

(a | b | c)+, q, (b, c)*: aaq, bbq, bq
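Python's `re` module uses the same ? + * | ( ) operators but writes sequence as plain juxtaposition instead of a comma. Assuming single-letter terminals, the slide's patterns can therefore be tried out by dropping commas and spaces. `dtd_style_to_re` and `matches` are illustrative helpers, not standard functions.

```python
import re


def dtd_style_to_re(pattern: str) -> str:
    """Rewrite a comma-separated pattern over single-letter terminals as a Python regex.

    ? + * | ( ) mean the same in both notations; only the ',' sequence
    separator (and spaces) must go, because Python writes sequence as juxtaposition.
    """
    return pattern.replace(",", "").replace(" ", "")


def matches(pattern: str, s: str) -> bool:
    # fullmatch: the pattern must account for the whole string, not a prefix.
    return re.fullmatch(dtd_style_to_re(pattern), s) is not None


examples = [
    ("(a, b)*", ""),                           # True: zero repetitions
    ("(a, b)*", "abab"),                       # True
    ("(a, b)*", "aba"),                        # False: trailing a
    ("(a | b | c)+, q, (b, c)*", "aaq"),       # True
    ("(a | b | c)+, q, (b, c)*", "bqbc"),      # True
    ("(a | b | c)+, q, (b, c)*", "cccccccc"),  # False: no q
]
for pattern, s in examples:
    print(pattern, repr(s), matches(pattern, s))
```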

Note

Regular-expression syntax differs between applications: Perl, JavaScript, XML Schemas

DTDs support only ? + * | , ( )

EBNF

EBNF is a more compact version of BNF: it uses regular expressions to simplify grammar expressions. The productions A → aB and A → aBA turn into

A → aB(A)?

Only one production per non-terminal is allowed.
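Writing the productions with arrows and substituting B → b shows what the single EBNF rule describes: the optional recursive A lets the block ab repeat one or more times. A short worked derivation:

```latex
\begin{aligned}
\text{BNF:}\quad & A \to aB, \qquad A \to aBA, \qquad B \to b \\
\text{EBNF:}\quad & A \to aB(A)? \\
\text{substitute } B \to b\text{:}\quad & A \to ab(A)? \\
\text{unroll } (A)?\text{:}\quad & L(A) = \{ab,\ abab,\ ababab,\ \dots\} = (ab)^{+}
\end{aligned}
```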

DTDs

Use EBNF to specify structure of XML documents

Plus: attributes, entities

Syntax: a holdover from SGML. Ugly.

DTD Syntax

<!ELEMENT element-name content_model>

Content model contains the RHS of the production rule

Example: <!ELEMENT name (firstName, lastName)>
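Since the content model is the right-hand side of a production, it is a regular expression over the sequence of child element names. The Python sketch below (an illustration, not how any particular validating parser works) turns an element-only content model into a Python regex by wrapping each name in `<...>` and dropping the commas, then checks an element's children with `re.fullmatch`. #PCDATA, ANY and EMPTY are deliberately left out; the helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def content_model_to_regex(model):
    """Translate an element-only DTD content model into a Python regex.

    Each element name becomes a literal "<name>" token; the ',' sequence
    separator and whitespace are dropped; ? + * | ( ) carry over unchanged.
    """
    tokens = re.sub(r"[A-Za-z_][\w.-]*",
                    lambda m: "<" + re.escape(m.group(0)) + ">", model)
    return re.sub(r"[,\s]", "", tokens)


def children_match(elem, model):
    """True if elem's sequence of child elements is generated by the model."""
    seq = "".join(f"<{child.tag}>" for child in elem)
    return re.fullmatch(content_model_to_regex(model), seq) is not None


ok = ET.fromstring("<name><firstName>Jane</firstName><lastName>Doe</lastName></name>")
swapped = ET.fromstring("<name><lastName>Doe</lastName><firstName>Jane</firstName></name>")
print(children_match(ok, "(firstName, lastName)"))       # True
print(children_match(swapped, "(firstName, lastName)"))  # False
```

Wrapping names in `<...>` keeps multi-letter names such as `firstName` from being read as separate regex characters.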

DTD Syntax cont'd

Not XML:

• <! begins a declaration
• No "content"
• Empty elements are not indicated with />

Simple content models

Content can be any text: #PCDATA

Content can be anything at all (useful for debugging): ANY

Element has no content: EMPTY

Example

<grades>
  <grade>
    <student>Jane Doe</student>
    <assigned-grade>A</assigned-grade>
  </grade>
  <grade>
    <student>John Doe</student>
    <assigned-grade>A-</assigned-grade>
  </grade>
</grades>

Example

<grades>
  <grade>
    <student>Jane Doe</student>
    <assigned-grade>A</assigned-grade>
  </grade>
  <grade>
    <student>John Doe</student>
    <assigned-grade>A-</assigned-grade>
  </grade>
  <grade>
    <student>Wayne Doe</student>
    <assigned-grade>I</assigned-grade>
    <reason>Alien abduction</reason>
  </grade>
</grades>

DTD?
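One possible answer, sketched in Python as an assumption rather than the course's official solution: make `reason` optional with `?`. The candidate declarations are listed in the comments; each element-only model is translated by hand into a regex over `<tag>` tokens and checked against the parsed document.

```python
import re
import xml.etree.ElementTree as ET

# Candidate DTD (an assumption, not the official course answer):
#   <!ELEMENT grades (grade+)>
#   <!ELEMENT grade (student, assigned-grade, reason?)>
#   <!ELEMENT student (#PCDATA)>
#   <!ELEMENT assigned-grade (#PCDATA)>
#   <!ELEMENT reason (#PCDATA)>
# Element-only models, hand-translated to regexes over "<tag>" tokens.
MODELS = {
    "grades": r"(<grade>)+",
    "grade": r"<student><assigned-grade>(<reason>)?",
}
TEXT_ONLY = {"student", "assigned-grade", "reason"}  # the (#PCDATA) elements


def validate(elem):
    """True if elem and all its descendants fit the candidate DTD.

    Whitespace between child elements is ignored in this sketch.
    """
    if elem.tag in TEXT_ONLY:
        return len(elem) == 0  # text allowed, child elements not
    model = MODELS.get(elem.tag)
    if model is None:
        return False  # element not declared
    seq = "".join(f"<{child.tag}>" for child in elem)
    return re.fullmatch(model, seq) is not None and all(validate(c) for c in elem)


DOC = """<grades>
  <grade><student>John Doe</student><assigned-grade>A-</assigned-grade></grade>
  <grade><student>Wayne Doe</student><assigned-grade>I</assigned-grade>
         <reason>Alien abduction</reason></grade>
</grades>"""
print(validate(ET.fromstring(DOC)))  # True
```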

Mixed content

It is legal to have a content model with both text and element data:

<story category="national" byline="Karen Wheatley"><headline>President Meets with Congress</headline>The President met with Congressional leaders today in an effort to jump-start faltering budget negotiations. Sources described the mood of the meeting as "cordial". <full_text ref="news801" /> <image src="img2071.jpg" /> <image src="img2072.jpg" /> <image src="img2073.jpg" /></story>

Mixed content, cont'd

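<!ELEMENT story (#PCDATA | headline | full_text | image)*>

XML allows mixed content only in this form: #PCDATA first, then a choice of element names, with the whole group repeated by *. The order and number of the child elements cannot be constrained.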

Mixed content makes handling XML complex, but it is necessary for many applications.
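A Python sketch of what processing the story involves, assuming the standard XML mixed-content form (#PCDATA | headline | full_text | image)*. With that form, validation only checks that each child has an allowed name; the harder part is that `xml.etree.ElementTree` scatters the prose across the parent's `.text` and each child's `.tail`. The story is abridged.

```python
import xml.etree.ElementTree as ET

# Abridged version of the slide's story element.
STORY = ('<story category="national" byline="Karen Wheatley">'
         '<headline>President Meets with Congress</headline>'
         'The President met with Congressional leaders today. '
         '<full_text ref="news801" /> <image src="img2071.jpg" /></story>')

story = ET.fromstring(STORY)

# Assumed model: (#PCDATA | headline | full_text | image)*
# Any order and any count, so validation only checks each child's name.
ALLOWED = {"headline", "full_text", "image"}
valid = all(child.tag in ALLOWED for child in story)

# ElementTree keeps text before the first child in story.text and the text
# after each child in that child's .tail, so the prose is split into pieces.
pieces = [story.text] + [child.tail for child in story]
text_runs = [p for p in pieces if p and p.strip()]

print(valid)      # True
print(text_runs)  # ['The President met with Congressional leaders today. ']
```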

Recursion

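Unlike in grammars, a recursive formulation is not the same as repetition.

Difference between

<!ELEMENT students (student+)>

and

<!ELEMENT students (student, students?)>

The first allows a flat list of student elements; the second forces each additional student into a nested students element.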
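A Python sketch contrasting the document shapes the two declarations accept. A DTD may declare `students` only once, so these are two alternative DTDs; `valid_flat` and `valid_recursive` are hypothetical hand translations of each model.

```python
import xml.etree.ElementTree as ET


def valid_flat(elem):
    """<!ELEMENT students (student+)>: one or more student children, nothing else."""
    return len(elem) >= 1 and all(child.tag == "student" for child in elem)


def valid_recursive(elem):
    """<!ELEMENT students (student, students?)>: a student, then optionally a nested students."""
    kids = list(elem)
    if not kids or kids[0].tag != "student" or len(kids) > 2:
        return False
    if len(kids) == 2:
        # The second child must itself be a valid students element.
        return kids[1].tag == "students" and valid_recursive(kids[1])
    return True


flat = ET.fromstring("<students><student/><student/><student/></students>")
nested = ET.fromstring(
    "<students><student/><students><student/>"
    "<students><student/></students></students></students>")

print(valid_flat(flat), valid_recursive(flat))      # True False
print(valid_flat(nested), valid_recursive(nested))  # False True
```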

Restriction

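The grammar cannot be ambiguous (the XML specification requires deterministic content models):

A → (a, b) | (a, c)

After reading an a, the parser cannot tell which branch it is in; this makes the parser implementation difficult.

It is usually easy to make the model non-ambiguous:

A → a, (b | c)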
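Factoring out the common a changes only the shape of the model, not the set of sequences it accepts. A quick Python check (single-letter names as on the slide, compared by brute force over short strings) illustrates this. Python's backtracking `re` engine accepts the ambiguous form without complaint, so this demonstrates the equivalence, not the DTD restriction itself.

```python
import re
from itertools import product

# Both models from the slide, written as Python regexes over single-letter names.
AMBIGUOUS = re.compile(r"(ab)|(ac)")  # A -> (a, b) | (a, c)
FACTORED = re.compile(r"a(b|c)")      # A -> a, (b | c)


def accepts(pattern, s):
    return pattern.fullmatch(s) is not None


# Every string over {a, b, c} of length 0 to 3.
candidates = ["".join(p) for n in range(4) for p in product("abc", repeat=n)]
same_language = all(accepts(AMBIGUOUS, s) == accepts(FACTORED, s) for s in candidates)

print(same_language)                                    # True
print([s for s in candidates if accepts(FACTORED, s)])  # ['ab', 'ac']
```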

Attribute lists

Declared separately from elements; can be anywhere in the DTD

Specification includes:

• name of the element
• name of the attribute
• attribute type
• default

Attribute types

Character data: CDATA (different from an XML CDATA section!)

Enumerated: (yes|no)

ID: must be unique in the document

IDREF: must refer to an ID in the document

NMTOKEN: a restriction of CDATA to a single "word" (name token)

Also IDREFS and NMTOKENS: whitespace-separated lists of IDREF and NMTOKEN values
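`xml.etree.ElementTree` does not read attribute types from a DTD, so this Python sketch simply assumes which attributes play the ID and IDREF roles (`sid` on `student` and `student` on `grade`, both hypothetical) and checks the two rules by hand. The NMTOKEN test is a simplified approximation of the XML 1.0 NameChar rule.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical document; we assume sid is declared ID and grade/@student IDREF.
DOC = """<course>
  <student sid="s1"/><student sid="s2"/>
  <grade student="s2" letter="A"/>
  <grade student="s9" letter="B"/>
</course>"""

root = ET.fromstring(DOC)

# ID rule: every ID value must be unique within the document.
ids = [e.get("sid") for e in root.iter("student")]
duplicate_ids = {i for i in ids if ids.count(i) > 1}

# IDREF rule: every reference must match some ID in the document.
dangling = [e.get("student") for e in root.iter("grade") if e.get("student") not in ids]

# Simplified NMTOKEN test: one or more name characters, no spaces.
# (The real rule is the XML 1.0 NameChar production, which \w only approximates.)
NMTOKEN = re.compile(r"[\w.:-]+")


def is_nmtoken(value):
    return NMTOKEN.fullmatch(value) is not None


print(duplicate_ids)  # set()
print(dangling)       # ['s9']
print(is_nmtoken("EST"), is_nmtoken("Eastern Time"))  # True False
```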

Default declaration

#REQUIRED: the attribute must be present

#IMPLIED: means optional

A literal value, e.g. "left": this becomes the default

#FIXED "value": the value provided is the only one allowed

Examples

<!ATTLIST img
    src    CDATA               #REQUIRED
    alt    CDATA               #REQUIRED
    align  (left|right|center) "left"
    id     ID                  #IMPLIED
>

<!ATTLIST timestamp
    time-zone NMTOKEN #IMPLIED>
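A validating parser also applies the default declarations. The Python sketch below hand-translates the img ATTLIST into a dictionary (a hypothetical representation, not any parser's real API) and shows the behaviours: #REQUIRED attributes must be present, a literal default is filled in when the attribute is missing, enumerated values are checked, and a #FIXED value (written here as a ("#FIXED", value) pair) may not be changed.

```python
# The img ATTLIST hand-translated into a dict (hypothetical format):
# attribute name -> (type, default). An enumerated type is a tuple of values;
# a default is "#REQUIRED", "#IMPLIED", a literal value, or ("#FIXED", value).
IMG_ATTLIST = {
    "src": ("CDATA", "#REQUIRED"),
    "alt": ("CDATA", "#REQUIRED"),
    "align": (("left", "right", "center"), "left"),
    "id": ("ID", "#IMPLIED"),
}


def apply_attlist(attlist, attrs):
    """Return (attributes with defaults filled in, list of error messages)."""
    result, errors = dict(attrs), []
    for name, (atype, default) in attlist.items():
        fixed = default[1] if isinstance(default, tuple) else None
        if name not in result:
            if default == "#REQUIRED":
                errors.append(f"missing required attribute {name}")
            elif default != "#IMPLIED":
                # A literal or #FIXED default is supplied as if it had been written.
                result[name] = fixed if fixed is not None else default
            continue
        if fixed is not None and result[name] != fixed:
            errors.append(f"{name} must be {fixed!r}")
        if isinstance(atype, tuple) and result[name] not in atype:
            errors.append(f"{name}={result[name]!r} is not one of {atype}")
    for name in attrs:
        if name not in attlist:
            errors.append(f"undeclared attribute {name}")
    return result, errors


print(apply_attlist(IMG_ATTLIST, {"src": "a.jpg", "alt": "A cat"}))
# ({'src': 'a.jpg', 'alt': 'A cat', 'align': 'left'}, [])
print(apply_attlist(IMG_ATTLIST, {"src": "a.jpg", "align": "top"})[1])
```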

Entities

Like macros: content to be inserted, indicated with &name;

Predefined general entities: &amp; &lt; &gt; &apos; &quot; (an essential part of XML)

User-defined general entities, e.g. &disclaimer;

Entities, cont'd

Parameter entities: can also be used to simplify DTD creation or to combine DTDs; indicated with a %

More on this next week

Defining general entities

<!ENTITY name "content">

Example

<!ENTITY disclaimer "This is a work of fiction. Any resemblance to persons living or dead is unintentional.">

In-class exercise

Business cards

Next week

More DTDs: entities, modularization and parameterization (pg. 129-148)

Lab
