
XML Schema: An Intensive One-Day Tutorial

Henry S. Thompson
HCRC Language Technology Group
University of Edinburgh


Reuters | Henry S. Thompson | XML Schema, London 1999-12-15

Overview

What are schemata, anyway?
  The nature of document structure
  Schema as contract
  Taking control of structure definition
XML Schema: the activity
  The W3C and its WGs
  The Charter and Requirements
  The state of play
The Draft RECs
  A detailed walkthrough
Schemas and Layered Architecture

[2] When you see this, it means there’s accompanying information in the Additional Materials handbook


Terminology

Documents have structure
  Document types
  Document instances
Structure can be defined (D. S. D.: document structure definition)
  Informally
  SGML DTD
  XML DTD
  Schema using XML


Background

SGML DTDs for D. S. D.
  Sperberg-McQueen
  Others
Considered for XML itself
  MCF, then RDF, now DCD, by Bray et al.
  XML-Data, two versions, now XML-Data reduced, by Layman et al., then Frankston and Thompson
  SOX, from Veo Corp.
  XSchema, from an ad-hoc group of designers


Document Structure

Two relations are constitutive
  Part-of
  Kind-of
Existing DSD mechanisms use Content Models to specify part-of relations
But they only specify kind-of relations implicitly or informally
Making kind-of relations explicit would make both understanding and maintenance easier


Taking Control of D. S. D.

Erik Naggum used to talk about SGML allowing users to take control of their data
XML allows the same move one level up, for developers
  The starting point is much simpler
  The architecture is congenial
  The demand is there
We need to do this, to make the transition to validation easier


Why validate?

A D. S. D. is a contract between producers and consumers
  It provides a guaranteed interface
  Producers validate to ensure they are providing what they promised
  Consumers validate to check up on producers and to protect their applications
Application authors validate to simplify their task
  Leave error detection and analysis to the validating parser


Reconstructing DTDs

The Schema DTD is expressed in vanilla XML
Top level elements for declaring
  Elements :-)
  Types
  Notations
  . . .
Subordinate element types for declaring
  Attributes
  Content models
  . . .


An aside about terminology

SGML and XML 1.0 talk about element types
XML Schema to date has been more casual and just talked about elements
  Meaning either an element in an instance
  Or the abstraction which is described in a DTD or Schema
Further confused by XML Schema making extensive use of type
Also, schema means many different things to different people
  I'll try always to say/write XML Schema. . .


A simple example

<!ELEMENT text (#PCDATA|emph|name)*>
<!ATTLIST text timestamp NMTOKEN #REQUIRED>

<element name="text">
  <type content="mixed">
    <element ref="emph"/>
    <element ref="name"/>
    <attribute name="timestamp" type="date" minOccurs="1"/>
  </type>
</element>


The Schema Architecture: Static

A document or an application or a user identifies a schema
Each is well-formed XML
The schema is valid w.r.t. the Schema DTD
The document is schema-valid w.r.t. the schema
The schema is schema-valid w.r.t. the schema for schemas


The Schema Architecture: Dynamic

An XML application (XSP) which schema-validates
‘Takes control’ because changing how schemata work means
  changing the Schema DTD/schema for schemas
  upgrading XSP accordingly
  not changing XML itself


The W3C

XML Schema hopes to be a W3C Recommendation
The W3C is The World Wide Web Consortium, a voluntary association of companies and non-profit organisations.
  Membership costs serious money, confers voting rights.
  Complex procedures, with the Chairman (Tim Berners-Lee) holding all the high cards, but the big vendors (e.g. Microsoft, Adobe, Netscape) have a lot of power.


. . . and its WGs

The XML recommendation was written by the W3C’s XML Working Group
Which split itself into pieces, of which one is the XML Schema WG
  Chartered in the autumn of 1998
  Requirements document out in February of 1999
  Due to go to Last Call early in 2000


Requirements document

Full of good and hopeful requirements
  DTDs and more
  Support inheritance
  Data-friendly
  Good inventory of primitive datatypes

[5]


The state of play

Two component documents
  Structures
  Datatypes
Three public working drafts so far
  May 1999
  September 1999
  November 1999:
Further (near-final) PWD out December 1999
  http://www.w3.org/TR/xmlschema-1/ [contains pointers to previous drafts]

[6] [8]


The XML Schema worldview

Validity and well-formedness are XML 1.0 concepts
  They are defined over character sequences
Namespace-compliant is a Namespace concept
  It's defined over character sequences too
Schema-validity is the XML Schema concept
  It is defined over XML document Infosets
So the whole XML Schema exercise is predicated on and layered on top of XML 1.0 well-formedness plus Namespaces
  Because they are constitutive of the Infoset


What's the Infoset?

The XML 1.0 plus Namespaces abstract data model
Defines a modest number of information items
  Element, attribute, namespace declaration, ...
Each has required and optional properties
  Name, children, …


What the Infoset isn't

It's not the DOM
  Much higher level
  It's not about implementation or interfacing at all
  But you can think of it as a data structure if that helps
It's not an SGML property set/grove
  But it's close
  It doesn't have the entity problem
    a mixed blessing, as we will see


The Schema and the Infoset

So crucially, schemas are about infosets, not character sequences
You could schema-validate a DOM tree you built by hand!
  Using a schema which exists only as a DOM tree ditto
This simplifies things tremendously
  but is hard to get your head around at first


Basic XML Schema concepts

Syntax is not the Schema
Namespaces are fundamental
But a schema is not a namespace
Separation of tag from type
Simple and Complex types
Modular Schema construction
Powerful type construction
Local tag-type association
Powerful wildcards
Element equivalence classes
Extension mechanism
Documentation mechanism


Schema Walkthrough 1

A Toy Purchase Order schema

[10]


Types and Type Derivation

For purposes of discussion, consider only the content type aspects of types (attributes are analogous)
A content type definition (simple or complex) consists of a set of constraints on what's allowed as content.


Permissions and obligations

You can think of the type itself as the set of strings/EIIs its constraints allow.
It's helpful to think of constraints as composed of obligations and permissions:
  (\d )?(\d{3}-)?\d{3}-\d{4}
    regexp definition facet for [US] 'phone number type
  the ? and the \d can be seen as permissions, the - and the {3} as obligations
  1 337-6818 and 207-422-6240 belong to this type
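The membership claim for the 'phone number pattern can be checked mechanically. The following is a minimal sketch, assuming Python's `re` dialect is close enough to the draft's regexp facet for this pattern; the helper `is_member` and the non-member strings are illustrative and not from the slides.

```python
import re

# Pattern copied from the slide: optional leading digit plus space,
# optional three-digit area code plus hyphen, then the obligatory
# three digits, hyphen, four digits.
US_PHONE = re.compile(r"(\d )?(\d{3}-)?\d{3}-\d{4}")


def is_member(pattern: re.Pattern, text: str) -> bool:
    """Return True if the whole string satisfies the pattern."""
    return pattern.fullmatch(text) is not None


# The two strings from the slide, plus one that uses neither permission.
members = ["1 337-6818", "207-422-6240", "337-6818"]
# Illustrative strings that break an obligation (hyphen or digit count).
non_members = ["3376818", "207-422-624", "1-337-6818"]

for text in members + non_members:
    print(text, is_member(US_PHONE, text))
```

Dropping an optional part never affects membership, while dropping a hyphen or a digit always does, which is the permission/obligation split the slide describes.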


Complex types

(title?,forename*,surname)
  (shorthand for) content model for name
the ? can be seen as permission, the , and the 'surname' as obligations
(at the end of the day, each component involves both permission AND obligation, but the balance of impact is as suggested)


Complex types, cont'd

(title?,forename*,surname)

<name>
  <forename>...</forename>
  <surname>...</surname>
</name>

and

<name>
  <title>...</title>
  <surname>...</surname>
</name>

are both members of this type


Restriction

A type definition may be a restriction of another type's definition if it reduces permissions, sometimes to the point of inducing obligations:
  \d[01]\d-\d{3}-\d{4}
    (a restriction of US p#, i.e. of (\d )?(\d{3}-)?\d{3}-\d{4})
The membership of this type, which includes 207-422-6240 but not 1 337-6818, is a (proper) subset of the membership of the original type, because by construction every member of the new type is a member of the original.
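The subset relation can be spot-checked on a handful of strings. This is a sample-based sketch, not a proof: the two patterns are the ones on the slides, and every string other than the two the slide mentions is an illustrative assumption.

```python
import re

ORIGINAL = re.compile(r"(\d )?(\d{3}-)?\d{3}-\d{4}")   # US 'phone number type
RESTRICTED = re.compile(r"\d[01]\d-\d{3}-\d{4}")       # area code now obligatory

samples = [
    "207-422-6240",    # from the slide: in both types
    "1 337-6818",      # from the slide: original only
    "337-6818",        # illustrative: original only
    "212-555-0100",    # illustrative: in both types
    "1 207-422-6240",  # illustrative: original only
]

in_original = {s for s in samples if ORIGINAL.fullmatch(s)}
in_restricted = {s for s in samples if RESTRICTED.fullmatch(s)}

# On this sample the restricted membership is a proper subset of the original.
print(in_restricted < in_original)
```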


Restriction, cont'd

Similarly,
  (forename+,surname)
is a restriction of the original type definition for name
  (title?,forename*,surname)
and the same relation holds.


Restriction, cont'd

Note first that, for (forename+,surname),

<name>
  <forename>...</forename>
  <surname>...</surname>
</name>

is a member of the new type, but

<name>
  <title>...</title>
  <surname>...</surname>
</name>

is not.


Extension

Now consider

(title?, forename*, surname, genMark?)

This type extends the original type definition for name.

<name>
  <forename>Al</forename>
  <surname>Gore</surname>
  <genMark>Jr</genMark>
</name>

is an instance of this new type, but not of the original.
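The three content models seen so far (original, restricted, extended) can be compared by encoding each as a regular expression over the sequence of child element names. This is only an illustrative sketch: the regex encoding, the `MODELS` labels and the helper names are assumptions, not how a schema processor is specified to work.

```python
import re
import xml.etree.ElementTree as ET

# Each content model rewritten over space-terminated child element names.
MODELS = {
    "original":   r"(title )?(forename )*surname ",              # (title?,forename*,surname)
    "restricted": r"(forename )+surname ",                       # (forename+,surname)
    "extended":   r"(title )?(forename )*surname (genMark )?",   # adds genMark?
}


def child_sequence(xml_text: str) -> str:
    """Flatten an element's children into e.g. 'forename surname '."""
    root = ET.fromstring(xml_text)
    return "".join(child.tag + " " for child in root)


def is_member(model: str, xml_text: str) -> bool:
    return re.fullmatch(MODELS[model], child_sequence(xml_text)) is not None


with_title = "<name><title>...</title><surname>...</surname></name>"
with_forename = "<name><forename>...</forename><surname>...</surname></name>"
al_gore = ("<name><forename>Al</forename><surname>Gore</surname>"
           "<genMark>Jr</genMark></name>")

for label in MODELS:
    print(label, [is_member(label, doc) for doc in (with_title, with_forename, al_gore)])
```

Restriction shrinks the set of accepted instances and extension admits instances the original rejects, as the slides state.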


Any

Finally note that the <any/> content model particle, in all of its forms, introduces particularly broad permissions into complex content types.


Where are we headed?

A number of design decisions can now be stated:
  Should we make it easy to construct type definitions which restrict or extend other type definitions, by specifying only the method of derivation and the differences between the source and derived type definitions?
  The new proposal says 'yes', you do this by using the "source" and "derivedBy" attributes on your <type> or <datatype> element.


Datatype example

Consider the simple type case first:

<datatype name='bodytemp' source='decimal'>
  <precision value='4'/>
  <scale value='1'/>
  <minInclusive value='97.0'/>
  <maxInclusive value='105.0'/>
</datatype>


Derived type

<datatype name='healthyBodytemp' source='bodytemp'>
  <maxInclusive value='99.5'/>
</datatype>

The healthyBodytemp type definition is defined by closing down the permitted range of bodytemp. We say it 'inherits' the other facets of bodytemp, so the 'effective type definition' of healthyBodytemp is


Effective type

<datatype name='healthyBodytemp' source='decimal'>
  <precision value='4'/>
  <scale value='1'/>
  <minInclusive value='97.0'/>
  <maxInclusive value='99.5'/>
</datatype>

Since it doesn't in general make sense to extend one simple type by another, the "derivedBy" attribute is actually redundant for <datatype>.
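Facet inheritance can be pictured as walking the `source` chain and letting each derived definition override what it mentions. The sketch below assumes a dictionary representation of the two datatypes from the slides; the function names are hypothetical, and only the two range facets are actually enforced.

```python
from decimal import Decimal

# Facet values as written on the slides; the dict layout is an assumption.
DATATYPES = {
    "bodytemp": {
        "source": "decimal",
        "facets": {"precision": 4, "scale": 1,
                   "minInclusive": Decimal("97.0"),
                   "maxInclusive": Decimal("105.0")},
    },
    "healthyBodytemp": {
        "source": "bodytemp",
        "facets": {"maxInclusive": Decimal("99.5")},
    },
}


def effective_facets(name: str) -> dict:
    """Collect facets along the source chain; derived facets override inherited ones."""
    if name not in DATATYPES:      # a built-in such as 'decimal': nothing to inherit here
        return {}
    definition = DATATYPES[name]
    facets = effective_facets(definition["source"])
    facets.update(definition["facets"])
    return facets


def in_range(name: str, literal: str) -> bool:
    """Check only minInclusive/maxInclusive; precision and scale are ignored in this sketch."""
    facets = effective_facets(name)
    return facets["minInclusive"] <= Decimal(literal) <= facets["maxInclusive"]


print(effective_facets("healthyBodytemp"))
print(in_range("bodytemp", "101.3"), in_range("healthyBodytemp", "101.3"))
```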


Extension for complex types

The next simplest case is extension for complex types:

<type name='name'>
  <element name='title' minOccurs='0'/>
  <element name='forename' minOccurs='0' maxOccurs='*'/>
  <element name='surname'/>
</type>


Derived type

<type name='fullName' source='name' derivedBy='extension'>
  <element name='genMark' minOccurs='0'/>
</type>


The effective type

<type name='fullName'>
  <element name='title' minOccurs='0'/>
  <element name='forename' minOccurs='0' maxOccurs='*'/>
  <element name='surname'/>
  <element name='genMark' minOccurs='0'/>
</type>
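For extension, the effective content model is the source's particles followed by the added ones. A minimal sketch, assuming a `Particle` tuple as the representation and a default of one occurrence where the slides omit minOccurs/maxOccurs; neither assumption comes from the draft itself.

```python
from typing import List, NamedTuple, Optional


class Particle(NamedTuple):
    name: str
    min_occurs: int = 1
    max_occurs: Optional[int] = 1   # None stands for the slides' maxOccurs='*'


# The 'name' type from the slides.
NAME: List[Particle] = [
    Particle("title", min_occurs=0),
    Particle("forename", min_occurs=0, max_occurs=None),
    Particle("surname"),
]


def extend(source: List[Particle], added: List[Particle]) -> List[Particle]:
    """Derivation by extension: keep the source particles, append the new ones."""
    return source + added


FULL_NAME = extend(NAME, [Particle("genMark", min_occurs=0)])
print([p.name for p in FULL_NAME])
```

Appending at the end is what keeps every prefix of the content exactly as the source type specified it.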


Restriction for complex types

Restriction for complex types is harder to handle syntactically, because of the significance of linear order in content models, but the semantics are completely parallel to the simple type case:


Restriction example

<type name='simpleName' source='name' derivedBy='restriction'>
  <restrictions>
    <element name='title' maxOccurs='0'/>
    <element name='forename' minOccurs='1'/>
  </restrictions>
</type>


Restriction and Inheritance

Just as in the <datatype> case, the content model aspects not mentioned are left alone, including the "maxOccurs='*'" on <forename> and the whole particle for <surname>, so the 'effective content model' of 'simpleName' is


Effective type

<type name='simpleName'>
  <element name='title' maxOccurs='0' minOccurs='0'/> <!-- i.e. forbidden -->
  <element name='forename' minOccurs='1' maxOccurs='*'/>
  <element name='surname'/>
</type>
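The effective content model for a restriction can be computed by overlaying the mentioned occurrence bounds on the source particles and leaving everything else alone. This sketch assumes a list-of-dicts representation and a default of one occurrence where the slides give no bound; both are illustrative choices.

```python
UNBOUNDED = None  # stands for the slides' maxOccurs='*'

# The 'name' type, with assumed defaults filled in where the slides are silent.
NAME = [
    {"name": "title", "minOccurs": 0, "maxOccurs": 1},
    {"name": "forename", "minOccurs": 0, "maxOccurs": UNBOUNDED},
    {"name": "surname", "minOccurs": 1, "maxOccurs": 1},
]

# The <restrictions> from the 'simpleName' definition.
RESTRICTIONS = {
    "title": {"maxOccurs": 0},
    "forename": {"minOccurs": 1},
}


def restrict(source, restrictions):
    """Overlay restricted bounds on each particle; unmentioned aspects are inherited."""
    return [{**particle, **restrictions.get(particle["name"], {})}
            for particle in source]


SIMPLE_NAME = restrict(NAME, RESTRICTIONS)
for particle in SIMPLE_NAME:
    print(particle)
```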


Instances

Given all the example definitions above, all of

<name>
  <title>Ms</title>
  <surname>Steinem</surname>
</name>

<name xsi:type='simpleName'>
  <forename>Harry</forename>
  <forename>S</forename>
  <surname>Truman</surname>
</name>


Another instance

<name xsi:type='fullName'>
  <forename>Al</forename>
  <surname>Gore</surname>
  <genMark>Jr</genMark>
</name>

all would be schema-valid per <element name='name' type='name'/>
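How xsi:type selects the type to check against can be sketched as a lookup followed by a content check. Everything here is an assumption made for illustration: the regex encoding of the content models, the `DERIVED_FROM` table, the namespace URI bound to the `xsi` prefix, and the explicit namespace declarations added so the instances parse.

```python
import re
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/1999/XMLSchema-instance"  # assumed URI for the xsi prefix

# Content models as regexes over space-terminated child element names.
CONTENT_MODELS = {
    "name":       r"(title )?(forename )*surname ",
    "simpleName": r"(forename )+surname ",
    "fullName":   r"(title )?(forename )*surname (genMark )?",
}
DERIVED_FROM = {"simpleName": "name", "fullName": "name"}


def schema_valid(xml_text: str, declared_type: str = "name") -> bool:
    """Validate against the xsi:type if present and derived from the declared type."""
    root = ET.fromstring(xml_text)
    actual = root.get(f"{{{XSI}}}type", declared_type)
    if actual != declared_type and DERIVED_FROM.get(actual) != declared_type:
        return False
    children = "".join(child.tag + " " for child in root)
    return re.fullmatch(CONTENT_MODELS[actual], children) is not None


steinem = "<name><title>Ms</title><surname>Steinem</surname></name>"
truman = (f"<name xmlns:xsi='{XSI}' xsi:type='simpleName'>"
          "<forename>Harry</forename><forename>S</forename>"
          "<surname>Truman</surname></name>")
gore = (f"<name xmlns:xsi='{XSI}' xsi:type='fullName'>"
        "<forename>Al</forename><surname>Gore</surname>"
        "<genMark>Jr</genMark></name>")
gore_untyped = ("<name><forename>Al</forename><surname>Gore</surname>"
                "<genMark>Jr</genMark></name>")

print([schema_valid(doc) for doc in (steinem, truman, gore, gore_untyped)])
```

Without the xsi:type attribute the last instance is checked against the declared type and rejected, which is why the attribute is needed for instances of the extended type.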


Connecting Instances and Schemas

Like I said
  A schema is not a namespace
  The connection cannot be made rigid
The draft identifies three layers, first is
  schema-valid(EII,TypeName,ComponentSet)
  The TypeName is a (namespaceURI,NCName) pair
  The component set is made up of (namespaceURI,NCName,component) triples
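The shape of the first layer can be written down as types. This is a loose sketch only: a component is stood in for by a predicate on a string, the element information item by that string, and the namespace URI and type names are hypothetical.

```python
from typing import Callable, Dict, Tuple

TypeName = Tuple[str, str]                 # (namespaceURI, NCName)
Component = Callable[[str], bool]          # stand-in for a real schema component
ComponentSet = Dict[TypeName, Component]   # the triples, keyed by their first two parts


def schema_valid(item: str, type_name: TypeName, components: ComponentSet) -> bool:
    """Look the type up by its qualified name, then apply it to the item."""
    component = components.get(type_name)
    return component is not None and component(item)


PO_NS = "http://example.org/po"            # hypothetical namespace URI
components: ComponentSet = {
    (PO_NS, "zip"): lambda text: len(text) == 5 and text.isdigit(),
}

print(schema_valid("90210", (PO_NS, "zip"), components))
print(schema_valid("90210", ("http://example.org/other", "zip"), components))
```

Keying on the pair rather than on the local name alone reflects the point that the same NCName in a different namespace names a different component.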


Other layers

Layer 2: transfer syntax
Layer 3: web connections


Schema Walkthrough 2

The Schema for Datatypes

[13]


Schema Walkthrough 3

The Schema for Schemas

[21]


Change of Gear

Let's look at the role of schemas in supporting the layered architecture which is emerging all around us


XML is ASCII for the 21st century

ASCII (ISO 646) solved a fundamental interchange problem for flat text documents
  What bits encode what characters
    – (For a pretty parochial definition of 'character')
UNICODE/ISO 10646 extends that solution to the whole world
XML thought it was doing the same for simple tree-structured documents
  The emphasis in the XML design was on simplifying SGML to move it to the Web
  XML didn't touch SGML's architectural vision
    – flexible linearisation/transfer syntax
    – for tree-structured documents with internal links


Just what is XML?

It's a markup language used for annotating text
It is concerned with logical structure
  to identify sections, titles, section headers, chapters, paragraphs,…
It is not concerned with appearance
  you say 'this is a subtitle'
    not 'this is in bold, 14pt, centered'
  you say 'this is an example'
    not 'this is in verbatim, indented by 5pts, ragged right'


Take Two: Just what is XML?

It's a markup language used for transferring data
It is concerned with data models
  to convert between application-appropriate and transfer-appropriate forms
It is not concerned with human beings
  It's produced and consumed by programs


XML as UI

A slogan of Adam Bosworth
I interpret it in two ways:
  At the client end
    – Use XML plus XSL as the basis for what the user sees on his/her screen
    – Use XLinks from a master document to pull together disparate sources of information
  At the server end
    – Use XML as a uniform interface for any data source onto the web
    – Not just documents, but e.g. databases, process control information, stock quotes


Application data


Structured markup

<POORDERHDR>
  <DATETIME qualifier="DOCUMENT">
    <YEAR>1996</YEAR>
    <MONTH>06</MONTH>
    <DAY>30</DAY>
    <HOUR>23</HOUR>
    <MINUTE>59</MINUTE>
    <SECOND>59</SECOND>
    <SUBSECOND>0000</SUBSECOND>
    <TIMEZONE>+0100</TIMEZONE>
  </DATETIME>
  <OPERAMT qualifier="EXTENDED" type="T">
    <VALUE>670000</VALUE>
    <NUMOFDEC>2</NUMOFDEC>
    <SIGN>+</SIGN>
    <CURRENCY>USD</CURRENCY>
    . . .


What just happened!?

The whole transfer syntax story just went meta, that's what happened!
XML has been a runaway success, on a much greater scale than its designers anticipated
  Not for the reason they had hoped
    – Because separation of form from content is right
  But for a reason they barely thought about
    – Data must travel the web
Tree structured documents are a useable transfer syntax for just about anything
So data-oriented web users think of XML as a transfer mechanism for their data


The Cambridge Communiqué

A W3C Note resulting from a meeting this August (http://www.w3.org/TR/schema-arch)
Signalled a widespread acceptance of layering:
  "XML has defined a transfer syntax for tree-structured documents;
  "Many data-oriented applications are being defined which build their own data structures on top of an XML document layer, effectively using XML documents as a transfer mechanism for structured data;"


The Communiqué, cont'd

Called for support in XML Schema for specifying mapping between the XML document data model (or XML Infoset) and application-specific data models
XML Schema is a W3C recommendation-in-progress for defining the structure of document families
A grammar for markup structure
  E.g.
    article -> title, subtitle?, section+
  or
    POORDERHDR -> DATETIME, ORDERAMT


Mapping between layers

Fortunately, XML Schema is actually notated in XML itself
So there are elements defined for use in schemas to define. . .
  Elements :-)
  Attributes
  Types
A type is a collection of constraints on element content and attribute values
A type may be either
  simple, for constraining string values
  complex, for constraining elements which contain other elements


Type definition example

<type name='personName'>
  <element name='title' minOccurs='0'/>
  <element name='forename' minOccurs='0' maxOccurs='*'/>
  <element name='surname'/>
  <attribute name='id' type='integer'/>
</type>

<element name='owner' type='personName'/>


Mapping between layers 2

We can think of this in two ways
  In terms of an abstract data modelling language
    – Entity-Relation
    – UML
    – RDF
  In concrete implementation terms
    – Tables and rows
    – Class instances and instance variables
The first is more portable
The second more immediately useful
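In "class instances and instance variables" terms, the personName type from the earlier example might map onto a small class. This is a hand-written sketch of one possible mapping, not something the draft defines: the `PersonName` class, the loader function and the sample `<owner>` document are all hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PersonName:
    """Instance variables mirroring the particles and attribute of 'personName'."""
    surname: str                                          # exactly one
    title: Optional[str] = None                           # minOccurs='0'
    forenames: List[str] = field(default_factory=list)    # maxOccurs='*'
    id: Optional[int] = None                              # attribute of type integer


def person_name_from_xml(xml_text: str) -> PersonName:
    """Build a PersonName from an element whose type is personName."""
    elem = ET.fromstring(xml_text)
    raw_id = elem.get("id")
    return PersonName(
        surname=elem.findtext("surname"),
        title=elem.findtext("title"),
        forenames=[f.text for f in elem.findall("forename")],
        id=int(raw_id) if raw_id is not None else None,
    )


owner = person_name_from_xml(
    "<owner id='42'><forename>Harry</forename><forename>S</forename>"
    "<surname>Truman</surname></owner>"
)
print(owner)
```

The occurrence constraints decide the shape of each instance variable (optional value, list, or required value), which is the kind of attachment between schema components and data model components the slides go on to discuss.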


Mapping between layers 3

Regardless of what approach we take, we need
  A vocabulary of data model components
  An attachment of that vocabulary to schema components
Sample vocabularies
  entity, relationship, collection
  table, row, column
  instance, variable, list, dictionary
Where should attachment be specified?
  In the schema
    – convenient
  Outside it
    – modular


Specifying mapping in the schema

Probably reasonable if done in high-level (ER, UML) terms
See example infoset-xmpl.xml, infoset-uml.xsd


Specifying mapping outside

Requires some duplication of structural information
Encourages cross-language working
See example infoset-xmpl.xsl


Take-home message

The point at which idiosyncratic scripting takes over can be moved one layer up
Using public consensual declarative standards is a Good Thing
Interoperability makes things better for everyone


Overall Conclusion

"Schemas are coming: Start using them!"
    Tim Berners-Lee, 1999-11-05