1 xml major sources: cis550/slides/xml. ppt cis550 course notes, u. penn, source for many slides...

87
1 XML Major Sources: •http://www.cis.upenn.edu/~cis550/ slides/xml.ppt CIS550 Course Notes, U. Penn, source for many slides •Yaron Kanza’s slides, source for many slides •Brian Travis, XML Day At Microsoft Tech·Ed 99 •XML Black Book •Other sources ….

Post on 21-Dec-2015

219 views

Category:

Documents


0 download

TRANSCRIPT

1

XML

Major Sources:•http://www.cis.upenn.edu/~cis550/slides/xml.ppt CIS550 Course Notes, U. Penn, source for many slides

•Yaron Kanza’s slides, source for many

slides •Brian Travis, XML Day At Microsoft Tech·Ed 99•XML Black Book•Other sources ….

2

Part I: Background

What’s the difference between the world of documents and information retrieval and databases and query interfaces?

3

Documents vs DatabasesDocument world

> plenty of small documents

> usually static

> implicit structuresection, paragraph, toc,

> tagging

> human friendly

> contentform/layout, annotation

> Paradigms“Save as”, wysiwyg

> meta-dataauthor name, date, subject

Database world> a few large databases> usually dynamic

> explicit structure (schema)

> records

> machine friendly

> contentschema, data, methods

> ParadigmsAtomicity, Concurrency, Isolation, Durability

> meta-dataschema description

4

What to do with themDocuments

• editing

• printing

• spell-checking• counting words

• retrieving (IR)

• searching

Database

• updating

• cleaning

• querying

• composing/transforming

5

HTML• Lingua franca for publishing hypertext on the

World Wide Web• Designed to describe how a Web browser should

arrange text, images and push-buttons on a page.• Easy to learn, but does not convey structure.• Fixed tag set.

<HTML><HEAD><TITLE>Welcome to the XML course</TITLE></HEAD><BODY>

<H1>Introduction</H1><IMG SRC=”dragon.jpeg" WIDTH="200" HEIGHT="150” >

</BODY></HTML>

Opening tag Text (PCDATA)

Closing tag “Bachelor” tagAttribute name Attribute

value

6

Thin red line• The line between the document world and the

database world is not clear.• In some cases, both approaches are

legitimate.• An interesting middle ground is data formats --

of which XML is an example

7

The Structure of XML• XML consists of tags and text

• Tags come in pairs <date> ...</date>

• They must be properly nested <date> <day> ... </day> ... </date> --- good <date> <day> ... </date>... </day> --- bad

(You can’t do <i> ... <b> ... </i> ...</b> in HTML)

8

XML textXML has only one “basic” type -- text.

It is bounded by tags e.g. <title> The Big Sleep </title> <year> 1935 </ year> --- 1935 is still text

XML text is called PCDATA (for parsedcharacter data). It uses a 16-bit encoding,e.g. \&\#x0152 for the Hebrew letter Mem

Later we shall see how new types are specified by XML-data

9

XML structureNesting tags can be used to express various

structures. E.g. A tuple (record) :

<person><name> Jeff Cohen</name><tel> 04-828-1345 </tel><tel> 054-470-778 </tel><email> [email protected] </email> </person>

10

XML structure (cont.)• We can represent a list by using the same tag repeatedly:

<addresses> <person> ... </person> <person> ... </person> <person> ... </person> ...</addresses>

11

XML structure (cont.)• We can represent a list by using the same tag repeatedly:<addresses>

<person><name> Yossi Orr</name><tel> 04-828-1345 </tel><email> [email protected] </email>

</person><person>

<name> Irma Levy</name><tel> 03-426-1142 </tel><email>[email protected]</email>

</person></addresses>

12

Terminology

The segment of an XML document between an opening and a corresponding closing tag is called an element.

<person> <name> Malcolm Atchison </name>

<tel> (215) 898 4321 </tel> <tel> (215) 898 4321 </tel>

<email> [email protected] </email> </person>

element

not an elementelement, a sub-elementof

13

XML is tree-like

person

name emailtel tel

Malcolm Atchison

(215) 898 4321

(215) 898 4321

[email protected]

Semistructured data models typically put the labels on the edges

14

Mixed Content

An element may contain a mixture of sub-elements and PCDATA

<airline> <name> British Airways </name> <motto> World’s <dubious> favorite</dubious> airline </motto>

</airline>

Data of this form is not typically generated from databases. It is needed for consistency with HTML

15

A Complete XML Document<?XMLversion ="1.0" encoding="UTF-8"

standalone="no"?>

<!DOCTYPE addresses SYSTEM "http://www.cs.technion.ac.il/~oshmu/addresses1.dtd">

<addresses>

<person>

<name> Jeff Cohen</name>

<tel> 04-828-1345 </tel>

<tel> 054-470-778 </tel>

<email> [email protected] </email>

</person>

</addresses>

16

The Header Tag

• <?xml version="1.0" standalone="yes/no" encoding="UTF-8"?>

• You can leave out the encoding attribute and the processor will use the UTF-8 default.

17

Two ways of representing a DB projects:

title budget managedBy

employees:

name ssn age

18

Project and Employee relations in XML

<db> <project> <title> Pattern recognition </title> <budget> 10000 </budget> <managedBy> Joe </managedBy> </project> <employee> <name> Joe </name> <ssn> 344556 </ssn> <age> 34 < /age> </employee>

<employee> <name> Sandra </name> <ssn> 2234 </ssn> <age> 35 </age> </employee> <project> <title> Auto guided vehicle </title> <budget> 70000 </budget> <managedBy> Sandra </managedBy> </project> :</db>

Projects and employees are intermixed

19

<db><projects>

<project> <title> Pattern recognition

</title> <budget> 10000 </budget> <managedBy> Joe

</managedBy></project>

<project> <title> Auto guided vehicles

</title> <budget> 70000 </budget> <managedBy> Sandra

</managedBy> </project> : </projects>

Project and Employee relations in XML (cont’d)

<employees><employee>

<name> Joe </name> <ssn> 344556 </ssn> <age> 34 </age> </employee> <employee> <name> Sandra

</name> <ssn> 2234 </ssn>

<age>35 </age> </employee> : <employees></db>

Employees follow projects

20

<db> <projects> <title> Pattern recognition

</title> <budget> 10000 </budget> <managedBy> Joe

</managedBy> <title> Auto guided vehicles

</title> <budget> 70000 </budget> <managedBy> Sandra

</managedBy> : </projects>

Project and Employee relations in XML (cont’d)

<employees> <name> Joe </name> <ssn> 344556 </ssn> <age> 34 </age> <name> Sandra </name> <ssn> 2234 </ssn> <age> 35 </age> : </employees></db>

Or without “separator” tags …

21

AttributesAn (opening) tag may contain attributes. These are typically used to describe the content of an element

<entry> <word language = “en”> cheese </word> <word language = “fr”> fromage </word> <word language = “ro”> branza </word> <meaning> A food made … </meaning>

</entry>

22

Attributes (cont’d)Another common use for attributes is to express dimension or type

<picture> <height dim= “cm”> 2400 </height> <width dim= “in”> 96 </width> <data encoding = “gif” compression = “zip”> M05-.+C$@02!G96YE<FEC ... </data></picture>

A document that obeys the “nested tags” rule and does not repeat an attribute within a tag is said to be well-formed .

23

Attributes (cont’d)<addresses >

<person friend="yes">

<name> Jeff Cohen</name>

<tel> 04-828-1345 </tel>

<tel> 054-470-778 </tel>

<email> [email protected] </email>

</person>

<person friend="no">

<name> Irma Levy</name>

<tel> 03-426-1142 </tel>

<email>[email protected]</email>

</person>

</addresses>

24

When to use attributesIt’s not always clear when to use attributes

<person ssno= “123 45 6789”> <name> F. MacNiel </name> <email> [email protected] </email> ...</person>

<person> <ssno> 123 45 6789 </ssno> <name> F. MacNiel </name> <email> [email protected] </email> ...</person>

25

Using IDs<person id="jeff" friend="yes" knows="irma">

<name> Jeff Cohen</name>

<tel> 04-828-1345 </tel>

<tel> 054-470-778 </tel>

<email> [email protected] </email>

</person>

<person id="irma" friend="no" knows="jeff">

<name> Irma Levy</name>

<tel> 03-426-1142 </tel>

<email>[email protected]</email>

</person>

26

ODL schema

class Movie

( extent Movies, key title )

{

attribute string title;

attribute string director;

relationship set<Actor> casts

inverse Actor::acted_In;

attribute int budget;

} ;

class Actor

( extent Actors, key name )

{

attribute string name;

relationship set<Movie> acted_In

inverse Movie::casts;

attribute int age;

attribute set<string> directed;

} ;

27

An example<db> <movie id=“m1”> <title>Waking Ned

Divine</title> <director>Kirk Jones

III</director> <cast idrefs=“a1 a3”></cast> <budget>100,000</budget>

</movie> <movie id=“m2”> <title>Dragonheart</title> <director>Rob

Cohen</director> <cast idrefs=“a2 a9

a21”></cast> <budget>110,000</budget>

</movie> <movie id=“m3”> <title>Moondance</title> <director>Dagmar

Hirtz</director> <cast idrefs=“a1 a8”></cast> <budget>90,000</budget> </movie> :

<actor id=“a1”> <name>David Kelly</name> <acted_In idrefs=“m1 m3 m78” > </acted_In> </actor> <actor id=“a2”> <name>Sean Connery</name> <acted_In idrefs=“m2 m9 m11”> </acted_In> <age>68</age> </actor> <actor id=“a3”> <name>Ian Bannen</name> <acted_In idrefs=“m1 m35”> </acted_In> </actor> :</db>

28

Part II: Document Type Descriptors

Imposing structure on XML documents

29

<?xml version="1.0" encoding="UTF-8"?>

<!ELEMENT email (#PCDATA)>

<!ELEMENT tel (#PCDATA)>

<!ELEMENT name (#PCDATA)>

<!ELEMENT person (name,tel*,email)>

<!ATTLIST person friend (yes | no) #IMPLIED

id ID #REQUIRED

knows IDREFS #IMPLIED>

<!ELEMENT addresses (person)*>

30

In XMLSpy Grid View

31

Document Type Descriptors

• Document Type Descriptors (DTDs) impose structure on an XML document.

• There is some relationship between a DTD and a schema, but it is not close -- hence the need for additional “typing” systems.

• The DTD is a syntactic specification.

32

Example: The Address Book<person>

<name> MacNiel, John </name>

<greet> Dr. John MacNiel </greet>

<addr>1234 Huron Street </addr>

<addr> Rome, OH 98765 </addr>

<tel> (321) 786 2543 </tel>

<fax> (321) 786 2543 </fax>

<tel> (321) 786 2543 </tel>

<email> [email protected] </email>

</person>

Exactly one name

At most one greeting

As many address lines as needed (in order)

Mixed telephones and faxes

As manyas needed

33

Specifying the structure

• name to specify a nameelement

• greet? to specify an optional (0 or 1) greet elements

• name,greet? to specify a name followed by an optional greet

34

Specifying the structure (cont)

• addr* to specify 0 or more address lines

• tel | fax a tel or a fax element

• (tel | fax)* 0 or more repeats of tel or fax

• email* 0 or more email elements

35

Specifying the structure (cont)

So the whole structure of a person entry is specified by

name, greet?, addr*, (tel | fax)*, email*

This is known as a regular expression. Why is it important?

36

Regular Expressions

Each regular expression determines a corresponding finite state automaton. Let’s start with a simpler example:

name, addr*, email

This suggests a simple parsing program

name

addr

email

37

Another example

name,address*,(tel | fax)*,email*

name

address

tel

tel

fax

fax

email

email

Adding in the optional greet furthercomplicates things

email

38

Internal DTD for the address book<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE addressbook [ <!ELEMENT addressbook (project*)> <!ELEMENT project (name, greet?, address*, (fax | tel)*, email*)> <!ELEMENT name (#PCDATA)> <!ELEMENT greet (#PCDATA)> <!ELEMENT address(#PCDATA)> <!ELEMENT tel (#PCDATA)> <!ELEMENT fax (#PCDATA)> <!ELEMENT email (#PCDATA)>]>

39

Rest of the address book

<addressbook> <project> <name> Jeff Cohen </name> <greet> Dr. Cohen </greet>

<email> [email protected] </email> </project></addressbook>

40

Our relational DB revisited projects:

title budget managedBy

employees:

name ssn age

41

Two DTDs for the relational DB

<!DOCTYPE db [

<!ELEMENT db (projects,employees)><!ELEMENT projects (project*)><!ELEMENT employees (employee*)>

<!ELEMENT project (title, budget, managedBy)><!ELEMENT employee (name, ssn, age)>...

]>

<!DOCTYPE db [<!ELEMENT db (project | employee)*><!ELEMENT project (title, budget,

managedBy)><!ELEMENT employee (name, ssn, age)>...

]>

42

Recursive DTDs

<DOCTYPE genealogy [<!ELEMENT genealogy (person*)><!ELEMENT person (

name,dateOfBirth,person, -- motherperson )> -- father

... ]>

What is the problem with this?XMLSpy does not notice it!

43

Recursive DTDs cont’d.

<DOCTYPE genealogy [<!ELEMENT genealogy (person*)><!ELEMENT person (

name,dateOfBirth,person?, --

motherperson? )> --

father ...

]>

What is now the problem with this?

44

Some things are hard to specify

Each employee element is to contain name, age and ssn elements in some order.

<!ELEMENT employee ( (name, age, ssn) | (age, ssn, name) |

(ssn, name, age) | ... )>

Suppose there were many more fields !

45

General Definitions of Entities

ANY - tells that the element can have anycontent.

EMPTY - tells that the element have nocontent.

46

Summary of XML regular expressions

• A The tag A occurs• e1,e2 The expression e1 followed by e2• e* 0 or more occurrences of e• e? Optional -- 0 or 1 occurrences• e+ 1 or more occurrences• e1 | e2 either e1 or e2• (e) grouping

47

Deterministic Requirement• Content models in element type declarations

should be deterministic.• Formally, the Glushkov automaton is

deterministic.• This automaton has states the positions of the

regular expression (semantic actions).• The transitions are based on the ‘follows set’.• The associated automata are succinct.• A regular language may not have an associated

deterministic grammar, e.g., <!ELEMENT ndeter ((movie|director)*,movie,(movie|

director))>

48

Specifying attributes in the DTD

<!ELEMENT height (#PCDATA)><!ATTLIST height dimension CDATA #REQUIRED accuracy CDATA #IMPLIED >

The dimension attribute is required; the accuracy attribute is optional.

CDATA is the “type” of the attribute -- it means string, may take any literal string as a value.

49

Specifying ID and IDREF attributes

<!DOCTYPE family [ <!ELEMENT family (person)*> <!ELEMENT person (name)> <!ELEMENT name (#PCDATA)> <!ATTLIST person

id ID #REQUIRED mother IDREF #IMPLIED father IDREF #IMPLIED children IDREFS #IMPLIED>]>

50

Some conforming data

<family> <person id="jane" mother="mary" father="john"> <name> Jane Doe </name> </person> <person id="john" children="jane jack"> <name> John Doe </name> </person> <person id="mary" children="jane jack"> <name> Mary Doe </name> </person> <person id="jack" mother=”mary" father="john"> <name> Jack Doe </name> </person></family>

51

Consistency of ID and IDREF attribute values

•If an attribute is declared as ID– the associated values must all be distinct (no

confusion)

•If an attribute is declared as IDREF– the associated value must exist as the value

of some ID attribute (no dangling “pointers”)

•Similarly for all the values of an IDREFS attribute

•ID and IDREF attributes are not typed

52

Formally• Validity constraint: One ID per Element

TypeNo element type may have more than one ID attribute

specified.

• Validity constraint: ID Attribute DefaultAn ID attribute must have a declared default of

#IMPLIED or #REQUIRED.

• Validity constraint: IDREFValues of type IDREF must match the Name production,

and values of type IDREFS must match Names; each Name must match the value of an ID attribute on some element in the XML document; i.e. IDREF values must match the value of some ID attribute.

53

A useful abbreviation

When an element has empty content we can use

<tag blahblahbla/> for <tag blahblahbla></tag>

For example:<family>

<person id = "jane”>

<name> Jane Doe </name>

<mother idref = "mary”/>

<father idref = "john”/></person>

...

</family>

54

An alternative specification <?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE family [

<!ELEMENT family (person)*>

<!ELEMENT person (name, mother?, father?, children?)>

<!ATTLIST person id ID #REQUIRED>

<!ELEMENT name (#PCDATA)>

<!ELEMENT mother EMPTY>

<!ATTLIST mother idref IDREF #REQUIRED>

<!ELEMENT father EMPTY>

<!ATTLIST father idref IDREF #REQUIRED>

<!ELEMENT children EMPTY>

<!ATTLIST children idrefs IDREFS #REQUIRED>

]>

55

The revised data<family>

<person id="jane">

<name> Jane Doe </name>

<children idrefs="ami tami"/>

</person>

<person id="john">

<name> John Doe </name>

<children idrefs="ami tami"/>

</person>

<person id="ami">

<name> Ami Doe </name>

<mother idref="jane"/>

<father idref="john"/>

</person>

<person id="tami">

<name> Tami Doe </name>

</person>

</family>

56

ODL schema

class Movie

( extent Movies, key title )

{

attribute string title;

attribute string director;

relationship set<Actor> cast

inverse Actor::acted_In;

attribute int budget;

} ;

class Actor

( extent Actors, key name )

{

attribute string name;

relationship set<Movie> acted_In

inverse Movie::cast;

attribute int age;

attribute set<string> directed;

} ;

57

Schema.dtd

<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE db [ <!ELEMENT db (movie+, actor+)> <!ELEMENT movie

(title,director,cast,budget)> <!ATTLIST movie id ID #REQUIRED> <!ELEMENT title (#PCDATA)> <!ELEMENT director (#PCDATA)> <!ELEMENT cast EMPTY> <!ATTLIST cast idrefs IDREFS #REQUIRED> <!ELEMENT budget (#PCDATA)>

58

Schema.dtd (cont’d)

<!ELEMENT actor (name, acted_In,age?, directed*)>

<!ATTLIST actor id ID #REQUIRED> <!ELEMENT name (#PCDATA)> <!ELEMENT acted_In EMPTY> <!ATTLIST acted_In idrefs IDREFS #REQUIRED> <!ELEMENT age (#PCDATA)> <!ELEMENT directed (#PCDATA)>]>

59

Data <db>

<movie id="ohgod">

<title> Oh God!</title>

<director> Woody Allen </director>

<cast idrefs="burns"></cast>

<budget> $2M </budget>

</movie>

<actor id="burns">

<name> George Burns </name>

<acted_In idrefs="ohgod" />

</actor>

</db>

60

Constraints on IDs and IDREFs

• ID stands for identifier. No two ID attributes may have the same value (of type CDATA)

• IDREF stands for identifier reference. Every value associated with an IDREF attribute must exist as an ID attribute value

• IDREFS specifies several (0 or more) identifiers

61

Connecting the document with its DTD

In line:<?xml version="1.0"?>

<!DOCTYPE db [<!ELEMENT ...> … ]><db> ... </db>

Another file:

<!DOCTYPE db SYSTEM "schema.dtd">

A URL: <!DOCTYPE db SYSTEM

"http://www.schemaauthority.com/schema.dtd">

62

Connecting the document with its DTDBoth:

file c:/schema.dtd: <!ELEMENT db (movie+, actor+)> file to be validated<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE db SYSTEM "c:/schema.dtd"[ <!ELEMENT movie (title,director,cast,budget)> <!ATTLIST movie id ID #REQUIRED> <!ELEMENT title (#PCDATA)> <!ELEMENT director (#PCDATA)> <!ELEMENT cast EMPTY> <!ATTLIST cast idrefs IDREFS #REQUIRED> <!ELEMENT budget (#PCDATA)> <!ELEMENT actor (name, acted_In,age?, directed*)> <!ATTLIST actor id ID #REQUIRED> <!ELEMENT name (#PCDATA)> <!ELEMENT acted_In EMPTY> <!ATTLIST acted_In idrefs IDREFS #REQUIRED> <!ELEMENT age (#PCDATA)> <!ELEMENT directed (#PCDATA)>]><db> <movie id="ohgod"> <title> Oh God!</title> <director> Woody Allen </director> <cast idrefs="burns"></cast> <budget> $2M </budget> </movie> <actor id="burns"> <name> George Burns </name> <acted_In idrefs="ohgod" /> </actor></db>

63

Well-formed and Valid Documents

• Well-formed applies to any document (with or without a DTD): proper nesting of tags and unique attributes

• Valid specifies that the document conforms to the DTD: conforms to regular expression grammar, types of attributes correct, and constraints on references satisfied

64

DTDs v.s Schemas (or Types)• By database (or programming language)

standards DTDs are rather weak specifications. – Only one base type -- PCDATA– No useful “abstractions” e.g., sets– IDREFs are untyped. You point to something, but

you don’t know what!– No constraints e.g., child is inverse of parent– No methods– Tag definitions are global

• Some of the XML extensions impose something like a schema or type on an XML document. We may see these later

65

Part III: Entities

To take storage into account

66

What are Entities

An entity is a shortcut to a set of information

You might think of an entity as being a bit like a

macro.

Entities allow dividing a document between

some different storage devices.

67

Why to use entities:

• Entities save typing.

• Entities can reduce errors.

• Entities are easy to update.

• Entities can act as placeholders for TBD information.

68

Defining Entities• You can define entities in your local document

as part of the DOCTYPE definition.

• You can also link to external files that contain the entity data. This, too, is done through the DOCTYPE definition.

• A third option is to define the entities in your external DTD.

• Use a local definition when the entity is being used only in this one particulars file.

• Use a linked, external file when the entity being used in many document sets.

69

Kinds of Entities

There are two kinds of entities:

• general entities• parameter entities

• Internal• External

• Parsed• Unparsed

• Possibilities (first 4 are parsed):1. Internal Parameter2. External Parameter 3. Internal General4. External General5. External General Unparsed

70

General entities

The definition of general entities in the DTD<!ENTITY Name EntityDefinition >

The usage of the entity in the document is by &Name;

71

Example<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE mdb [

<!ENTITY bm "bad movie">

<!ELEMENT mdb (movie+)>

<!ELEMENT movie (title,director,cast?,budget)>]>

<mdb>

<movie id="ohgod" opinion="&bm;">

<title> Oh God!</title>

<director> Woody Allen </director>

<budget> $2M </budget>

</movie>

</mdb>

72

Browser View

73

Non-parsed Entities<!DOCTYPE mdb [

<!NOTATION gif SYSTEM "c:\Program Files\Netscape\Communicator\Program\Netscape.exe">

<!ENTITY starpicture SYSTEM "http://www.cs.technion.ac.il/~oshmu/star.gif" NDATA gif>

<!ENTITY bm "bad movie">

<!ELEMENT mdb (movie+)>

<!ELEMENT movie (title,director, budget)>

<!ATTLIST movie id ID #REQUIRED

opinion CDATA #IMPLIED

starimage ENTITY #IMPLIED>

<!ELEMENT title (#PCDATA)>

<!ELEMENT director (#PCDATA)>

<!ELEMENT budget (#PCDATA)>

]>

74

Data<mdb>

<movie id="ohgod" opinion="&bm;" starimage="starpicture">

<title> Oh God!</title>

<director> Woody Allen </director>

<budget> $2M </budget>

</movie>

</mdb>

75

Parameter EntitiesParameter entities are used only within DTDs.They carry information for use in the markupdeclaration.

• Internal entities - references are within the DTD.

• External entities - references draw information

from outside files.

Parameter Entity declaration:<!ENTITY % Name EntittyDefinition >

Can’t use in internal DTD subset

76

Parameter Entity Example<?xml version="1.0" encoding="UTF-8"?>

<!ENTITY % essential "name, tel*">

<!ELEMENT email (#PCDATA)>

<!ELEMENT tel (#PCDATA)>

<!ELEMENT name (#PCDATA)>

<!ELEMENT person (%essential;, email, advisor?)>

<!ATTLIST person friend (yes | no) #IMPLIED

id ID #REQUIRED

knows IDREFS #IMPLIED>

<!ELEMENT advisor (person)>

<!ELEMENT addresses (person)*>

77

Entities DefinitionLocal Definition:

<!DOCTYPE [ <!ENTITY copyright

"Copyright 2000, As The World Spins Corp. All

rights reserved. Please do not copy or use without

authorization. For authorization contact

[email protected]."> ]>

Global Definition:

<!DOCTYPE [ <!ENTITY copyright SYSTEM

"http://www.worldspins.com/legal/copyright.xml">]>

78

Example<?xml version="1.0">

<!DOCTYPE [ <!ENTITY copyright "Copyright 2000, As The World Spins Corp. All rights

reserved. Please do not copy or use without authorization. Forauthorization contact [email protected].">

<!ENTITY trademark SYSTEM "http://www.worldspins.com/legal/trademark.xml">]>

79

Example (cont.)<PRESSRELEASE><HEAD>Mini-globe revolutionizes keychain industry</HEAD><LEAD>Today As The World Spins introduces a new approach to keychains. With the new MINI-GLOBE keys can be kept inside achain, called for upon demand, and stored safely. Never more will consumers lose a key or stand at a door flipping through a stack of keys seeking the right one.</LEAD><LEGAL>&trademark;&copyright;</LEGAL></PRESSRELEASE>

80

Using CDATA<

HEAD1> Entering a Kennel Club Member</HEAD1><DESCRIPTION>Enter the member by the name on his or her papers. Use the NAME tag. The NAME tag has two attributes. Common (all in lowercase, please!) is the dog's call name. Breed (also in all lowercase) is the dog's breed. Please see the breed reference guide for acceptable breeds. Your entry should look something like this: </DESCRIPTION> <EXAMPLE><![CDATA[<NAME common="freddy" breed"=springer-spaniel">Sir Fredrick of Ledyard's End</NAME>]]></EXAMPLE>

81

82

Namespaces• Namespaces are a way of preventing name

clashes among elements from more than one source within the same XML document.

• They are also useful in identifying elements that are meaningful for a particular XML application.

• See http://www.w3.org/TR/REC-xml-names/

83

Namespaces• URIs are either of URLs or URNs.• An XML namespace is, literally, identified by a URI

reference.• The reference need not point to an actual resource!• A URI reference may be associated more than one

prefix.• Prefixes are used in XML documents in forming

element and attribute names (prefix:localname).• Two prefixes that are associated with the same URI

are said to be in the same namespace.• declaring a namespace - identifying a namespace

used in the document. • DTDs are unaware of namespaces.

84

ExampleDefining the Namespace ATDB:

<document xmlns:ATDB= 'http://www.cs.huji.ac.il/atdb-schema'>

Using a tag from the ATDB Namespace

<ATDB: myTAG>This is an xml tag.</myTAG>

ADTB:myTag is a qualified name.

Using A tag not from the namespace:

<myTAG>This is a ‘made in Israel’ tag.</myTAG>

85

Scope of Namespaces• A prefix is associated with the namespace in

the element scope in which it is defined.• Example (birthdate is associated with no

namespace):<yp:person

xlmns:yp="http://www.cs.technion.ac.il"><yp:name> John Smith</yp:name><birthdate> 12-11-87</birthdate><address xlmns:yp="http://www.ee.technion.ac.il"> Technion City 234</address>

</yp:person>

86

Default Namespaces• A default namespace applies to all elements in its

scope.• However, it does not override explicit prefixes (their

non-prefixed child elements are default-bound).• Example (name and birthdate are bound):

<person xlmns="http://www.cs.technion.ac.il"><name > John Smith</yp:name><birthdate> 12-11-87</birthdate><yp:address type="local" xlmns:yp="http://www.ee.technion.ac.il"> Technion City 234</yp:address>

</person>

• Non-prefixed attribute names are associated with no namespace even when in scope.

87

Summary• XML is a new data format. Its main virtues are

widespread acceptance and the (important) ability to handle semi structured data (data without schema)

• DTDs provide some useful syntactic constraints on documents. As schemas they are weak

• How to store large XML documents?• How to query them?• How to map between XML and other

representations?