data modeling using xml schemas murali mani extreme 2002

Data Modeling using XML Schemas

Murali Mani

Extreme 2002

What this talk is not about

Not about

<review> <reviewer>X</reviewer> gave a <rating>two thumbs up</rating> for the <movie>Fugitive, The</movie></review>

We talk about data modeling from a database perspective.

What is database perspective?

Our world consists of
  Entities
  Relationships
    binary: 1:1, 1:many, many:many
    n-ary
    recursive
  Attributes for entities
  Attributes for relationships

Outline of the talk

How XML can contribute to the DB community
Introduction of the ER model
How ER concepts are modeled using the relational model
Mapping ER concepts to the XML model
Constraint specification for XML – what are the options?
Subtyping for XML processing – do we need it, and what are the options?

How XML can contribute to DB community

Standard exchange format
Superior data model?
  Recursive relationships
  Union types
    Person (name | (lastname, firstname), age, address)
  Friendlier representation of relationships?
    person (person*)
    person (person?)

Person  Age  Father
X       25   Y
Y       55   null

<person Y, 55>
  <person X, 25/>
</person>
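The contrast between the flat Person/Age/Father table and the nested person representation can be sketched in Python. This is a minimal illustration only: the attribute names `name` and `age` and the helper `nest` are assumptions introduced here, since the slide uses an abbreviated notation rather than concrete XML.

```python
import xml.etree.ElementTree as ET

# Flat rows (name, age, father) as in the table; None stands for null.
rows = [("X", 25, "Y"), ("Y", 55, None)]

def nest(rows):
    """Build one <person> element per row and nest each under its father."""
    elems = {name: ET.Element("person", name=name, age=str(age))
             for name, age, _ in rows}
    roots = []
    for name, _, father in rows:
        if father is None:
            roots.append(elems[name])            # no father: top-level person
        else:
            elems[father].append(elems[name])    # child element of the father
    return roots

roots = nest(rows)
print(ET.tostring(roots[0], encoding="unicode"))
```

The Father column disappears in the nested form: the relationship is carried by the parent-child structure itself.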

Data Modeling

What is a data model?
  Structural specification
  Specification of constraints
  Operations to retrieve/update the data
Stages in database design
  Conceptual model
  Logical model
  Physical model
Conceptual model and logical model – absolutely NO (almost no) redundancy

Database Design and Redundancy

Prof   Age
Muntz  60

Student  BS  Prof
MM       CS  Muntz
YC       EE  Muntz

Student  BS  Prof   Age
MM       CS  Muntz  60
YC       EE  Muntz  60
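A small Python sketch of why the single joined table is the redundant design: the professor's age is stored once per student, so a partial update leaves the data inconsistent. The variable names and the changed value 61 are illustrative assumptions, not part of the slides.

```python
# Two separate relations, as in the first design.
prof = {"Muntz": 60}                                       # Prof -> Age
student = [("MM", "CS", "Muntz"), ("YC", "EE", "Muntz")]   # (Student, BS, Prof)

# Single joined relation: Age is repeated for every student of a professor.
joined = [(s, bs, p, prof[p]) for s, bs, p in student]

# Update anomaly: changing the age in only one row makes the table disagree
# with itself about Muntz's age.
joined[0] = ("MM", "CS", "Muntz", 61)
ages_for_muntz = {age for _, _, p, age in joined if p == "Muntz"}
print(ages_for_muntz)
```

In the two-relation design the age lives in exactly one place, so this anomaly cannot occur.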

Database design and redundancy

Person Address City State zip

X A1 LAX CA 90066

Y A2 LAX CA 90066

Entity Relationship (ER model)

Consider students and professors in a dept, with a relationship advisor

Student Prof since

MM Muntz 1998

YC Muntz 2000

ER Model (contd…)

N-ary relationship

Relational Model

Every relation has a key
Relationships are represented using foreign keys
Foreign key from A to B represents an A (_, 1) : B (_, _) relationship

Supplier Part City lastShipment

Relational Model (contd…)

Supplier Part City lastShipment

PName

Muntz

Student Professor since

MM Muntz 1998

YC Muntz 2000

Relationships in XML model

A (1, 1) : B (_, _) can be represented using parent-child relationships as B (A*)

prof (@PName, student*)

<prof PName="Muntz">
  <student SName="MM" since="1998"/>
  <student SName="YC" since="2000"/>
</prof>
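As a rough sketch, the advisor relationship can be read back out of the parent-child nesting with Python's standard `xml.etree.ElementTree`. The values follow the advisor table (MM since 1998, YC since 2000); the variable name `advisees` is introduced here for illustration.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<prof PName="Muntz">'
    '<student SName="MM" since="1998"/>'
    '<student SName="YC" since="2000"/>'
    '</prof>'
)

# The relationship is implicit in the nesting: each student's advisor is its
# parent element, and the relationship attribute "since" sits on the child.
advisees = [(s.get("SName"), doc.get("PName"), s.get("since"))
            for s in doc.findall("student")]
print(advisees)
```

Because every student has exactly one parent, this encoding only fits the A (1, 1) side of the relationship.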

Entity Relationship (ER model)

Consider students and professors in a dept, with a relationship advisor

Student Prof since

MM Muntz 1998

YC Muntz 2000

Using ID/IDREF to represent relationships

A (_, 1) : B (_, _) can be represented using ID/IDREF as
  Define an ID attribute for B
  Define an IDREF attribute for A referring to B

prof (@PName, @id), student (@SName, @since, @idref::prof)

<prof PName="Muntz" id="P1"/>
<student SName="MM" since="1998" idref="P1"/>
<student SName="YC" since="2000" idref="P1"/>
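A minimal sketch of how an application might resolve the IDREF back to the professor. `ElementTree` does not interpret ID/IDREF types without a DTD, so the index is built by hand; the wrapper element `dept` and the helper `advisor_of` are assumptions added so the fragment parses as a single document.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<dept>'
    '<prof PName="Muntz" id="P1"/>'
    '<student SName="MM" since="1998" idref="P1"/>'
    '<student SName="YC" since="2000" idref="P1"/>'
    '</dept>'
)

# Index every element that carries an id attribute.
by_id = {e.get("id"): e for e in doc.iter() if e.get("id") is not None}

def advisor_of(student):
    """Follow the student's idref to the referenced prof element."""
    return by_id[student.get("idref")].get("PName")

advisors = {s.get("SName"): advisor_of(s) for s in doc.findall("student")}
print(advisors)
```

Each student holds at most one reference, and the relationship attribute `since` can stay on the referring element.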

Using ID/IDREFS to represent relationships – not really…

ID/IDREFS can represent any binary relationship A (_, _) : B (_, _), but cannot represent attributes for the relationship

A (@id)
B (@idrefs::A*)

student (SName, @id)
professor (PName, @idrefs::student*)

<student SName="MM" id="S1"/>
<student SName="YC" id="S2"/>
<professor PName="Muntz" idrefs="S1 S2"/>
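The limitation can be seen in a short sketch: an IDREFS value is just a whitespace-separated list of IDs, so each reference yields a (professor, student) pair with no place to attach a per-pair attribute such as `since`. The wrapper element `dept` is an assumption added so the fragment parses.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<dept>'
    '<student SName="MM" id="S1"/>'
    '<student SName="YC" id="S2"/>'
    '<professor PName="Muntz" idrefs="S1 S2"/>'
    '</dept>'
)

by_id = {e.get("id"): e for e in doc.iter("student")}
prof = doc.find("professor")

# Splitting the IDREFS value gives one bare reference per related student;
# there is no slot in the token list for relationship attributes.
pairs = [(prof.get("PName"), by_id[ref].get("SName"))
         for ref in prof.get("idrefs").split()]
print(pairs)
```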

Using foreign keys to represent relationships

student (SName, Professor, since)

professor (PName)
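For comparison, the same two relations can be sketched with Python's built-in `sqlite3` module. The column types, the choice of primary keys, and the rejected row ('ZZ', 'Nobody', 2001) are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE professor (PName TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE student ("
    " SName TEXT PRIMARY KEY,"
    " Professor TEXT REFERENCES professor(PName),"
    " since INTEGER)"
)
conn.execute("INSERT INTO professor VALUES ('Muntz')")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [("MM", "Muntz", 1998), ("YC", "Muntz", 2000)])

# A student row pointing at a professor that does not exist is rejected.
try:
    conn.execute("INSERT INTO student VALUES ('ZZ', 'Nobody', 2001)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(rejected, count)
```

The foreign key column allows one professor per student, which matches the A (_, 1) : B (_, _) pattern, and `since` is stored alongside it.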

Summary so far…

XML schemas allow us to represent relationships in a friendlier way…

All foreign key constraints can be represented using parent-child or ID/IDREF – we do not really need foreign keys

IDREFS not recommended for representing relationships.

Constraint specification in XML – questions to be asked

Node equality vs value equality (or) Can a path field produce an element?
Can a path field produce a set of elements/values? – if so, what semantics?
Should a path field exist? (or) Can a path field return empty?
Should path expressions traverse only down the tree?
Should our constraints be based on type selectors or should they be based on path expression selectors?
If we use path expression or type selectors, do we need relative keys?

Node Equality

Makes it easier, but…
When are two elements equal? When their serialized string values, ignoring the order of attributes, are the same.
We have used order among child nodes in defining node equality…
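One possible reading of this definition, sketched in Python: attribute order is ignored, child order matters. The function name `node_equal` and the decision to strip surrounding whitespace from text are assumptions; the slides do not fix these details.

```python
import xml.etree.ElementTree as ET

def node_equal(a, b):
    """Value-based element equality: same tag, same attributes (order
    ignored), same text, and pairwise-equal children in the same order."""
    if a.tag != b.tag or a.attrib != b.attrib:   # dict comparison ignores order
        return False
    if (a.text or "").strip() != (b.text or "").strip():
        return False
    if len(a) != len(b):
        return False
    return all(node_equal(x, y) for x, y in zip(a, b))

# Attribute order differs: still equal.
a1 = ET.fromstring('<professor Age="60" PName="Muntz"/>')
a2 = ET.fromstring('<professor PName="Muntz" Age="60"/>')

# Child order differs: not equal under this definition.
b1 = ET.fromstring('<professor><Pname>Muntz</Pname><Age>60</Age></professor>')
b2 = ET.fromstring('<professor><Age>60</Age><Pname>Muntz</Pname></professor>')

print(node_equal(a1, a2), node_equal(b1, b2))
```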

Can a path field produce a set of values?

professor (Pname, Age)

<professor>
  <Age>60</Age>
  <Pname>Muntz</Pname>
  <Pname>Chu</Pname>
</professor>

If a type X has a key (X1, X2, …, Xn), then the set Y1 × Y2 × … × Yn should be unique, where Yi is the set of values produced by the path field Xi
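A hedged sketch of one way to interpret this rule: take the cross product of the value sets for each key field, and require that no resulting tuple is shared by two elements of the type. This "no shared tuple" reading, the helper names `key_tuples` and `key_holds`, and the extra professor elements `p2` and `p3` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from itertools import product

def key_tuples(elem, fields):
    """Cross product of the value sets produced by each key field."""
    value_sets = [{c.text for c in elem.findall(f)} for f in fields]
    return set(product(*value_sets))

def key_holds(elems, fields):
    """True if no key tuple is produced by more than one element."""
    seen = set()
    for e in elems:
        tuples = key_tuples(e, fields)
        if tuples & seen:
            return False
        seen |= tuples
    return True

p1 = ET.fromstring(
    '<professor><Age>60</Age><Pname>Muntz</Pname><Pname>Chu</Pname></professor>')
p2 = ET.fromstring('<professor><Age>60</Age><Pname>Chu</Pname></professor>')
p3 = ET.fromstring('<professor><Age>45</Age><Pname>Chu</Pname></professor>')

print(key_tuples(p1, ["Pname", "Age"]))
print(key_holds([p1, p2], ["Pname", "Age"]))   # p2 repeats (Chu, 60)
print(key_holds([p1, p3], ["Pname", "Age"]))
```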

Should a path expression traverse only down the tree?

Trade-off is relative keys vs traversing up the tree.
For example, consider student, professor with a difference – a student can have multiple professors.
Consider the same design
  Professor (PName, Student*)
  Student (Sname)
Key for student can be specified as either
  (professor, Sname) (or)
  Key for student relative to professor is (Sname)
But this is bad design anyway…
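The two formulations can be compared in a short sketch. The document below, the wrapper element `dept`, the use of attributes for PName and Sname, and the three helper functions are assumptions for illustration; student MM appears under two professors, as the example allows.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<dept>'
    '<Professor PName="Muntz"><Student Sname="MM"/><Student Sname="YC"/></Professor>'
    '<Professor PName="Chu"><Student Sname="MM"/></Professor>'
    '</dept>'
)

def relative_key_holds(doc):
    """Sname is unique among the Student children of each Professor."""
    for prof in doc.findall("Professor"):
        names = [s.get("Sname") for s in prof.findall("Student")]
        if len(names) != len(set(names)):
            return False
    return True

def absolute_key_holds(doc):
    """(professor, Sname) is unique over all Students; forming the pair
    requires knowing each Student's parent Professor."""
    pairs = [(prof.get("PName"), s.get("Sname"))
             for prof in doc.findall("Professor")
             for s in prof.findall("Student")]
    return len(pairs) == len(set(pairs))

def global_sname_unique(doc):
    """Sname alone as a document-wide key, without the professor context."""
    names = [s.get("Sname") for s in doc.iter("Student")]
    return len(names) == len(set(names))

print(relative_key_holds(doc), absolute_key_holds(doc), global_sname_unique(doc))
```

Both formulations accept this document, while Sname on its own does not work as a key, which is why either a relative key or upward traversal is needed here.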

Three different constraint specifications

UCM – WWW10
  Type selectors, no relative keys, path expressions can produce set of values.
Keys for XML – WWW10
  Path selectors, relative keys specified through paths, path expressions cannot produce set of values.
W3C XML Schema
  Path selectors, relative keys specified through types, path expressions cannot produce set of values.

Commonalities across the 3 specifications

No concept of node equality
Path expressions traverse only down the tree
A path field should exist

Summary about Data Modeling

Entity types map to element types.
Some relationship types map to element types.
Ability to define element types –
  RELAX NG provides the ability for us to define element types.
  In XML Schema, this is not so easy.
Key constraints based on type selectors seem the right way to go.

XML Processing and Subtyping

Subtyping is essential for static type checking

function f1 : a{A} B*,C* {
  for $x in a//name return <b/>;
  for $x in a//name return <c/>;
}

function f2 : d{(B, B)*, (C, C)* | B, (B,B)*, C, (C, C)*} { … }

Is this type-safe? Type-inferencing vs type-checking problem.

Two techniques for subtyping

Implicit – tree/hedge language inclusion
  A type A is a subtype of type B iff L (A) is a sublanguage of L (B) – used in XDuce
Explicit – user specifies type hierarchy
  As in XML Schema
Explicit subtyping “implicitly” solves type-inferencing vs type-checking problem.
Implicit subtyping poses several interesting research problems.
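Language inclusion for the f1/f2 example can be explored with a bounded check in Python. This is only a sketch: content models are flattened to regular expressions over the letters B and C, strings are enumerated up to a fixed length rather than deciding inclusion exactly, and the names `F1_OUTPUT`, `F2_INPUT`, and `find_counterexample` are introduced here. It also assumes that f1 emits as many c elements as b elements, since both loops range over the same a//name.

```python
import re
from itertools import product

# Declared result type of f1 and content model expected by f2.
F1_OUTPUT = re.compile(r"B*C*")
F2_INPUT = re.compile(r"(?:BB)*(?:CC)*|B(?:BB)*C(?:CC)*")

def find_counterexample(sub, sup, alphabet="BC", max_len=6):
    """Return the first string (shortest first) accepted by sub but not by
    sup, or None if there is none up to max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            if sub.fullmatch(word) and not sup.fullmatch(word):
                return word
    return None

# The declared type B*,C* is not included in f2's input type.
counterexample = find_counterexample(F1_OUTPUT, F2_INPUT)

# The sequences f1 can actually produce (n b's followed by n c's) all fit.
actual_outputs_ok = all(F2_INPUT.fullmatch("B" * n + "C" * n) for n in range(10))

print(counterexample, actual_outputs_ok)
```

Under these assumptions the composition is safe for every actual output, yet checking against the declared type B*,C* rejects it, which is one way to see the gap between type inferencing and type checking.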
