Data Modeling using XML Schemas
Murali Mani
Extreme 2002
What this talk is not about
Not about
<review> <reviewer>X</reviewer> gave a <rating>two thumbs up</rating> for the <movie>Fugitive, The</movie></review>
We talk about data modeling from a database perspective.
What is a database perspective?
Our world consists of
  Entities
  Relationships
    binary - 1:1, 1:many, many:many
    n-ary
    recursive
  Attributes for entities
  Attributes for relationships
Outline of the talk
How XML can contribute to the DB community
Introduction of the ER model
How ER concepts are modeled using the relational model
Mapping ER concepts to the XML model
Constraint specification for XML – what are the options?
Subtyping for XML processing – do we need it, and what are the options?
How XML can contribute to the DB community
Standard exchange format
Superior data model?
  Recursive relationships
  Union types
    Person (name | (lastname, firstname), age, address)
Friendlier representation of relationships?
  person (person*)
  person (person?)

Person  Age  Father
X       25   Y
Y       55   null

<person Y, 55>
  <person X, 25/>
</person>
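A minimal Python sketch of the point above: the flat (Person, Age, Father) table can be turned into a nested person (person*) tree. The attribute names name and age are assumptions for illustration; the slide only uses shorthand notation.

```python
import xml.etree.ElementTree as ET

# Flat table from the slide: (person, age, father); None stands for null.
rows = [("X", 25, "Y"), ("Y", 55, None)]

def nest(rows):
    """Build one <person> element per row and attach it under its father."""
    elems = {name: ET.Element("person", name=name, age=str(age))
             for name, age, _ in rows}
    roots = []
    for name, _, father in rows:
        if father is None:
            roots.append(elems[name])           # no father recorded: top level
        else:
            elems[father].append(elems[name])   # child nested inside father
    return roots

roots = nest(rows)
print(ET.tostring(roots[0], encoding="unicode"))
```

The recursive relationship needs no Father column here; it is carried by the nesting itself.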
Data Modeling
What is a data model?
  Structural specification
  Specification of constraints
  Operations to retrieve/update the data
Stages in database design
  Conceptual model
  Logical model
  Physical model
Conceptual model and logical model – absolutely NO (almost no) redundancy
Database Design and Redundancy
Prof   Age
Muntz  60

Student  BS  Prof
MM       CS  Muntz
YC       EE  Muntz

Student  BS  Prof   Age
MM       CS  Muntz  60
YC       EE  Muntz  60
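The difference between the two designs can be sketched in a few lines of Python. This is only an illustration using the rows from the slide; the variable names are assumptions.

```python
# Two-table design: each fact is stored exactly once.
prof = {"Muntz": 60}                                       # Prof -> Age
student = [("MM", "CS", "Muntz"), ("YC", "EE", "Muntz")]   # (Student, BS, Prof)

# Joining the two tables reproduces the single wide table.
wide = [(s, bs, p, prof[p]) for s, bs, p in student]

# In the wide table the age of Muntz is stored once per advisee,
# so an update has to touch every copy to stay consistent.
copies_of_age = sum(1 for row in wide if row[2] == "Muntz")
print(wide, copies_of_age)
```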
Database design and redundancy
Person Address City State zip
X A1 LAX CA 90066
Y A2 LAX CA 90066
Entity Relationship (ER) model
Consider students and professors in a department, related by an advisor relationship
Student Prof since
MM Muntz 1998
YC Muntz 2000
ER Model (contd…)
N-ary relationship
Relational Model
Every relation has a key
Relationships are represented using foreign keys
A foreign key from A to B represents an A (_, 1) : B (_, _) relationship
Supplier Part City lastShipment
Relational Model (contd…)
Supplier Part City lastShipment
PName
Muntz
Student Professor since
MM Muntz 1998
YC Muntz 2000
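As a hedged sketch, the two relations above can be declared with a foreign key in SQLite through Python's sqlite3 module. The column types and the rejected row ("ZZ", "Nobody") are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE professor (PName TEXT PRIMARY KEY)")
# Professor is nullable: a student refers to at most one professor, i.e. (_, 1).
conn.execute("""CREATE TABLE student (
    SName TEXT PRIMARY KEY,
    Professor TEXT REFERENCES professor(PName),
    since INTEGER)""")
conn.execute("INSERT INTO professor VALUES ('Muntz')")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [("MM", "Muntz", 1998), ("YC", "Muntz", 2000)])

def insert_rejected(row):
    """Return True if the foreign key constraint rejects the row."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
        return False
    except sqlite3.IntegrityError:
        return True

# Hypothetical row pointing at a professor that does not exist.
dangling = insert_rejected(("ZZ", "Nobody", 2001))
student_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(dangling, student_count)
```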
Relationships in XML model
A (1, 1) : B (_, _) can be represented using parent-child relationships as B (A*)
prof (@PName, student*)
<prof PName="Muntz">
  <student SName="MM" since="1998"/>
  <student SName="YC" since="2000"/>
</prof>
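To see that the parent-child encoding loses nothing, here is a small Python sketch that reads the nested document back into (Student, Prof, since) tuples. It uses the values from the advisor table and the standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

doc = """<prof PName="Muntz">
  <student SName="MM" since="1998"/>
  <student SName="YC" since="2000"/>
</prof>"""

prof = ET.fromstring(doc)

# The professor of each student is simply its parent element.
advisor = [(s.get("SName"), prof.get("PName"), int(s.get("since")))
           for s in prof.findall("student")]
print(advisor)
```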
Using ID/IDREF to represent relationships
A (_, 1) : B (_, _) can be represented using ID/IDREF as follows:
  Define an ID attribute for B
  Define an IDREF attribute for A referring to B
prof (@PName, @id), student (@SName, @since, @idref::prof)
<prof PName="Muntz" id="P1"/>
<student SName="MM" since="1998" idref="P1"/>
<student SName="YC" since="2000" idref="P1"/>
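A minimal sketch of how an application could follow the IDREF back to its target. The wrapping <dept> root element is an assumption added so the fragment parses as one document, and the reference attribute is written as plain idref.

```python
import xml.etree.ElementTree as ET

doc = """<dept>
  <prof PName="Muntz" id="P1"/>
  <student SName="MM" since="1998" idref="P1"/>
  <student SName="YC" since="2000" idref="P1"/>
</dept>"""

root = ET.fromstring(doc)

# Index every element that carries an id attribute.
by_id = {e.get("id"): e for e in root.iter() if e.get("id") is not None}

# Each student holds one reference, so the relationship attribute
# (since) can live on the student element itself.
advisor = [(s.get("SName"), by_id[s.get("idref")].get("PName"), int(s.get("since")))
           for s in root.findall("student")]
print(advisor)
```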
Using ID/IDREFS to represent relationships – not really…
ID/IDREFS can represent any binary relationship A (_, _) : B (_, _), but cannot represent attributes for the relationship
A (@id), B (@idrefs::A*)
student (SName, @id), professor (PName, @idrefs::student*)
<student SName="MM" id="S1"/>
<student SName="YC" id="S2"/>
<professor PName="Muntz" idrefs="S1 S2"/>
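The limitation can be seen in a short Python sketch: splitting the IDREFS value yields (student, professor) pairs, but an individual reference has no place to carry an attribute such as since. The <dept> root element is an assumption added so the fragment parses.

```python
import xml.etree.ElementTree as ET

doc = """<dept>
  <student SName="MM" id="S1"/>
  <student SName="YC" id="S2"/>
  <professor PName="Muntz" idrefs="S1 S2"/>
</dept>"""

root = ET.fromstring(doc)
by_id = {e.get("id"): e for e in root.iter() if e.get("id") is not None}

# An IDREFS value is a whitespace-separated list of IDs: each token is
# one instance of the relationship, with nowhere to attach "since".
pairs = [(by_id[ref].get("SName"), p.get("PName"))
         for p in root.findall("professor")
         for ref in p.get("idrefs").split()]
print(pairs)
```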
Using foreign keys to represent relationships
student (SName, Professor, since)
professor (PName)
Summary so far…
XML schemas allow us to represent relationships in a friendlier way…
All foreign key constraints can be represented using parent-child or ID/IDREF – we do not really need foreign keys
IDREFS is not recommended for representing relationships.
Constraint specification in XML – questions to be asked
Node equality vs value equality (or) Can a path field produce an element?
Can a path field produce a set of elements/values? – if so, what semantics?
Should a path field exist? (or) Can a path field return empty?
Should path expressions traverse only down the tree?
Should our constraints be based on type selectors or should they be based on path expression selectors?
If we use path expression or type selectors, do we need relative keys?
Node Equality
Makes it easier, but…
When are two elements equal? When their serialized string values, ignoring the order of attributes, are the same.
We have used order among child nodes in defining node equality…
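One way to make this definition concrete is a recursive comparison in Python. This is a sketch under assumptions: surrounding whitespace in text is ignored and tail text is not compared, neither of which the slide specifies.

```python
import xml.etree.ElementTree as ET

def node_equal(a, b):
    """Same tag, same attributes (order ignored), same text, same children in order."""
    return (a.tag == b.tag
            and a.attrib == b.attrib   # dict comparison ignores attribute order
            and (a.text or "").strip() == (b.text or "").strip()
            and len(a) == len(b)
            and all(node_equal(x, y) for x, y in zip(a, b)))

p1 = ET.fromstring('<p a="1" b="2"><x/><y/></p>')
p2 = ET.fromstring('<p b="2" a="1"><x/><y/></p>')   # attributes reordered
p3 = ET.fromstring('<p a="1" b="2"><y/><x/></p>')   # children reordered

print(node_equal(p1, p2), node_equal(p1, p3))
```

Reordering attributes keeps the nodes equal, while reordering children does not, which is the asymmetry the slide points out.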
Can a path field produce a set of values?
professor (Pname, Age)
<professor>
  <Age>60</Age>
  <Pname>Muntz</Pname>
  <Pname>Chu</Pname>
</professor>
If a type X has a key (X1, X2, …, Xn), then the set Y1 × Y2 × … × Yn should be unique, where Yi is the set of values produced by Xi
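A hedged Python sketch of one possible reading of this rule: each key field may produce several values, the element contributes the Cartesian product of those value sets, and the key holds if no tuple is contributed by two different elements. The second and third professor elements are hypothetical test data.

```python
import xml.etree.ElementTree as ET
from itertools import product

def key_tuples(elem, fields):
    """Cartesian product of the value sets produced by each key field."""
    values = [[c.text for c in elem.findall(f)] for f in fields]
    return set(product(*values))

def key_holds(elems, fields):
    """True if no key tuple is produced by two different elements."""
    seen = set()
    for e in elems:
        tuples = key_tuples(e, fields)
        if seen & tuples:
            return False
        seen |= tuples
    return True

p1 = ET.fromstring("<professor><Age>60</Age><Pname>Muntz</Pname>"
                   "<Pname>Chu</Pname></professor>")
p2 = ET.fromstring("<professor><Pname>Chu</Pname><Age>60</Age></professor>")
p3 = ET.fromstring("<professor><Pname>Smith</Pname><Age>45</Age></professor>")

print(key_tuples(p1, ["Pname", "Age"]))
print(key_holds([p1, p3], ["Pname", "Age"]), key_holds([p1, p2], ["Pname", "Age"]))
```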
Should a path expression traverse only down the tree?
The trade-off is relative keys vs traversing up the tree.
For example, consider student and professor with a difference – a student can have multiple professors.
Consider the same design
  Professor (PName, Student*)
  Student (Sname)
Key for student can be specified as either
  (professor, Sname) (or)
  Key for student relative to professor is (Sname)
But this is bad design anyway…
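The two formulations can be compared with a small Python sketch. The sample document, the <dept> root and the function names are assumptions for illustration; note that ElementTree has no parent pointers, so the (professor, Sname) version is computed by walking down from each professor instead of up from each student.

```python
import xml.etree.ElementTree as ET

doc = """<dept>
  <Professor PName="Muntz"><Student Sname="MM"/><Student Sname="YC"/></Professor>
  <Professor PName="Chu"><Student Sname="MM"/></Professor>
</dept>"""
root = ET.fromstring(doc)

def relative_key_holds(root):
    """Key for Student relative to Professor is (Sname)."""
    for prof in root.findall("Professor"):
        names = [s.get("Sname") for s in prof.findall("Student")]
        if len(names) != len(set(names)):
            return False
    return True

def absolute_key_holds(root):
    """Key for Student is (professor, Sname); needs the enclosing professor."""
    pairs = [(prof.get("PName"), s.get("Sname"))
             for prof in root.findall("Professor")
             for s in prof.findall("Student")]
    return len(pairs) == len(set(pairs))

# Sname alone is not a key once a student can have multiple professors.
all_names = [s.get("Sname") for s in root.iter("Student")]
sname_alone_is_key = len(all_names) == len(set(all_names))

print(relative_key_holds(root), absolute_key_holds(root), sname_alone_is_key)
```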
Three different constraint specifications
UCM – WWW10
  Type selectors, no relative keys, path expressions can produce set of values.
Keys for XML – WWW10
  Path selectors, relative keys specified through paths, path expressions cannot produce set of values.
W3C XML Schema
  Path selectors, relative keys specified through types, path expressions cannot produce set of values.
Commonalities across the 3 specifications
No concept of node equality
Path expressions traverse only down the tree
A path field should exist
Summary about Data Modeling
Entity types map to element types.
Some relationship types map to element types.
Ability to define element types – RELAX NG provides the ability for us to define element types; in XML Schema, this is not so easy.
Key constraints based on type selectors seem the right way to go.
XML Processing and Subtyping
Subtyping is essential for static type checking
function f1 : a{A} -> B*,C* {
  for $x in a//name return <b/>;
  for $x in a//name return <c/>;
}
function f2 : d{(B, B)*, (C, C)* | B, (B,B)*, C, (C, C)*} { … }
Is this type-safe?
Type-inferencing vs type-checking problem.
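One reading of this example, offered as an assumption since the slide does not spell it out: the result of f1 is passed to f2. The declared result type B*,C* is not contained in the content type f2 expects, yet f1 actually produces as many b elements as c elements, which f2 always accepts. The Python sketch below checks both claims by brute force, modeling the content types as regular expressions over the letters B and C.

```python
import re
from itertools import product

# Content models as regular expressions over the alphabet {B, C}.
declared_f1 = re.compile(r"B*C*")
accepted_f2 = re.compile(r"(?:BB)*(?:CC)*|B(?:BB)*C(?:CC)*")

def words(max_len):
    """All strings over {B, C} up to max_len, shortest first."""
    for n in range(max_len + 1):
        for w in product("BC", repeat=n):
            yield "".join(w)

# Type checking: is the declared type of f1 included in what f2 accepts?
counterexamples = [w for w in words(4)
                   if declared_f1.fullmatch(w) and not accepted_f2.fullmatch(w)]

# Type inferencing: f1 emits one <b/> and one <c/> per name, i.e. B^n C^n.
actual_outputs_ok = all(accepted_f2.fullmatch("B" * n + "C" * n)
                        for n in range(10))

print(counterexamples, actual_outputs_ok)
```

The bounded enumeration is only a demonstration, not a decision procedure for inclusion; B^n C^n is also not a regular language, so no regular type describes the output of f1 exactly.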
Two techniques for subtyping
Implicit – tree/hedge language inclusion
  A type A is a subtype of type B iff L (A) is a sublanguage of L (B) – used in XDuce
Explicit – user specifies type hierarchy
  As in XML Schema
Explicit subtyping “implicitly” solves the type-inferencing vs type-checking problem.
Implicit subtyping poses several interesting research problems.