ii. xml data management

33
II. XML Data Management II. XML Data Management A : XML refresher using material from A. Silverschatz and M. Sapossnek B: - XML-Data Management (1) Query languages: XPATH, XQuery, SQLX C: - Mapping XML data to databases - Native XML Data

Upload: gada

Post on 12-Jan-2016

55 views

Category:

Documents


0 download

DESCRIPTION

II. XML Data Management. A : XML refresher using material from A. Silverschatz and M. Sapossnek B: - XML-Data Management (1) Query languages: XPATH, XQuery, SQLX C:- Mapping XML data to databases - Native XML Data management. What is XML?. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: II.   XML Data Management

II. XML Data ManagementII. XML Data Management

A : XML refresher using material from A. Silverschatz and M. Sapossnek

B: - XML-Data Management (1) Query languages:

XPATH, XQuery, SQLXC: - Mapping XML data to databases

- Native XML Data management

Page 2: II.   XML Data Management

HS / DBSII-03-XML-1 2

What is XML?What is XML?

Acronym for eXtensible Markup Language

Syntax for structuring data and documents in human-readable form

THE "Syntax of the WEB"

Meta language for defining languages

Bases of many extensions NamespacesStylesheetsHyperlinksSchemata

Standardized by W3Chttp://www.w3.org/TR/REC-xml

Page 3: II.   XML Data Management

HS / DBSII-03-XML-1 3

What XML is Not..What XML is Not..

No protocol Language for describing data Used as data format in protocols Protocols may be syntactically defined by XML

No programming languagebut XML documents may contain code fragments New languages allow for XML – code as part of the

language (Xen, a MS extension of C# ) Some XML extensions with superimposed PL semantics,

rule semantics in XSLT

No magic semantics Interpretation by humans, applications,

standards derived from XML

Page 4: II.   XML Data Management

HS / DBSII-03-XML-1 4

Why XML? Why XML?

… not a question any more, since widely adopted

Simple

Extensible

Easy to process

Easy to generate

Data interchange critical for networked applications"XML will be the ASCII of the Web:

basic, essential, unexciting"

Tim Bray

... it is already

Page 5: II.   XML Data Management

HS / DBSII-03-XML-1 5

XML exampleXML example

Pre-XML representation of data:

XML representation of the same data:

“PO-1234”,”CUST001”,”X9876”,”5”,”14.98”

Elements

<?xml version="1.0"?><PURCHASE_ORDER>

<PO_NUM> PO-1234 </PO_NUM><CUST_ID> CUST001 </CUST_ID><ITEM ItemNum ="2">

< QUNTY > 2 </ QUNTY >

<PRICE> 14.53 </PRICE>

</ITEM></PURCHASE_ORDER>

Attribute

Prologue

Page 6: II.   XML Data Management

HS / DBSII-03-XML-1 6

XML exampleXML example

Graphical representation

PO_NUM

PURCHASE_ORDER

Cust:_ID

QUNTYPO-1234

CUST001

XML documents - tree structured - Data an metadata in the same document

(as opposed to RDBS)

2

{ ItemNum=X9876 }ITEM

PRICE

14.53

Page 7: II.   XML Data Management

HS / DBSII-03-XML-1 7

XML UsageXML Usage

Two basic types of XML usage

Document centric (document oriented) structuring a digital document, including logical

layout primary focus of SGML - predecessor of XML

Data centric Description of data in a self describing form for later

processing

Distinction not totally clear See purchase order example:

If typical document characteristic included (company addr.,customer addr, date, …, company logo) it would be a document oriented usage of XML

Page 8: II.   XML Data Management

HS / DBSII-03-XML-1 8

Document centric XML documents: Document centric XML documents: exampleexample

<Product> <Name>Variabler Maulschlüssel</Name> <Developer> Full Fabrication Labs, Inc. </Developer> <Summary> Großer, verstellbarer Schraubenschlüssel</Summary> <Description> <Para>Der Engländer besteht aus erstklassigem Stahl und besitzt einen gummierten Handgriff. Die Maulgröße liegt zwischen 0 und 32 mm. </Para> <Para>Sie können..... </Para>

<List> <Item> <Link URL="Order.html"> Bestellen </Link></Item> <Item> <Link URL="Wrenches.htm"> Andere Werkzeuge ansehen </Link> </Item> <Item> <Link URL="catalog.zip"> Den Katalog herunterladen </Link> </Item> </List> <Para> Der Schraubenschlüssel kostet 15.33 Euro inkl. MWSt. Wenn Sie jetzt bestellen, erhalten Sie zusätzlich unsere wertlose Hobbybastler-Fibel.</Para> </Description></Product> Typical:Long text elements

Page 9: II.   XML Data Management

HS / DBSII-03-XML-1 9

Data centric XML documents: exampleData centric XML documents: example

<Orders> <SalesOrder SONumber="12345"> <Customer CustNumber="543"> <CustName> ABC Industries</CustName> <Street> 123 Main St.</Street> <City>Chicago</City> .... </Customer> <Line LineNumber="1"> <Part PartNumber="123"> <Description> <p><b> Turkey wrench:</b><br /> Stainless steel, one-piece construction, lifetime guarantee.</p> </Description> <Price>9.95</Price> </Part> <Quantity>10</Quantity> </Line> ....... </SalesOrder> </Orders>

Page 10: II.   XML Data Management

HS / DBSII-03-XML-1 10

XML SyntaxXML Syntax

One, and only one, root element

Sub-elements must be properly nested A tag must end within the tag in which it was

started

Attributes are optional

Attribute values must be enclosed in “” or ‘’ No data type but 'string'

Processing instructions optional

XML is case-sensitive <tag> and <TAG> are not the same type of element

Page 11: II.   XML Data Management

HS / DBSII-03-XML-1 11

Why hierarchical "data model"?Why hierarchical "data model"?

Hierachies (nesting) in data bases? Why not?

REDUNDANCY! Multiple items, customers, … occur multiple times

in different ordersNormalization replaces redundancies by foreign

keysOO / OR – Data bases??

Nesting useful in data transfer External application does not have access to

foreign key / to database.

Page 12: II.   XML Data Management

HS / DBSII-03-XML-1 12

XML Attributes vs ElementsXML Attributes vs Elements

Distinction between subelement and attribute In the context of documents:

attributes are part of markup subelement contents part of the basic document

contents

In the context of data representation: difference not clear, but confusing

Same information can be represented in two ways <account account-number = “A-101”> …. </account> <account>

<account-number> A-101 </account-number> …

</account>

Suggestion: use attributes for identifiers of elements use subelements for contents

Page 13: II.   XML Data Management

HS / DBSII-03-XML-1 13

How to use XML data?How to use XML data?

Basic Idea

ApplicationApplicationwithwith

XML-GeneratorXML-Generator

XML-XML-ParserParser

Receiving Receiving applicatioapplicatio

nn

DBMSDBMSDBMSDBMS

DOMDOM

SAXSAX

Standard-Standard-InterfacesInterfaces

How does application know about - syntactical correctness - data semantics ?

Page 14: II.   XML Data Management

HS / DBSII-03-XML-1 14

Correct or not correct ?

• Different encodings Different encodings • specified by encoding specified by encoding attribute attribute

Page 15: II.   XML Data Management

HS / DBSII-03-XML-1 15

Correctness of XML documentsCorrectness of XML documents Syntactic correctness

Conformance to XML syntax Document structured according to XML syntax is well-

formed Compare Syntax checker for program

Semantic correctness Given Meta level description of XML documents:

Document Type Definition (DTD) or XML Schema Document is valid with respect to DTD (Schema)

if all definitions and restrictions have been fulfilled No DTD allowed, applications must know, what is

meant

What is semantics?? Interpretation of tags is a matter of humans and/or

the application program: <xyz> could mean "book title" or "first name" or…

Page 16: II.   XML Data Management

HS / DBSII-03-XML-1 16

XML NamespacesXML Namespaces

Part of XML’s extensibility

Allow autonomous users to differentiate between tags of the same name (using a prefix) Frees author to focus on the data and decide how to

best describe it Allows multiple XML documents from multiple

authors to be merged

Namespace declaration Prefix URI (URL)

xmlns: bk = “http://www.example.com/bookinfo/”

Page 17: II.   XML Data Management

HS / DBSII-03-XML-1 17

NamespaceNamespace

Examples

No prefix: all elements belong to same namespace

<BOOK xmlns:bk=“http://www.bookstuff.org/bookinfo”> <bk:TITLE>All About XML</bk:TITLE> <bk:AUTHOR>Joe Developer</bk:AUTHOR> <bk:PRICE currency=‘US Dollar’>19.99</bk:PRICE>

<BOOK xmlns=“http://www.bookstuff.org/bookinfo”> <TITLE>All About XML</TITLE> <AUTHOR>Joe Developer</AUTHOR>

Page 18: II.   XML Data Management

HS / DBSII-03-XML-1 18

DTD and XML schemaDTD and XML schema

Type of XML document defined as DTD - not expressible in XML syntax XML schema

Document Type Definition (DTD) Does not constrain types:

all values are strings in XML Syntax

<!ELEMENT elem (subelement-spec)>

<!ATTLIST elem (attribute-specs) >

Page 19: II.   XML Data Management

HS / DBSII-03-XML-1 19

DTD: elements and attributesDTD: elements and attributes Example (element decl)

<!ELEMENT depositor (customer-name account-number)>

<!ELEMENT customer-name (#PCDATA) ><!ELEMENT account-number (#PCDATA) >

Subelements names of elements #PCDATA (parsed character data), i.e., character strings EMPTY (no subelements) or ANY (anything can be a

subelement)

Subelement specification may have regular expressions <!ELEMENT bank ( ( account | customer |

depositor)+)> Notation:

“|” : alternatives “+” : 1 or more occurrences "?" 0 or one “*” : 0 or more occurrences

Page 20: II.   XML Data Management

HS / DBSII-03-XML-1 20

DTD exampleDTD example

<!DOCTYPE bank [

<!ELEMENT bank ( ( account | customer | depositor)+)>

<!ELEMENT account (account-number branch-name balance)>

<!ELEMENT customer (customer-name customer-street customer-city)>

<!ELEMENT depositor (customer-name account-number)>

<!ELEMENT account-number (#PCDATA)><!ELEMENT branch-name (#PCDATA)><!ELEMENT balance (#PCDATA)>

<!ELEMENT customer-name (#PCDATA)><!ELEMENT customer-street (#PCDATA)><!ELEMENT customer-city (#PCDATA)>

]>

Page 21: II.   XML Data Management

HS / DBSII-03-XML-1 21

DTD attributesDTD attributes

Attribute specification : for each attribute Name Type of attribute

CDATA ID (identifier) or IDREF (ID reference) or IDREFS

more on this later

Whether mandatory (#REQUIRED) has a default value (value), or neither (#IMPLIED)

Examples <!ATTLIST account acct-type CDATA “checking”> <!ATTLIST customer

customer-id ID # REQUIREDaccounts IDREFS # REQUIRED >

Page 22: II.   XML Data Management

HS / DBSII-03-XML-1 22

DTD attribute IDDTD attribute ID

At most one attribute of type ID per element

ID attribute value of each element in an XML document must be distinct ID attribute value is object identifier

attribute of type IDREF must contain the ID value of an element in the same document

attribute of type IDREFS contains a set of (0 or more) ID values. ID value must contain the ID value of an element in the same document

ID, IDREF, IDREFS do not designate a particular domain (no type!)

Page 23: II.   XML Data Management

HS / DBSII-03-XML-1 23

DTD declarationDTD declaration

External DTD-declaration<?xml version="1.0"><!DOCTYPE bank SYSTEM "http://www.x-ag.de/banks.dtd"><bank> ... </bank>

Internal DTD-declaration<!DOCTYPE custDesc [ <!ELEMENT custDesc (#PCDATA)> ]><custDesc> consumer rights protagonist </custDesc>

Mixed usage<!DOCTYPE bank SYSTEM "http://www.x-ag.de/banks.dtd" [

<!ATTLIST bank Descr CDATA #REQUIRED>]><bank Descr=" mostly private customers and ATM"> ... </bank>

Page 24: II.   XML Data Management

HS / DBSII-03-XML-1 24

DTD limitsDTD limits

No typing of text elements and attributes All values are strings, no integers, reals, etc.

Difficult to specify unordered sets of subelements Order is usually irrelevant in databases (A | B)* allows specification of an unordered set, but

Cannot ensure that each of A and B occurs only once How to express: a, b and c in arbitrary order?

<!ELEMENT a ((b,c,d) | (c,b,d) | (b,d,c), ...)>

IDs and IDREFs are untyped The owners attribute of an account may contain a

reference to another account, which is meaningless owners attribute should ideally be constrained to refer

to customer elements

Page 25: II.   XML Data Management

HS / DBSII-03-XML-1 25

XML SchemaXML Schema

XML Schema (XSD): much more expressible Schema language compared to DTD schemas Typing of values

E.g. integer, string, etc constraints on min/max values

User defined types specified in XML syntax, unlike DTDs

More standard representation, but verbose

namespace support Many more features

List types, uniqueness and foreign key constraints, inheritance Ability to map to RDB,…

significantly more complicated than DTD syntax

Use of XSD recommended

Page 26: II.   XML Data Management

<xsd:schema xmlns:xsd=http://www.w3.org/2001/XMLSchema>

<xsd:element name=“bank” type=“BankType”/>

<xsd:element name=“account”><xsd:complexType> <xsd:sequence> <xsd:element name=“account-number” type=“xsd:string”/> <xsd:element name=“branch-name” type=“xsd:string”/> <xsd:element name=“balance” type=“xsd:decimal”/> </xsd:squence></xsd:complexType>

</xsd:element>….. definitions of customer and depositor ….

<xsd:complexType name=“BankType”><xsd:squence>

<xsd:element ref=“account” minOccurs=“0” maxOccurs=“unbounded”/>

<xsd:element ref=“customer” minOccurs=“0” maxOccurs=“unbounded” />

<xsd:element ref=“depositor” minOccurs=“0” maxOccurs=“unbounded”/></xsd:sequence>

</xsd:complexType></xsd:schema>

XSD example(from Silverschatz)

Page 27: II.   XML Data Management

HS / DBSII-03-XML-1 27

Using XMLUsing XML

Data exchange Data management:

Store, retrieve, query large document sets efficiently Today's solutions:

Mapping to RDB / ORDB / OODB "Native" XML data management (not necessarily very

different from storing in conventional DB)

Standardized data description: different extensions and applications Bioinformatic Sequence Markup Language (BSML) MathML Scalable Vector Graphics (SVG)

.. And many, many more Ressource Description in the web (RDF) …

Page 28: II.   XML Data Management

HS / DBSII-03-XML-1 28

Using XML: RDF with XML syntaxUsing XML: RDF with XML syntax

CreatorCreator

www.me.de/~fritzwww.me.de/~fritz Homepage

Fritz MüllerFritz MüllerFritz MüllerFritz Müller

RDF-ModellRDF-Modell

<?xml version="1.0"?><?xml version="1.0"?><RDF<RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:s="http://description.org/schema/">xmlns:s="http://description.org/schema/"> <Description about="<Description about="http://www.me.de/~fritzhttp://www.me.de/~fritz">"> <s:Creator><s:Creator>Fritz MüllerFritz Müller</s:Creator></s:Creator> </Description></Description> <Description <Description [email protected]> <s:emailOf> Fritz Müller </s:emailOf>> <s:emailOf> Fritz Müller </s:emailOf> </Description> </Description></RDF></RDF>

Encoded in XML:

Many of these triples form a graph

[email protected]@[email protected]@web.de

emailOfemailOf

Page 29: II.   XML Data Management

HS / DBSII-03-XML-1 29

Using XMLUsing XML

Layout of documents? XML documents have logical structure Layout structure needed for output

Use transformation language to describe device specific transformations

XMLXML-Doc.-Doc.(Layout-(Layout-transf.)transf.)

Standard-Standard-SoftwareSoftware

(XSL-Processor)(XSL-Processor)

XMLXML-Doc.-Doc.(device spec. (device spec.

Layout)Layout)

Standard Standard SoftwareSoftware

((HTMLHTML-Browser)-Browser)XMLXML-Doc.-Doc.

(Daten)(Daten)

Transformation into all kinds of languages (HTML, pdf, …)on all kinds of devices

Page 30: II.   XML Data Management

HS / DBSII-03-XML-1 30

XML transformationXML transformation

XSLT: The language used for converting XML documents into other forms

Describes how the document is transformed

Expressed as an XML document (.xsl)

Template rules Patterns match nodes in source document Templates instantiated to form part of result

document

XPath for querying, sorting, etc.

XSL-FO language for describing layout

XSL = XSLT + XPATH + XSL-FO

Page 31: II.   XML Data Management

HS / DBSII-03-XML-1 31

XML transformation: example (1)XML transformation: example (1)

Document

<sales> <summary> <heading>Scootney Publishing</heading> <subhead>Regional Sales Report</subhead> <description>Sales Report</description> </summary> <data> <region> <name>West Coast</name> <quarter number="1" books_sold="24000" /> <quarter number="2" books_sold="38600" /> <quarter number="3" books_sold="44030" /> <quarter number="4" books_sold="21000" /> </region> ... </data></sales>

Page 32: II.   XML Data Management

HS / DBSII-03-XML-1 32

XML transformation: example (2)XML transformation: example (2)

XSL style sheet - mapping to HTML<xsl:param name="low_sales" select="21000"/><BODY> <h1><xsl:value-of select="//summary/heading"/> </h1> ... <table><tr><th>Region\Quarter</th> <xsl:for-each select="//data/region[1]/quarter"> <th>Q<xsl:value-of select="@number"/></th> </xsl:for-each> ... <xsl:for-each select="//data/region"> <tr><xsl:value-of select="name"/></th> <xsl:for-each select="quarter"> <td><xsl:choose> <xsl:when test="number(@books_sold &lt;= $low_sales)"> color:red;</xsl:when> <xsl:otherwise>color:green;</xsl:otherwise></xsl:choose> <xsl:value-of select="format-number (@books_sold,'###,###')" /> </td> ... <td><xsl:value-of select="format-number(sum(quarter/@books_sold), '###,###')"/>

XPath expression

XPath: query languageon doc trees

Page 33: II.   XML Data Management

HS / DBSII-03-XML-1 33

XML transformation: example (2)XML transformation: example (2)

The result