
Silke Eckstein
Andreas Kupfer
Institut für Informationssysteme
Technische Universität Braunschweig
http://www.ifis.cs.tu-bs.de

XML Databases
10. XML Storage 1 – Overview

10.1 Motivation

10.2 Text-based storage

10.2.1 Index structures

10.3 Model-based storage

10.4 Schema-based storage

10.5 Conclusion

10.6 Overview and References

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig 2

10. XML Storage 1

10.1 Motivation

• Applications require different types of XML documents
  – Structure vs. content
  – Regular vs. irregular
• Thus, XML documents are
  – Data-centric
  – Document-centric
  – or somewhere in-between
• Questions
  – Storage of XML documents
  – Efficient processing of queries on the stored documents or data
• There are several methods for storage
  – 1st goal: Learn and understand methods
  – 2nd goal: Classify methods
    • Principles
    • Advantages and disadvantages
    • Usage

• Characterisation of XML documents:

– Data-centric documents

• Structured, regular

• E.g. product catalog, order, invoice

– Document-centric documents

• Unstructured, irregular

• E.g. scientific article, book, email, web page

– Semi-structured documents

• Data-centric and document-centric parts

• E.g. publications, Amazon, MS Press (example chapters)


• Requirements for the physical layer:

– Order preserving and lossless storage of XML documents

– Efficient access to XML documents or parts thereof

• Quick response time for

– Queries

– Update operations

• Indexing

• Transaction processing

• Support of XPath and XQuery

• Support of SAX and DOM for applications


• Storage approaches for XML documents

– Text-based

• Storage as character data

– Model-based

• Generic storage of the graph structure

• Storage of the DOM

– Schema-based

• Mapping to (object-)relational databases

– Deriving the database schema from the XML structure

– Using user defined mapping procedures


10.2 Text-based storage

• The whole XML document text is stored as character data
  – File in the file system
  – CLOB (Character Large OBject) in the DBS
• Operations on documents as a whole are very efficient
  – Reading and writing the whole document
  – But the content is monolithic and opaque with respect to the relational query engine (a query cannot inspect a fragment)
• Getting granular access requires additional support
  – Full text index
  – Path index
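The opacity of text-based storage can be illustrated with a minimal sketch. It assumes an in-memory SQLite database as a stand-in for the DBS and a TEXT column as a stand-in for a CLOB; the table name `docs` and the sample document are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per document, the XML kept as plain text.
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, content TEXT)")
conn.execute(
    "INSERT INTO docs (content) VALUES (?)",
    ("<book><title>XML Databases</title><author>Benjamin</author></book>",),
)

# Whole-document access is a single read.
(clob,) = conn.execute("SELECT content FROM docs WHERE id = 1").fetchone()

# Fragment access: SQL only sees a string, so the application has to parse it.
author = ET.fromstring(clob).findtext("author")
print(author)
```

Every fragment access repeats the parsing step, which is why the additional index structures are needed.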

10.2.1 Index structures

• Index structures for XML documents allow efficient access for specific queries
  – Different types of indexes are optimized for different types of queries
• Generate redundancy
  – The index has to be kept up to date by propagating data changes
• Index structures can be storage structures as well
  – They define the storage method

• Types of index structures
  – Value index
    • Indexes atomic values of an XML document, like element content or attribute values
    • Index format for structured parts of XML documents
    • Already known from databases (B-trees, hash index, …)
  – Full text index
    • Indexes single words from the full text
    • Index format for unstructured parts of XML documents
    • Already known from information retrieval (inverted lists, tries, suffix trees, …)
  – Path index
    • Indexes subtrees/paths in an XML document
    • Index format for semistructured parts of XML documents
    • Already known from object databases (access support relations, …)

• B-tree as value index for a fragment of an XML document [Tür08]
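A value index can be sketched in a few lines. This is only an illustration under stated assumptions: a sorted Python list stands in for the leaf level of a B-tree, and the catalog document, the `price_index` name and the `items_cheaper_than` helper are hypothetical.

```python
import bisect
import xml.etree.ElementTree as ET

CATALOG = ET.fromstring(
    "<catalog>"
    "<item id='a'><price>4.50</price></item>"
    "<item id='b'><price>25.00</price></item>"
    "<item id='c'><price>3.20</price></item>"
    "</catalog>"
)

# Sorted (value, item id) pairs: the sorted list plays the role of the B-tree leaves.
price_index = sorted(
    (float(item.findtext("price")), item.get("id")) for item in CATALOG.iter("item")
)


def items_cheaper_than(limit):
    """Range lookup on the index instead of scanning the document."""
    end = bisect.bisect_left(price_index, (limit,))
    return [item_id for _, item_id in price_index[:end]]


print(items_cheaper_than(10.0))
```

As noted in the slides, the index is redundant data: any change to a price would also have to be propagated to `price_index`.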

• Full text index
  – Not limited to exact matches
    • Keyword-based search and boolean retrieval
    • Pattern search (with regular expressions)
  – Use of
    • Statistical, word-based methods
      – Stop word removal
      – Elimination of uncommon items
    • Linguistic methods
      – Normalization of words (e.g. capitalisation, hyphenation)
      – Word decomposition by rules (English) or dictionaries (German)
      – Stemming
    • Knowledge-based methods
      – Use of ontologies and thesauri to search for synonyms, hypernyms and hyponyms

• Inverted list as full text index for XML [Tür08]
  – [Figures: inverted lists recording word occurrence and word position in the text]
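An inverted list with word positions can be sketched as follows. The stop word set, the sample document and the function name `build_inverted_list` are assumptions for illustration; normalization is reduced to lowercasing, and stemming is left out.

```python
import re
from collections import defaultdict
import xml.etree.ElementTree as ET

STOP_WORDS = {"the", "a", "of", "and"}  # tiny illustrative stop word list


def build_inverted_list(xml_text):
    """Map each normalised word to the positions where it occurs in the text."""
    text = " ".join(ET.fromstring(xml_text).itertext())
    index = defaultdict(list)
    for position, word in enumerate(re.findall(r"\w+", text.lower())):
        if word not in STOP_WORDS:  # stop word removal
            index[word].append(position)
    return dict(index)


index = build_inverted_list(
    "<article><title>The Storage of XML</title>"
    "<body>XML storage and indexing</body></article>"
)
print(index)
```

Positions are counted over all words, including removed stop words, so that phrase distances stay meaningful.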

• Path index
  – Structure information must be identifiable and reconstructable
    • Assigning the markup to the content as well as
    • Representing the hierarchical nesting and order of elements/attributes
  – Especially suited for keyword search with regard to structure or path expressions

for $b in //book
where contains($b/author, "Benjamin")
return $b

• Types of path indexes [Tür08]
  – Nested path index
    • Access to root node from every node
  – Multi-index
    • Accessing parent nodes
  – Join-index
    • Access parent and child nodes
  – Access Support Relations (ASR)
    • Generalization of indexes above, by listing all paths in a table
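The idea of listing all paths in a table can be approximated with a short sketch. It is a simplification loosely in the spirit of access support relations, not their actual definition: each row holds a root-to-node label path and a node id assigned in document order. The function `path_table` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def path_table(root):
    """List (path, node id) rows for every element, in document order."""
    rows = []

    def visit(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        rows.append((path, len(rows) + 1))  # node id = position in document order
        for child in elem:
            visit(child, path)

    visit(root, "")
    return rows


DOC = ET.fromstring(
    "<lib><book><author>Benjamin</author></book>"
    "<book><author>Carla</author></book></lib>"
)
table = path_table(DOC)

# A path expression becomes a lookup on the path column.
author_nodes = [node for path, node in table if path == "/lib/book/author"]
print(author_nodes)
```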

• Conclusion
  – Efficient query processing on XML documents requires different types of index structures
  – Value index
    • For efficient access to structured parts
    • Keyword search, value search
  – Full text index
    • For efficient access to unstructured parts
  – Path index
    • Using the document structure
    • Navigating queries

• Summary text-based storage
  – Schema definition:
    • Not required
  – Document reconstruction:
    • Documents stay in their original format
  – Queries:
    • Information retrieval queries
    • Processing the markup of the queries
    • XML queries possible
  – Special features:
    • Full text functions
  – Efficiency:
    • Character string must be parsed on every access with XML processors → expensive
    • No concurrency on read or write → no parallel processing
  – Usage:
    • For document-centric XML applications
    • Suitable only to a limited extent for semi-structured applications


10.3 Model-based storage

• Idea: generic storage of the graph structure
  – XML elements, XML attributes, … are nodes of a graph
  – Nesting of elements defines edges
  – Nodes get an (internal) ID based on graph traversal
• Using relations or object classes to store elements and attributes
• Document structure can be restored completely
• Extension for data type adapted storage is possible

Elements:   ID | Element name | Value | Reference to preceding | Rank
Attributes: ID | Attribute name | Value | Reference to element

• The EDGE approach [FK99]
  – Variant BINARY: horizontal partition of EDGE based on label
  – [Figure from [Tür08]: XML documents]

• XML queries [Tür08]
  – XML queries (XPath, XQuery) are mapped to SQL queries (taking storage structures into account)
  – Result of XML query is generated from result of database query
    • "Labeling" of the result tuples
    • Result is in XML format

• Example: list bargain buys with prices [Tür08]

SELECT a.content, b.content
FROM Edge a, Edge b
WHERE (a.label = 'price') AND (a.content < 10.00)
  AND (b.label = 'description')
  AND (b.parent = a.parent) AND (a.key = b.key)
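The generic edge storage and a bargain query of this kind can be tried out with a minimal sketch. The table layout used here (`doc`, `id`, `parent`, `ordinal`, `label`, `content`) is an assumption for illustration and not necessarily the layout from [FK99] or [Tür08]; SQLite stands in for the RDBMS, and the catalog document is hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

XML = """<catalog>
  <item><description>Green tea</description><price>4.50</price></item>
  <item><description>Tea pot</description><price>25.00</price></item>
  <item><description>Tea strainer</description><price>3.20</price></item>
</catalog>"""


def shred(conn, doc_id, root):
    """Store every element as one row of a generic edge table."""
    next_id = 0

    def visit(elem, parent_id, ordinal):
        nonlocal next_id
        next_id += 1
        node_id = next_id  # IDs follow the depth-first traversal
        text = (elem.text or "").strip() or None
        conn.execute(
            "INSERT INTO edge VALUES (?, ?, ?, ?, ?, ?)",
            (doc_id, node_id, parent_id, ordinal, elem.tag, text),
        )
        for pos, child in enumerate(elem, start=1):
            visit(child, node_id, pos)

    visit(root, None, 1)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE edge (doc INTEGER, id INTEGER, parent INTEGER, "
    "ordinal INTEGER, label TEXT, content TEXT)"
)
shred(conn, 1, ET.fromstring(XML))

# Self-join: a price below 10 and the description that shares its parent.
bargains = conn.execute(
    """
    SELECT d.content, CAST(p.content AS REAL)
    FROM edge p JOIN edge d ON d.doc = p.doc AND d.parent = p.parent
    WHERE p.label = 'price' AND CAST(p.content AS REAL) < 10.00
      AND d.label = 'description'
    ORDER BY p.id
    """
).fetchall()
print(bargains)
```

Each additional element in a query adds another self-join on the edge table, which matches the remark that querying many elements or attributes is expensive.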

• DOM-based storage
  – Information from the Document Object Model is stored in the database
  – Storage alternatives
    • (Object-)relational databases
    • Object-oriented databases
    • Developing a custom data structure

• DOM-based storage – example [Tür08]
  – [Figure: stored nodes with node types ELEMENT, ATTRIBUTE and TEXT]
• XML queries [Tür08]
  – XML queries (DOM method invocations) are mapped to SQL queries (taking storage structures into account)
  – Result of method invocation is generated from result of database query

• Summary model-based storage
  – Schema definition:
    • Not required for storage
  – Document reconstruction:
    • Possible, but expensive
  – Queries:
    • XML queries possible
    • Adapted database queries
  – Special features:
    • Querying many elements/attributes is expensive
  – Efficiency:
    • Navigation from the given context is efficient
    • Restoring the document and evaluating path expressions is inefficient
  – Usage:
    • For data- and document-centric as well as for semi-structured XML applications


10.4 Schema-based storage

• Motivation
  – XML content shall be stored in a conventional database
  – Accepting the loss of native access
  – DB schema is derived from a DTD or an XML schema
• Problem
  – Generate DB schema automatically
  – Thereby use as much structure information as possible
• General approach for mapping from a DTD
  – Transform DTD into a tree representation
  – Nodes: element types, attributes, etc. (type layer!)
  – Edges: nesting relationships of element types and their restrictions
  – Traverse tree in order to transform nodes and edges into database tables (according to certain rules)

• Generating the DB schema for a DTD:
  – Rules to map element types:
    • XML element type → column of a table
    • Sequence of element types → columns of a table
    • Alternative of element types → column of a table
    • Element type with quantifier ? → column with null values
    • Element type with quantifier +, * → set/list of columns (SET OF, LIST OF)
    • Nested element types → TUPLE OF
  – Rules to map attributes:
    • XML attribute → column of a table
    • IMPLIED → null values allowed
    • REQUIRED → null values not allowed
    • Default value → DEFAULT constraint
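The attribute rules can be written down as a small generator. This is a sketch under assumptions: the attribute list is given as ready-made tuples instead of being parsed from a DTD, the SQL types are chosen by hand, and the names `attribute_to_column` and `create_table` are hypothetical.

```python
def attribute_to_column(name, sql_type, default_decl):
    """Translate one DTD attribute declaration into a column definition."""
    if default_decl == "#REQUIRED":
        return f"{name} {sql_type} NOT NULL"  # null values not allowed
    if default_decl == "#IMPLIED":
        return f"{name} {sql_type}"  # null values allowed
    # Anything else is treated as a literal default value.
    return f"{name} {sql_type} DEFAULT '{default_decl}'"


def create_table(table, attributes):
    """Build a CREATE TABLE statement with one column per attribute."""
    cols = ",\n  ".join(attribute_to_column(*attr) for attr in attributes)
    return f"CREATE TABLE {table} (\n  {cols}\n);"


ddl = create_table(
    "car",
    [
        ("vin", "VARCHAR(17)", "#REQUIRED"),
        ("color", "VARCHAR(20)", "#IMPLIED"),
        ("doors", "VARCHAR(2)", "4"),
    ],
)
print(ddl)
```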

• Mapping to relational databases
  – DTD is usually required
  – Queries use SQL functionality
  – RDBMS data types are used (e.g. prices are NUMERIC)
  – Problem: Mapping of collection types
    • Subdivide into additional relations
  – Example:

Comment:
Comment_ID | Customer_info | Feedback
44901      | C0001         | F0001

Customer_Info:
ID    | Fname   | Lname   | Email
C0001 | Charles | Sanchez | C.Sanchez@hotmail...

Feedback:
ID    | Type    | Content
F0001 | opinion | Darjeeling Special…

• Mapping with STORED (Semistructured TO RElational Data)
  – Basic idea: Use data mining techniques on the XML structure to find a good mapping to tables [DFS99]
  – Input
    • XML documents (or an average sample of the collection)
    • Query workload
    • Restrictions of storage space, number of tables, …
    • No DTD or XML schema is required!
  – Output
    • Relational schema
    • STORED queries: Mapping instructions for XML documents to DB tables
  – Procedure
    • Determine the XML subtrees with the largest support in the collection and in the queries
    • These subtrees are materialised in tables
    • Irregular data is stored in overflow tables according to the EDGE approach

• Mapping with STORED – example [Tür08]
  – [Figure: XML documents shown as tree structure; subtrees with high support are marked]
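The notion of support can be illustrated with a deliberately simplified sketch. It is not the STORED algorithm from [DFS99]: it counts root-to-node label paths instead of mining subtrees, and it ignores the query workload and storage restrictions. The threshold of 0.6, the sample documents and the function names are assumptions.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def label_paths(elem, prefix=""):
    """Yield the label path of every element below (and including) elem."""
    path = f"{prefix}/{elem.tag}"
    yield path
    for child in elem:
        yield from label_paths(child, path)


def path_support(docs):
    """Fraction of documents in which each label path occurs."""
    counts = Counter()
    for doc in docs:
        counts.update(set(label_paths(ET.fromstring(doc))))
    return {path: count / len(docs) for path, count in counts.items()}


DOCS = [
    "<person><name>A</name><email>a@example.org</email></person>",
    "<person><name>B</name><email>b@example.org</email></person>",
    "<person><name>C</name><fax>1</fax></person>",
]
support = path_support(DOCS)
materialised = sorted(p for p, s in support.items() if s >= 0.6)  # go into tables
overflow = sorted(p for p, s in support.items() if s < 0.6)  # go into overflow storage
print(materialised, overflow)
```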

• Mapping to object-relational databases [Tür08]
  – DTD is usually required
  – Queries use SQL functionality
  – "Natural" mapping to tuple types, collection types
  – In case of irregular document structure, databases contain many null values

Comment:
Comment_ID | <Customer_info> (Fname, Lname, Email)      | <Feedback> (Type, Content)
44901      | Charles, Sanchez, C.Sanchez@hotmail...     | opinion, Darjeeling Specia…

• Mapping of recursive data definitions
  – DTDs can be recursive

<!ELEMENT book (front, body, references)>
<!ELEMENT references (book+)>

  – Infinite recursion is impossible on instance layer of a database
  – Procedure:
    • Marking the nodes
    • Subdividing into separate tables
    • Use primary and foreign keys in RDBMS
    • Use reference types in ORDBMS

• Mapping of element sequences
  – Sequence can be important
    • Use an additional attribute in these cases
  – Example:

<lecture>
  <lesson>Introduction</lesson>
  <lesson>XML basics</lesson>
</lecture>

⇓

Order | Lesson
1     | Introduction
2     | XML basics
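The additional order attribute can be demonstrated with a short sketch. SQLite stands in for the RDBMS; the table name `lecture` is an assumption, and the column name `"order"` has to be quoted because ORDER is an SQL keyword.

```python
import sqlite3
import xml.etree.ElementTree as ET

LECTURE = "<lecture><lesson>Introduction</lesson><lesson>XML basics</lesson></lecture>"

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE lecture ("order" INTEGER, lesson TEXT)')

# Shredding: the position of each element becomes an explicit column value.
for position, elem in enumerate(ET.fromstring(LECTURE).findall("lesson"), start=1):
    conn.execute("INSERT INTO lecture VALUES (?, ?)", (position, elem.text))

# Rows have no inherent order; the extra column restores the document order.
ordered = [name for (name,) in
           conn.execute('SELECT lesson FROM lecture ORDER BY "order"')]
print(ordered)
```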

• Mapping of alternatives
  – XML allows alternatives to be specified
  – Example:

<!ELEMENT car (compactCar | sedan | van)*>

  – Three possible storage variants
    • Each alternative is stored as a separate table column
    • Subdivide alternatives into separate tables
    • Use a table column of type XML

• Variant 1 – all alternatives in one table [Tür08]
  – Problem: many null values (wasting storage space)
• Variant 2 – subdivided into multiple tables [Tür08]
  – For queries, combination of tables is needed
• Variant 3 – using column type XML [Tür08]
  – XML type allows XML queries or DOM methods
• Mapping of mixed content – example [Tür08]

• Mapping of mixed content

– Mapping to plain tables is ill-suited

– Use variant 3 from above or

• Content model ANY is not representable at all

– Arbitrary content, arbitrary element types

– Often the fitting storage structure can only be decided on instance layer

• Schema-based storage with automatic mapping

– Advantages

• Queries, data types, aggregation functions, views

• Integration in other databases when storing structured data

– Disadvantages

• Large schema, sparsely filled databases (many null values)

• No flexible data types, storage of alternatives has problems

• Less flexible queries

– No information retrieval queries possible without additional extensions

– No full text operations for semi- or unstructured data

– Usually native access is not possible any more


• Mapping solutions with different specializations

– Algorithms, middleware, commercial applications, …

– Varying amount of required input or user decisions

– Many algorithms create different database schemas

• Two phases

– Mapping

• Assign a place for each node type in the DB

– Shredding

• Import the XML data as DB tuples

[Table from [Bus08]: mapping algorithms and products compared by basis (n/a, DTD, schema), supported restrictions (keys, cardinalities, types) and DTD optimisation]

• The shredder can be part of the DB
  – Usually requires an XML schema
  – In the IBM Data Studio, the shredder is part of the "annotated XML schema decomposition"
  – Direct approach in DB2:
    • Register the XML schema and call the stored procedure:

register xmlschema http://our.org/custacc from dec_files/custacc.xsd as cust_schema ;
complete xmlschema cust_schema enable decomposition ;
call SYSPROC.XDBDECOMPXML ('VRODRIG', 'CUST_SCHEMA', ? , ?, 1, null, null, null)

• Shredding without XML schema in DB2
  – XMLTABLE function in combination with an INSERT
  – Source: http://www.ibm.com/developerworks/db2/library/techarticle/dm-0801ledezma/

INSERT INTO ENVELOPEXT (MAILFROM, MAILTO, MAILDATE, SUBJECT)
SELECT MAILFROM, MAILTO, MAILDATE, SUBJECT
FROM XMLTABLE(
  XMLNAMESPACES('http://www.sal.com/mails' AS "email"),
  '$doc/email:mails/mail' (: some xquery-expression :)
  PASSING xml-source AS "doc"
  COLUMNS
    MAILFROM VARCHAR (100) PATH 'envelope/from',
    MAILTO VARCHAR (100) PATH 'envelope/to',
    MAILDATE VARCHAR (30) PATH 'envelope/email:Date',
    SUBJECT VARCHAR (100) PATH 'envelope/Subject') AS T;
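The same row-by-row shredding idea can be sketched outside DB2. The sketch below is not DB2 syntax and does not use XMLTABLE: it assumes Python's ElementTree for the path evaluation, SQLite as the target database, a namespace-free sample document and a hypothetical `COLUMNS` dictionary that plays the role of the COLUMNS clause.

```python
import sqlite3
import xml.etree.ElementTree as ET

MAILS = """<mails>
  <mail><envelope><from>ann@example.org</from><to>bob@example.org</to>
    <date>2009-01-12</date><subject>Lecture 10</subject></envelope></mail>
  <mail><envelope><from>bob@example.org</from><to>ann@example.org</to>
    <date>2009-01-13</date></envelope></mail>
</mails>"""

# Column name -> relative path, evaluated once per <mail> element.
COLUMNS = {
    "mailfrom": "envelope/from",
    "mailto": "envelope/to",
    "maildate": "envelope/date",
    "subject": "envelope/subject",
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE envelope (mailfrom TEXT, mailto TEXT, maildate TEXT, subject TEXT)"
)
for mail in ET.fromstring(MAILS).findall("mail"):
    # findtext returns None for a missing element, which is stored as NULL.
    row = [mail.findtext(path) for path in COLUMNS.values()]
    conn.execute("INSERT INTO envelope VALUES (?, ?, ?, ?)", row)

rows = conn.execute(
    "SELECT mailfrom, subject FROM envelope ORDER BY maildate"
).fetchall()
print(rows)
```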

• Summary schema-based storage with automatic mapping
  – Schema definition:
    • Is usually required and analysed
    • Not required, e.g. for STORED
  – Document reconstruction:
    • Limited (requires logging of the mapping process)
  – Queries:
    • Database queries
    • XML queries possible, but lack the XPath horizontal axes, e.g. following, preceding-sibling
  – Special features:
    • Federation with existing databases is possible
  – Efficiency:
    • High efficiency by using the DB engine
  – Usage:
    • For data-centric XML applications, but with limited nesting

• User-defined mapping
  – Idea
    • In all previously shown methods it is not possible to affect the storage in the DB
    • With user-defined mappings the user defines the storage structure
    • The structure of XML documents and the database schema can be designed independently of each other
    • Also possible: storing XML documents in existing databases
  – Annotation of DTD and XML schema, respectively
    • In many cases the mapping definition is combined with existing schema information
  – Only limited XML queries possible
    • Logging of the mapping process from XML documents to databases
    • For a given query all relevant data has to be stored (lossless mapping)

• Example: [Tür08]
  – [Figure: XML document and mapping instruction]
• Mapping instruction
  – Example syntax for XML-DBMS (Ronald Bourret)

<ClassMap>
  <ElementType Name="sales:SalesOrder"/>
  <ToClassTable>
    <Table Name="Sales"/>
  </ToClassTable>
  <PropertyMap>
    <Attribute Name="SONumber"/>
    <ToColumn>
      <Column Name="Number"/>
    </ToColumn>
  </PropertyMap>
</ClassMap>

  – ElementType / ToClassTable: connection between elements and tables
  – PropertyMap: connection between elements/attributes and table columns
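How a user-defined mapping drives the import can be sketched as follows. This is not the XML-DBMS implementation or its API: the `MAPPING` dictionary is a hypothetical, much reduced counterpart of the ClassMap, the sample document omits the namespace prefix, and SQLite stands in for the target database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical mapping: element type -> table, attribute -> column.
MAPPING = {
    "element": "SalesOrder",
    "table": "Sales",
    "attributes": {"SONumber": "Number"},
}

ORDERS = '<Orders><SalesOrder SONumber="123"/><SalesOrder SONumber="124"/></Orders>'


def apply_mapping(conn, root, mapping):
    """Insert one row per mapped element, following the mapping description."""
    columns = [f'"{col}"' for col in mapping["attributes"].values()]
    placeholders = ", ".join("?" * len(columns))
    sql = f'INSERT INTO {mapping["table"]} ({", ".join(columns)}) VALUES ({placeholders})'
    for elem in root.iter(mapping["element"]):
        conn.execute(sql, [elem.get(attr) for attr in mapping["attributes"]])


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Sales ("Number" TEXT)')  # the table may already exist
apply_mapping(conn, ET.fromstring(ORDERS), MAPPING)
numbers = conn.execute('SELECT "Number" FROM Sales ORDER BY "Number"').fetchall()
print(numbers)
```

Because the table is created independently of the document, the same approach works for storing XML data in an existing database.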

• Remarks
  – Many different mapping languages or schema annotations
    • Automatic mappings usually have an internal mapping language
  – Remember the mapping constructs from lectures 5 and 6. The SQL/XML annotations are a mapping language, too.
  – DB2 uses similar annotations as SQL/XML
    • On the next slide, the example from lecture 6 is shown with DB2 syntax

• Example from lecture 6 with DB2 syntax [Tür08]

CREATE TABLE Account
(
  Name CHAR(20),
  Balance NUMERIC(12,2)
);

Name | Balance
Joe  | 2000
Jim  | 3500

<ACCOUNT>
  <row>
    <NAME>Joe</NAME>
    <BALANCE>2000</BALANCE>
  </row>
  <row>
    <NAME>Jim</NAME>
    <BALANCE>3500</BALANCE>
  </row>
</ACCOUNT>

<xsd:complexType xmlns:db2-xdb="http://www.ibm.com/xmlns/prod/db2/xdb1"
                 name="ROW.ACCOUNT">
  <xsd:sequence>
    <xsd:element name="NAME"
                 type="CHAR_20"
                 db2-xdb:rowSet="Account"
                 db2-xdb:column="Name"/>
    <xsd:element name="BALANCE"
                 type="NUMERIC_12_2"
                 db2-xdb:rowSet="Account"
                 db2-xdb:column="Balance"/>
  </xsd:sequence>
</xsd:complexType>

<xsd:complexType name="TABLE.ACCOUNT">
  <xsd:sequence>
    <xsd:element name="row"
                 type="ROW.ACCOUNT"/>
  </xsd:sequence>
</xsd:complexType>

<xsd:element name="ACCOUNT"
             type="TABLE.ACCOUNT"/>

  – Mapping SQL tables
  – Mapping SQL table columns to XML elements
  – Mapping table rows to XML <row> elements
  – SQL/XML schema annotations in DB2 (table is called rowSet)

• Summary schema-based storage with user-defined mapping
  – Schema definition:
    • Depends on mapping language
  – Document reconstruction:
    • Not possible in most cases (requires logging of the mapping process)
  – Queries:
    • Database queries
    • XML queries in rare cases only!
  – Special features:
    • Integration with existing databases is possible
  – Efficiency:
    • High efficiency by using the DB engine
  – Usage:
    • For data-centric XML applications

10.5 Conclusion

• Different methods for storage of XML documents
  – Text-based
    • Storing whole XML documents as string
    • Can use full text index or path index
  – Model-based
    • Generic mapping of the tree structure
  – Schema-based
    • Detect and analyse the structure of the XML documents
    • Derive a DB schema from the structure
  – Hybrid approaches
    • A combination of some of those methods
  – No algorithm has the optimal solution for all kinds of XML documents
  – A reasonable solution is heavily dependent on the application

10.6 References

• [Tür08] Can Türker: "XML und Datenbanken". Lecture, University of Zurich, 2008.
• [KM03] M. Klettke, H. Meyer: "XML und Datenbanken". dpunkt.verlag, 2003.
• [Bus08] Carsten Busche: "Generierung eines adaptiven Datenbankschemas für datenzentrierte XML-Dokumente". Diploma thesis, TU Braunschweig, 2008.
• [FK99] D. Florescu, D. Kossmann: Storing and Querying XML Data using an RDBMS. IEEE Data Engineering Bulletin (DEBU), Volume 22(3), pp. 27-34, 1999.
• [DFS99] A. Deutsch, M.F. Fernández, D. Suciu: Storing Semistructured Data with STORED. Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, pp. 431-442, ACM, 1999.

10.6 Overview

1. Introduction
2. XML Basics
3. Schema definition
4. XML query languages I
5. Mapping relational data to XML
6. SQL/XML
7. XML processing
8. XML query languages II – XQuery Data Model
9. XML query languages III – XQuery
10. XML storage I – Overview
11. XML storage II
12. Updates / Transactions
13. Systems

Questions, Ideas, Comments

• Now, or ...
• Room: IZ 232
• Office hour: Tuesday, 12:30 – 13:30, or by appointment
• Email: [email protected]