Datenbank-Stammtisch: Database-Based Processing of XML


© 2006 AG DBIS

Datenbank-Stammtisch 2006

Database-Based Processing of XML Documents Is Different!

Theo Härder
www.haerder.de

• Modeling and processing concepts of RDBMS
• Architecture and storage concepts in the XML world
• Numbering schemes for XML trees
• Indexing of XML documents
• Algorithms for path processing
• Locking protocols in XML trees

Talk sections (navigation bar): Path Processing Algorithms, Labeling of XML Tree Nodes, Flexibility and Evolution, XML Indexing, Conventional Models & DBMS, Concurrency Control, Summary, XML DBMS - the Big Picture

Bird's Eye View to Conventional Database Management

[Figure: classification matrix with the axes "data types" (simple to complex) and "queries" (simple to complex). Data models: file systems, relational, object oriented, object relational. Example applications: sequential processing, …; CAD, …; CAD, SE, GIS, …; Multimedia, …; (R)OLAP, business appl. System types: file systems, video server; OODBS; Universal Server. Trends and market dominance: n*10^8 $ and k*10^10 $.]


Conventional Data Models

Support of structured data
• separation of schema (structure information) and data

DB schema
• complete description of structure (structural meta-data)
• has to be specified before storing data objects
• needed to interpret and manipulate the data

Data
• always are instances of the schema
  - structure is fixed, no flexibility/deviation possible
• do not carry structure information
  - must be interpreted and manipulated by means of the schema

Employee
Name     Address            Age
Schmidt  Opernplatz 5, …    35
Maier    Badstraße 3, …     20
Müller   Schlossallee 1, …  55


RDBMS Architecture – Mapping Hierarchy

[Figure: mapping hierarchy of an RDBMS, top to bottom]
• data system: tables/views (T1, T2) with set-oriented operations
• access system: variety of record types (R1, R2, …, Rn) and access paths with record-oriented operations
• storage system: S1, S2, …, Sm; DB buffer in memory providing page access (P1, P2, P3, …, P17)
• external storage with files of different types


Query Processing in SQL is Well Known

Declarative queries on tables/views

Optimized DBMS model and query at runtime

Q1: Select E.Name, E.Age, D.Name
    From Employee E, Department D
    Where E.Dno = D.Dno
      And E.Name = 'S*' And D.Loc = 'Dresden'

Access plan for Q1:
    Open Scan (ID(Loc), Loc='Dresden', Loc>'Dresden')   /*SCB1*/
    Sort Access (SCB1) ASC Dno Into T1 (Dno, …)
    Close Scan (SCB1)
    Open Scan (IE(Name), Name>='S', Name>'S')           /*SCB2*/
    Sort Access (SCB2) ASC Dno Into T2 (Dno, …)
    Close Scan (SCB2)
    Open Scan (T1, BOF, EOF)                            /*SCB3*/
    Open Scan (T2, BOF, EOF)                            /*SCB4*/
    While Not Finished Do
        Fetch Tuple (SCB3, Next, None)
        Fetch Tuple (SCB4, Next, None)
        . . .
    End

[Figure: SQL query Q1 passing through the layers L4, L3, and L2; large DB buffer]
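The access plan for Q1 sorts both selected inputs on Dno and then merges them. The following is a minimal Python sketch of such a sort-merge equi-join, assuming small in-memory lists of dictionaries stand in for the scans; the sample rows and the helper name `sort_merge_join` are illustrative and not taken from the slides.

```python
def sort_merge_join(left, right, key):
    """Equi-join two lists of dicts on `key` by sorting and merging."""
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows that share this key value.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row with the same key with that run.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    result.append((left[i], r))
                i += 1
            j = j_end
    return result


# Hypothetical sample data; the two filters mimic the index scans of Q1.
departments = [{"Dno": 2, "Name": "Sales", "Loc": "Dresden"},
               {"Dno": 1, "Name": "R&D", "Loc": "Kaiserslautern"}]
employees = [{"Name": "Schmidt", "Age": 35, "Dno": 2},
             {"Name": "Maier", "Age": 20, "Dno": 2},
             {"Name": "Schulz", "Age": 41, "Dno": 1}]

t1 = [d for d in departments if d["Loc"] == "Dresden"]
t2 = [e for e in employees if e["Name"].startswith("S")]
rows = [(e["Name"], e["Age"], d["Name"])
        for d, e in sort_merge_join(t1, t2, "Dno")]
```

The sketch keeps everything in memory; a real plan would stream tuples from the sorted temporary tables T1 and T2 through scan control blocks.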


Essential Mechanisms for Internal Processing

Indexing and clustering are the performance-boosting mechanisms!
Conceptually simple operations: equality- and range-based selection, equi-joins, different scans, …

1. Addressing rows of tables: TIDs (stable, but not order preserving!)

2. Value-based indexes:
   spec.: CREATE [UNIQUE] INDEX index
          ON base-table (column [ORDER] [, column [ORDER]] ...)
          [CLUSTER] [PCTFREE]
   [Figure: clustered index IDept(Dno) and unclustered index IEmp(Dno), each shown with the keys 25, 61 above the keys 8, 13, 33, 45, 77, 85]

3. Multi-granular locking: locking along the organizational hierarchy (database, segment, table/index, row)
   - use of intention locks
   - implicit locks for subtrees

   Compatibility matrix (+ compatible, - incompatible):

          -    IR   IX   R    RIX  U    X
   IR     +    +    +    +    +    -    -
   IX     +    +    +    -    -    -    -
   R      +    +    -    +    -    -    -
   RIX    +    +    -    -    -    -    -
   U      +    +    -    +    -    -    -
   X      +    -    -    -    -    -    -
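To illustrate how intention locks work along the hierarchy, here is a small Python sketch. It assumes the modes IR, IX, R, RIX, and X with the usual symmetric compatibilities (the U mode is left out for brevity); the object names and the helpers `lock_requests` and `conflicts` are hypothetical.

```python
# Intention mode to be set on all ancestors for a given leaf mode.
INTENTION = {"R": "IR", "X": "IX"}

# Unordered pairs of modes that may be held on the same object together.
COMPATIBLE = {("IR", "IR"), ("IR", "IX"), ("IR", "R"), ("IR", "RIX"),
              ("IX", "IX"), ("R", "R")}


def compatible(a, b):
    return (a, b) in COMPATIBLE or (b, a) in COMPATIBLE


def lock_requests(path, mode):
    """Locks to acquire from the root down for `mode` on the last object."""
    intent = INTENTION[mode]
    return [(obj, intent) for obj in path[:-1]] + [(path[-1], mode)]


def conflicts(held_requests, new_requests):
    """Objects on which a new request clashes with a lock already held."""
    held = dict(held_requests)
    return [(obj, held[obj], mode) for obj, mode in new_requests
            if obj in held and not compatible(held[obj], mode)]


# T1 reads the whole table, T2 updates one row, T3 reads another row.
t1 = lock_requests(["db", "seg1", "Emp"], "R")
t2 = lock_requests(["db", "seg1", "Emp", "row42"], "X")
t3 = lock_requests(["db", "seg1", "Emp", "row7"], "R")
```

The R lock on the table implicitly covers all of its rows, which is why the row update of T2 is already stopped at the table level, while T2 and T3 can proceed concurrently on different rows.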


Why XML Database Systems (XDBMS)?

Properties wanted:
• instances carry structure and data (integrated meta-data)
• modeling flexibility and schema evolution
• DB-controlled or application-controlled consistency
• in addition to the salient DBMS features
  - ACID transactions
  - declarative query processing
  - efficient and parallel processing of large data volumes
  - high availability and fault tolerance
  - scalability w.r.t. transaction workload and data volumes

Examples:
• document-centric view: document collections
  - books, articles, web pages, …
  → application: structure-sensitive information retrieval
• data-centric view: semi-structured data model
  - messages, configuration files, semi-structured data per se
  → application: healthcare information management


Mapping Flexibility: XML Relational Model

[Figure: XML tree with integrated meta-data. Document root D with the element Employees and three Employee elements, each with the children Name, Address, and Age: (Schmidt; Opernplatz 5, …; 35), (Maier; Badstr. 3, …; 20), (Müller; Schlossallee 1, …; 55)]

Perfect mapping to RM only, if
- uniform objects (entities)
- description exactly in three levels O/A/V
- atomic values
- no aspects (attributes of XML elements)
- order is insignificant.

Employee
Name     Address            Age
Schmidt  Opernplatz 5, …    35
Maier    Badstraße 3, …     20
Müller   Schlossallee 1, …  55


Mapping Flexibility: XML Relational Model (2)

Mapping to RM across several tables
- is at least very complex and quickly becomes incomprehensible,
- additionally must preserve order,
- provides no solution for ‘aspects’,
- is called “shredding”

Employee
CNo  Name     Address            ANo  Age  Height
1    Müller   Schlossallee 1, …  --   55   --
2    Maier    --                 1    20   --
3    Schmidt  Opernplatz 5, …    --   --   180

Address
ANo  CNo  Street      ZIP    City
1    1    Bachstr. 3  67663  KL
1    2    F-W-Str. 5  67657  KL

[Figure: XML tree with document root D and the element Employees. Employee Schmidt has Name, Address (Opernplatz 5, …), and Height 180 with the attributes Measured_at="10.01.2006" and Unit="cm". Employee Müller has Name, Address (Schlossallee 1, …), and Age 55. Employee Maier has Name, Age 20, and two Address elements with the children Street, ZIP, City: (Badstr. 3, 67663, KL) and (F-W-Str. 5, 67657, KL)]
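A short Python sketch of shredding, using the standard `xml.etree.ElementTree` module, may make the mapping problem concrete. The document text, the column names `ENo` and `Pos`, and the function `shred` are assumptions made for this illustration; they do not reproduce the table layout of the slide.

```python
import xml.etree.ElementTree as ET

DOC = """<Employees>
  <Employee><Name>Müller</Name><Address>Schlossallee 1</Address><Age>55</Age></Employee>
  <Employee><Name>Maier</Name><Age>20</Age>
    <Address><Street>Badstr. 3</Street><ZIP>67663</ZIP><City>KL</City></Address>
    <Address><Street>F-W-Str. 5</Street><ZIP>67657</ZIP><City>KL</City></Address>
  </Employee>
  <Employee><Name>Schmidt</Name><Address>Opernplatz 5</Address>
    <Height Unit="cm" Measured_at="10.01.2006">180</Height></Employee>
</Employees>"""


def shred(xml_text):
    """Split employee elements into Employee rows and Address rows."""
    employee_rows, address_rows = [], []
    for eno, emp in enumerate(ET.fromstring(xml_text), start=1):
        row = {"ENo": eno, "Name": emp.findtext("Name"), "Address": None,
               "Age": emp.findtext("Age"), "Height": emp.findtext("Height")}
        for pos, addr in enumerate(emp.findall("Address"), start=1):
            if len(addr) == 0:
                # Flat address: a plain text value fits into the row.
                row["Address"] = addr.text
            else:
                # Structured address: needs its own table; Pos keeps order.
                address_rows.append({"ENo": eno, "Pos": pos,
                                     "Street": addr.findtext("Street"),
                                     "ZIP": addr.findtext("ZIP"),
                                     "City": addr.findtext("City")})
        employee_rows.append(row)
    return employee_rows, address_rows


employees, addresses = shred(DOC)
```

Note that the attributes Unit and Measured_at of Height are silently lost in this sketch, and that missing children become null values, which mirrors the problems listed above.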


Flexibility and Schema Evolution

Data + algorithms, but data first
• DB people are a “strong schema” community
  - that has pros and cons
  - views enable a certain kind of schema evolution

Opposite view: extreme schema evolution
• “everyone starts with the same schema <stuff/>. Then they refine it.” (J. Widom)

Simpler data integration
• heterogeneous data sources and messages
• file directories are becoming databases;
  - pivot on any attribute
  - folders are standing queries
  - freetext + schema search (better precision/recall)

Jim Gray’s opinion
• XSD (XML Schema) and XQuery are transitional; but we have to do them to get to the real answer
• cohabit with row stores
• challenge: figure out what comes after XSD+XQuery


Bird's Eye View to XML Database Management

[Figure: classification along the axes "data organization" (isolated single docs to collections of docs) and schema use (schema-less documents to schema-based, 1 XSD to 10^5 XSDs). Systems shown: Timber, DB2 Viper, MS SQL Server, . . ., XTC, Natix]

“Entire world” a single doc
• heterogeneous subtrees
• typical index-supported ops: XPath processing, TWIGs, …
• variety of language models: DOM, SAX, XPath, XQuery

Flexibility and schema evolution
• schema-less approach (consistency guarantees, search?)
• therefore with multiple or evolving schemas
• or schemas including extensibility points (Really Simple Syndication, Atom, …)

Collection of moderate sized (<1MB) docs
• heterogeneous
• filtering by index support
• XQuery, XSQL, SQL/XML

DB-controlled consistency
• XSD design first
• schema-supported search
• strong consistency checking (hardly any serious work)
• how to handle 10^5 schemas


Relational, Native, or Hybrid Storage?

LOBs don’t enable fine-granular management, no content-based search and no multi-user operation

Mapping onto relational tables?
• many solutions: “shredding”
• XML query language (e.g., XQuery, XPath, DOM, SAX) must be mapped to SQL
• use of the SQL optimizer!
• but: concurrency control (locking) very cumbersome, because a document is distributed over n tables

Hybrid approach
• example: Viper
• SQL with XQuery subqueries from SQL's functions xmlQuery, xmlExists, and xmlTable
• stand-alone XQuery interface
  - provides access to XML columns
  - db2-fn:xmlcolumn imports an entire XML column as a sequence of items

DB2 Storage
create table department (ID char(8), …, xmldoc xml)

[Figure: table department with the columns ID, . . ., xmldoc; the row with ID "V18" holds the document <employee> … <name> … </name> </employee> in its xmldoc column]


Relational, Native, or Hybrid Storage? (2)

Native: conceptual XML mapping to a fine-grained storage structure

<department>
  <employee id="450">
    <name>Tom Meier</name>
    <tel>0211-126812</tel>
    <office>119</office>
  </employee>
  <employee id="426">
    <name>Tina Lint</name>
    <tel>0211-679088</tel>
    <office>113</office>
  </employee>
</department>

Transformation into an internal XML tree

[Figure: XML tree with the root department and two employee elements, each with the attribute id and the children name, tel, and office; below it the same tree in its internal form, in which every name is replaced by its number from the string table]

Element names are replaced by means of a dictionary

String table (SYSIBM:SYSXMLSTRINGS)
name        1
office      3
id          5
employee    6
tel         7
department  9
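The dictionary replacement of names can be sketched in a few lines of Python. This is only an illustration under simple assumptions: numbers are handed out in order of first occurrence (so they differ from the numbers on the slide), and the function names `build_string_table` and `encode` are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<department>
  <employee id="450"><name>Tom Meier</name><tel>0211-126812</tel><office>119</office></employee>
  <employee id="426"><name>Tina Lint</name><tel>0211-679088</tel><office>113</office></employee>
</department>"""


def build_string_table(root):
    """Assign a small integer to every distinct element or attribute name."""
    table = {}
    for elem in root.iter():
        for name in [elem.tag, *elem.attrib]:
            table.setdefault(name, len(table) + 1)
    return table


def encode(elem, table):
    """Return a nested (name_id, attributes, text, children) tuple."""
    attrs = {table[k]: v for k, v in elem.attrib.items()}
    text = (elem.text or "").strip()
    return (table[elem.tag], attrs, text, [encode(c, table) for c in elem])


root = ET.fromstring(DOC)
string_table = build_string_table(root)
internal_tree = encode(root, string_table)
```

Every repeated name such as employee or office is stored once in the table, and the tree itself carries only the short numbers.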


Approaches to Coexistence with SQL

• XOR: "XML over Relational"
• "Shredding" XML -> Tables

• ROX: "Relational over XML"
• "Native" XML storage
• SQL systems become legacy

[Figure: two stacks. Over tables: an SQL DBMS (SQL in, tuples out) with an XQuery Rewriter on top (XQuery in, XML out). Over a native XML store: an XQuery DBMS (XQuery in, XML out) with an SQL Rewriter on top (SQL in, tuples out)]


XML Transaction Coordinator (XTC)

Architectural overview
- reuse of the layer model is possible, but needs substantial adjustments and, in particular, new functionality in the higher layers
- focus on key mechanisms for efficient internal processing

[Figure: XTC architecture]
• clients: Browser, FTP Client, XTCdriver (DOM, SAX, XTCconnection)
• XTCserver (layer labels L1 to L5)
  - Interface Services: Http Agent, Ftp Agent, DOM RMI, SAX RMI, API RMI
  - XML Services: XML Manager, XSLT Processor, XQuery Processor
  - Node Processing Services: Node Manager
  - Access Services: Record Mgr, Index Mgr, Catalog Mgr
  - Propagation Control: Buffer Manager
  - File Services: I/O Manager, Temp File Mgr
  - Transaction Services: Transaction Manager, Lock Manager, Deadlock Detector
• OS File System: Transaction Log, Container Files, Container Logs, Temporary Files


Example of an XML Document

<bib>
  <book year="1994" id="1">
    <title>TCP/IP Illustrated</title>
    <author>
      <last>Stevens</last><first>W.</first>
    </author>
    <price>65.95</price>
  </book>
  <book year="2000" id="2">
    <title>Data on the Web</title>
    <author>
      <last>Abiteboul</last><first>Serge</first>
    </author>
    <author>
      <last>Buneman</last><first>Peter</first>
    </author>
    <author>
      <last>Suciu</last><first>Dan</first>
    </author>
    <price>39.95</price>
  </book>
  <book year="1999" id="3">
    <title>The Economics of . . . </title>
    <editor>
      <last>Gerbarg</last><first>Darcy</first><affiliation>CITI</affiliation>
    </editor>
    <price>129.95</price>
  </book>
</bib>


Example of a DOM Tree: Conceptual Representation

[Figure: DOM tree of the bib document with three kinds of nodes: element, attribute, and text node (T). The root bib has three book subtrees. Each book carries the attributes year and id and child elements such as title, author, and price; the name elements last, first, and affiliation as well as all values are held in text nodes.]


Node Labeling – Use of k-ary Trees

Concept of virtual nodes
• parent (cn, k) = ceil ((cn – 1)/k)
• child (cn, k) = cn*k – (k–1) + 1, cn*k – (k–2) + 1, …, cn*k – (k–i) + 1, …, cn*k – 1 + 1, cn*k + 1
• ancestor (cn, k) = parent (cn, k), parent (parent (cn, k), k), …
• descendant (cn, k) = child (cn, k), child (child (cn, k), k), …
• sibling (cn, k) = child (parent (cn, k), k), …
• previous/following …

KO criterion
• any computed label may correspond to a virtual node
• tree representation has to be accessed to check whether a node is real or virtual
• a document may have a very large k and very many levels

[Figure: example tree with the labeled nodes 1; 2, 3, 4; 5, 10, 13; 14, 31, 40]
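The label arithmetic for k-ary trees is easy to try out. The following Python sketch implements the parent and child formulas given on this slide and checks them against the node labels of the example figure, assuming k = 3 for that figure; the set `real_nodes` and the function names are illustrative.

```python
def parent(cn, k):
    """Label of the parent of node cn, i.e. ceil((cn - 1) / k); 0 for the root."""
    return -(-(cn - 1) // k)


def children(cn, k):
    """Labels of the k (possibly virtual) children of node cn."""
    return [cn * k - (k - i) + 1 for i in range(1, k + 1)]


def ancestors(cn, k):
    """Labels on the path from the parent of cn up to the root (label 1)."""
    path = []
    while cn > 1:
        cn = parent(cn, k)
        path.append(cn)
    return path


# Node labels that actually occur in the example tree of the figure.
real_nodes = {1, 2, 3, 4, 5, 10, 13, 14, 31, 40}

# Only one of the three computed children of node 5 is a real node.
real_children_of_5 = [c for c in children(5, 3) if c in real_nodes]
```

The last line shows the KO criterion in miniature: the computed labels 15 and 16 are virtual, and only a lookup in the stored tree reveals that.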


DeweyIDs Embody a Special Prefix Labeling Scheme

Labels must
• be immutable for the lifetime of the nodes
• preserve the document order, when inserting new nodes
• easily reveal the level and the ID for all ancestor nodes

DeweyID consists of several divisions separated by dots (example: d1 = 1.3.17.2.2.3.4.9)
• overflow mechanism: even division values
• level determination
• ancestor IDs: a0 = 1; a1 = 1.3; a2 = 1.3.17; a3 = 1.3.17.2.2.3
• ordering d2 ? d1 with d2 = 1.3.17.2.3.7:
  d1 < d2 : 1.3.17.2.2.3.4.9 < 1.3.17.2.3.7
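The three operations on d1 and d2 can be reproduced with a few lines of Python. The sketch assumes that a DeweyID is handled as a tuple of integers, that every odd division marks one level, and that the root has level 0; the function names are illustrative.

```python
def parse(label):
    """Turn '1.3.17' into the tuple (1, 3, 17)."""
    return tuple(int(d) for d in label.split("."))


def level(dewey):
    """Number of odd divisions minus one; even divisions are overflow only."""
    return sum(1 for d in dewey if d % 2 == 1) - 1


def ancestors(dewey):
    """All proper prefixes that end in an odd division, root first."""
    return [dewey[:i] for i in range(1, len(dewey)) if dewey[i - 1] % 2 == 1]


d1 = parse("1.3.17.2.2.3.4.9")
d2 = parse("1.3.17.2.3.7")

# Tuples compare division by division, which yields document order.
d1_before_d2 = d1 < d2
```

No access to the document is needed for any of these results, which is the main advantage over the labels of the k-ary scheme.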


Initial Assignment of DeweyIDs (SPLIDs)

• assignment of division values is affected by parameter distance (= 4)
• on initial loading, only odd division values are assigned
• odd division value indicates level transition

[Figure: labeled DOM tree (element, attribute, and text nodes) for the first book of the bib document]
bib 1
  book 1.5
    year 1.5.1.3 (1994)
    id 1.5.1.5 (1)
    title 1.5.5
      text node 1.5.5.5 (TCP/IP...)
    author 1.5.9
      last 1.5.9.5
        text node 1.5.9.5.5 (Stevens)
      first 1.5.9.9
        text node 1.5.9.9.5 (W.)
    price 1.5.13
      text node 1.5.13.5 (65.95)
  book 1.9
  book 1.13
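The initial assignment can be imitated with a short recursive function over an `xml.etree.ElementTree` tree. This is a sketch under assumptions read off the figure: children receive the divisions distance + 1, 2*distance + 1, and so on, and attributes are placed below division 1 and numbered 3, 5, …; text that follows a child element (mixed content) is ignored. The function name `assign_dewey_ids` is hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = ('<bib><book year="1994" id="1"><title>TCP/IP Illustrated</title>'
       '<author><last>Stevens</last><first>W.</first></author>'
       '<price>65.95</price></book><book/><book/></bib>')


def assign_dewey_ids(root, distance=4):
    """Return (label, kind, name_or_value) triples in document order."""
    labels = []

    def visit(elem, label):
        labels.append((label, "element", elem.tag))
        # Attributes below division 1, numbered 3, 5, ... (assumed pattern).
        for n, name in enumerate(elem.attrib):
            labels.append((label + (1, 3 + 2 * n), "attribute", name))
        division = distance + 1
        if elem.text and elem.text.strip():
            labels.append((label + (division,), "text", elem.text.strip()))
            division += distance
        for child in elem:
            visit(child, label + (division,))
            division += distance

    visit(root, (1,))
    return labels


labels = assign_dewey_ids(ET.fromstring(DOC))
by_node = {(kind, value): label for label, kind, value in labels}
```

With distance = 4 the gaps between 5, 9, and 13 leave room for later insertions without relabeling existing nodes.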


DeweyIDs: Insertion of Subtrees

[Figure, distance = 4: the labeled tree of the first book (bib 1; book 1.5, 1.9, 1.13; year 1.5.1.3; id 1.5.1.5; title 1.5.5; author 1.5.9; price 1.5.13) after insertions below book 1.5: a node type with label 1.5.3 in front of title, an author with label 1.5.11 between 1.5.9 and 1.5.13, and a further author with label 1.5.12.5 between 1.5.11 and 1.5.13]

[Figure, worst-case considerations, distance = 8: bib 1 with the books 1.5 and 1.9; labels shown for books inserted in front: 1.3, 1.2.9, 1.2.5, 1.2.3, 1.2.2.9]


Characteristics of XML Documents Considered

No. | file name | description | size (bytes) | number of element nodes | number of attributes | number of text nodes | max. depth | ∅-depth | max. fanout | ∅-fanout of elements
1) | treebank_e.xml | Encoded DB of English records of Wall Street Journal | 86,082,517 | 2,437,666 | 1 | 1,391,845 | 37 | 8.44 | 56,385 | 1.58
2) | nasa.xml | Astronomical data | 25,050,288 | 476,646 | 56,317 | 303,676 | 9 | 6.08 | 2,435 | 1.76
3) | psd7003.xml | DB of protein sequences | 716,853,016 | 21,305,818 | 1,290,647 | 15,955,109 | 8 | 5.68 | 262,529 | 1.81
4) | SwissProt.xml | DB of protein sequences | 114,820,211 | 2,977,031 | 2,189,859 | 2,013,844 | 6 | 4.07 | 50,000 | 2.41
5) | dblp.xml | Computer Science Index | 284,994,162 | 6,662,623 | 1,375,832 | 6,013,355 | 7 | 3.39 | 649,080 | 2.11
6) | customer.xml | Customers from TPC-H benchmark | 515,660 | 13,501 | 1 | 12,000 | 4 | 3.41 | 1,501 | 1.89
7) | ebay.xml | Ebay auction data | 35,562 | 156 | 0 | 107 | 6 | 4.26 | 12 | 1.90
8) | lineitem.xml | Line items from TPC-H benchmark | 32,295,475 | 1,022,976 | 1 | 962,800 | 4 | 3.45 | 60,176 | 1.94
9) | mondial-3.0.xml | Geographical DB of diverse sources | 1,784,825 | 22,423 | 47,423 | 7,467 | 6 | 4.15 | 955 | 3.45
10) | orders.xml | Orders from TPC-H benchmark | 5,378,845 | 150,001 | 1 | 135,000 | 4 | 3.42 | 15,001 | 1.90
11) | uwm.xml | Courses of a University Website | 2,337,522 | 66,729 | 6 | 40,234 | 6 | 4.37 | 2,112 | 1.91


Encoding of DeweyIDs

Huffman code | Li | value range of Oi
0            | 3  | 1-7
100          | 4  | 8-23
101          | 6  | 24-87
1100         | 8  | 88-343
1101         | 12 | 344-4,439
11100        | 16 | 4,440-69,975
11101        | 20 | 69,976-1,118,551
11110        | 24 | 1,118,552-17,895,767
11111        | 31 | 17,895,768-2,165,379,414

[Figure annotation: range expansion]

Optimization and adaptivity potential
- cut prefix 1.
- apply prefix compression to DeweyIDs
- analysis phase, if possible: determine DOM tree parameters for optimized Huffman code assignment (even level-wise applicable)
- adjust Huffman code to the size and anticipated modification operations of each tree
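The code table can be turned into a small encoder and decoder. The Python sketch below assumes that each division is stored as its Huffman code followed by Li payload bits and that the payload holds the offset of the value from the start of its range; the slide does not spell out the payload layout, so this detail and all function names are assumptions.

```python
# (Huffman code, payload length Li in bits, first value of the range)
RANGES = [("0", 3, 1), ("100", 4, 8), ("101", 6, 24), ("1100", 8, 88),
          ("1101", 12, 344), ("11100", 16, 4440), ("11101", 20, 69976),
          ("11110", 24, 1118552), ("11111", 31, 17895768)]
MAX_VALUE = 2_165_379_414


def encode_division(value):
    """Bit string for one division: code prefix plus offset payload."""
    if not 1 <= value <= MAX_VALUE:
        raise ValueError("division value outside the ranges of the table")
    for code, bits, low in reversed(RANGES):
        if value >= low:
            return code + format(value - low, f"0{bits}b")


def encode_dewey(divisions):
    return "".join(encode_division(d) for d in divisions)


def decode_dewey(bitstring):
    """Inverse of encode_dewey; relies on the codes being prefix-free."""
    values, pos = [], 0
    while pos < len(bitstring):
        for code, bits, low in RANGES:
            if bitstring.startswith(code, pos):
                start = pos + len(code)
                values.append(low + int(bitstring[start:start + bits], 2))
                pos = start + bits
                break
        else:
            raise ValueError("invalid code prefix")
    return values


d1 = (1, 3, 17, 2, 2, 3, 4, 9)
encoded = encode_dewey(d1)
```

Small division values dominate in practice, so most divisions fit into 4 bits; in this sketch the eight divisions of d1 take 38 bits in total.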


DeweyIDs – Comparison of Avg. Sizes to Max. Sizes

Document       | ∅-size dist(2) | ∅-size dist(32) | ∅-size dist(256) | max-size dist(2) | max-size dist(32) | max-size dist(256)
1. treebank_e  | 6.67 | 11.57 | 15.94 | 22 | 46 | 72
2. nasa        | 5.19 | 8.54  | 11.30 | 8  | 13 | 18
3. psd7003     | 5.61 | 8.84  | 11.30 | 8  | 13 | 17
4. SwissProt   | 5.10 | 7.04  | 8.14  | 8  | 11 | 13
5. dblp        | 4.58 | 6.12  | 7.16  | 7  | 10 | 13
6. customer    | 3.17 | 5.04  | 6.19  | 4  | 6  | 7


XTC Document Index

• document index is a B-tree for the document(s) stored in the doubly-chained pages of the document container
• text values exceeding a given threshold are stored in referenced mode
• prefix compression works!

[Figure: document index above the document container. The container pages hold entries of the form (DeweyID, node data in byte representation) in document order: 1, 1.3, 1.3.1, 1.3.1.3 | 1.3.1.3.1, 1.3.1.5, 1.3.1.5.1, 1.3.3 | 1.3.3.3, 1.3.3.3.1, 1.3.5, 1.3.5.3 | 1.3.5.3.3, 1.3.5.3.3.1, 1.3.5.5, 1.3.5.5.3 | 1.3.5.5.3.1, 1.3.7, 1.3.7.3, 1.3.7.3.1. The document index contains the first DeweyID of each page: 1, 1.3.1.3.1, 1.3.3.3, 1.3.5.3.3, 1.3.5.5.3.1]


XML Indexing Methods

Path indexing
• effective support for
  - query rewriting
  - query pruning based on structural summaries, e.g., to check the existence of paths
• examples
  - DataGuide, 1-Index, APEX, D(k)-Index, A(k)-Index

Many types of path fragments in queries to be supported (path processing ops often needed)
• parent/child: a/b/c
• ancestor/descendants: a//x//y
• enhanced by predicates: a/b[pred1]//x[pred2]
• twigs: a/b[.//c]//x[d]

What is the ubiquitous index – the silver bullet – for XML queries?

Use of DeweyIDs in element indexes helps a lot!


Genealogy of XML Indexes not yet Known

What is the “B*-tree” for XML queries?

Design of XML indexes ahead of us


Data Guide – Supporting Query Optimization and Evaluation

For every label path in the source document s, a data guide d has exactly one path class, and every path class of d occurs as one or more label paths of s.

A data guide provides a structural summary.
XTC stores the DeweyIDs in separate index structures.

[Figure: data guide with the DeweyIDs of the nodes in each path class]
bib {1}
  book {1.5, 1.9, 1.13, …}
    year {1.5.1.3, …}
    id {1.5.1.5, …}
    title {1.5.5, 1.9.5, …}
    author {1.5.9, 1.5.11, …}
      last {1.5.9.5.5, …}
      first {1.5.9.9.5, …}
    price {1.5.13, …}

[Figure: DOM tree example, showing the subtree of the second book (year 2000, id 2) with title, three author elements (last, first), and price]
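A data guide can be derived in a single pass over the document. The following Python sketch collects every distinct label path of an `xml.etree.ElementTree` tree as one path class; it stores the element objects themselves where XTC would keep DeweyIDs, and the names `build_data_guide` and `path_exists` are illustrative.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DOC = ('<bib><book><title>A</title><author><last>x</last><first>y</first>'
       '</author><price>1</price></book><book><title>B</title>'
       '<author><last>p</last><first>q</first></author>'
       '<author><last>r</last><first>s</first></author>'
       '<price>2</price></book></bib>')


def build_data_guide(root):
    """Map every distinct label path to the elements reached by it."""
    guide = defaultdict(list)

    def visit(elem, path):
        path = path + (elem.tag,)
        guide[path].append(elem)
        for child in elem:
            visit(child, path)

    visit(root, ())
    return guide


def path_exists(guide, *labels):
    """Structural check without touching the document itself."""
    return tuple(labels) in guide


guide = build_data_guide(ET.fromstring(DOC))
```

However many books and authors the document contains, the guide of this example has only seven path classes, which is what makes it useful for pruning queries on paths that cannot match.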


XTC Element Index

[Figure: name directory (B-tree) with entries such as author, book, and price; each entry refers to a node-reference index (B*-tree) with the DeweyIDs of the elements of that name (labels shown: 1.3.5, 1.3, 1.3.7), each of them sorted in document order]

In combination with a data guide, the XTC element index can be used like a path index
- unique occurrence of element names: DeweyIDs are path indexes
- homonyms of element names: additional tests allow the separation of the paths (DeweyIDs)


XML Indexing Methods (2)

Hybrid Indexing
• goal: evaluation of queries on structure and content

IndexFabric as an example (root to leaf)
• designators: special characters for encoding paths
• a unique designator is assigned to each tag
- bib = bi, type = ty, author = a, last = l
- book = bo, title = ti, price = p, first = f
- example: "bibotiTCP/IP"
• all strings are stored in a (layered) PATRICIA trie

[Figure: layered PATRICIA trie over the designator strings. Below the common prefix "bibo", designator characters and normal characters (e.g. T, X) lead to the entries "bibotiTCP/IP" and "bibotiXML", each referring to a DOCID.]

Query evaluation: translate the query to a designator string and look it up in the trie.
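
The designator encoding and the lookup can be sketched as follows. This is a simplified illustration: it uses a plain character trie built from nested dicts instead of a layered PATRICIA trie, and the document identifiers are hypothetical. The designator table is the one given above.

```python
DESIGNATORS = {"bib": "bi", "type": "ty", "author": "a", "last": "l",
               "book": "bo", "title": "ti", "price": "p", "first": "f"}

def encode(path, value=""):
    """Concatenate the designators of a root-to-leaf path, then the content."""
    return "".join(DESIGNATORS[tag] for tag in path) + value

class Trie:
    """Plain character trie (assumption: no path compression as in PATRICIA)."""
    def __init__(self):
        self.root = {}

    def insert(self, key, doc_id):
        node = self.root
        for ch in key:
            node = node.setdefault(ch, {})
        node.setdefault(None, []).append(doc_id)   # None marks the end of a key

    def lookup(self, key):
        node = self.root
        for ch in key:
            node = node.get(ch)
            if node is None:
                return []
        return node.get(None, [])

trie = Trie()
trie.insert(encode(["bib", "book", "title"], "TCP/IP"), "doc1")  # hypothetical DOCIDs
trie.insert(encode(["bib", "book", "title"], "XML"), "doc2")

# Query /bib/book/title = "TCP/IP": translate to a designator string, then look up.
key = encode(["bib", "book", "title"], "TCP/IP")
print(key, trie.lookup(key))
```

Because structure and content end up in one key, a single lookup answers a query on both.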

XML Indexing Methods (3)

Content or text indexing (known from IR)
• goals:
- support of predicates on textual content of a document (contains, near, matches, …)
- indexing of (typed) values such as Integer, Double, …
• examples: inverted lists, value index (VIndex in Lore)

[Figure: document D with the root Emps and three Emp elements, each with Name, Address and Age: Schmidt (Opernplatz 5, …; 35), Maier (Badstr. 3, …; 20), Müller (Schlossallee 1, …; 55). Potential indexes, evaluated leaf to root: I (/Emps/Emp/Age), I (//Emp/Age), I (//Age), I (//[Integer]).]

The query predicate must match the index predicate: Q (/Emps/Emp/Age=20) cannot be answered from I (//Age) alone, since I (//Age) may also return Age nodes on other paths (false positives).

Using DeweyIDs would make it possible to filter out false positives!
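
How DeweyIDs could filter false positives can be sketched under a few loudly stated assumptions: every division of a DeweyID corresponds to one tree level, a node table maps DeweyIDs to element names, and the Trainee element is a hypothetical addition that produces a false positive. None of these names come from XTC or Lore.

```python
# Hypothetical node table: DeweyID -> element name.
NAMES = {
    (1,): "Emps",
    (1, 3): "Emp", (1, 3, 7): "Age",
    (1, 5): "Emp", (1, 5, 7): "Age",
    (1, 7): "Trainee", (1, 7, 7): "Age",   # Age on a different path
}

# Value index I(//Age): value -> DeweyIDs of all Age elements with that value.
AGE_INDEX = {35: [(1, 3, 7)], 20: [(1, 5, 7), (1, 7, 7)]}

def label_path(dewey):
    """Names from the root to the node, derived from the DeweyID prefixes."""
    return [NAMES[dewey[:i]] for i in range(1, len(dewey) + 1)]

def query(path, value):
    """Use I(//Age) for the value, then drop hits whose path does not match."""
    return [d for d in AGE_INDEX.get(value, []) if label_path(d) == path]

hits = query(["Emps", "Emp", "Age"], 20)
print(hits)
```

The index alone returns two candidates for the value 20; the prefix check removes the one below Trainee.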

Evaluation Example – FLWOR Expression

    for $p in //Employee
    let $n := $p/Name, $a := $p/Age
    where $a < 25
    order by $n
    return <youngEmp> {$n, $a} </youngEmp>

1) $p iterates over all "Employee" elements and binds the children "Name" and "Age" to $n and $a, respectively.

2) Each iteration step checks whether the "where" clause matches.

3) Matches are stored in a sequence and sorted according to $n (Names).

4) Output is generated.

[Figure: document D with the root Enterprise, Department elements (Name "Controlling", Name "Personal") with their Employees, and three Employee elements with Name, Age, Address and Salary: Oid = "Maier" ("Maier", "20", "Badstr.", "17.000"), Oid = "Müller" ("Müller", "55", "Schlossallee", "35.000"), Oid = "Schmidt" ("Schmidt", "35", "Opernplatz", "40.000"). The iterations bind ($p1, $n1, $a1), ($p2, $n2, $a2), ($p3, $a3, $n3), …; two of them yield no match, one yields a match. Output: youngEmp elements with Name and Age, here "Maier" and "20".]
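
The four evaluation steps can be mimicked in a few lines. This is only a sketch of the iteration semantics, not of how an XQuery processor works internally; the document is a simplified, hypothetical version of the example (no departments, addresses or salaries), and the function name is an assumption.

```python
import xml.etree.ElementTree as ET

DOC = """<Enterprise>
  <Employee><Name>Schmidt</Name><Age>35</Age></Employee>
  <Employee><Name>Maier</Name><Age>20</Age></Employee>
  <Employee><Name>Müller</Name><Age>55</Age></Employee>
</Enterprise>"""

def young_employees(root, limit=25):
    matches = []
    for p in root.iter("Employee"):               # for $p in //Employee
        n = p.findtext("Name")                    # let $n := $p/Name
        a = int(p.findtext("Age"))                # let $a := $p/Age
        if a < limit:                             # where $a < 25
            matches.append((n, a))
    matches.sort(key=lambda m: m[0])              # order by $n
    output = []
    for n, a in matches:                          # return <youngEmp>...</youngEmp>
        elem = ET.Element("youngEmp")
        ET.SubElement(elem, "Name").text = n
        ET.SubElement(elem, "Age").text = str(a)
        output.append(elem)
    return output

result = young_employees(ET.fromstring(DOC))
for elem in result:
    print(ET.tostring(elem, encoding="unicode"))
```

Only the employee with Age 20 passes the where clause, so the output sequence contains a single youngEmp element.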

XQuery – Path Processing Steps

Problem: XQuery is really complex
• order-preserving joins, implicit grouping, result construction, …

Example fragment

    doc("sample.xml")//autor[count(parent::buch/preceding-sibling::element())>3]/vname[../nname = "Adams"]

Concentration on a sequence of path processing steps
• Axis::name_test
• evaluation order: bottom-up, top-down, starting in the middle
• algorithms: selective access via DeweyIDs, locking-aware

Child Axis

[Figure: XML tree with the DeweyIDs 1; 1.3; 1.3.3, 1.3.5, 1.3.7, 1.3.9; 1.3.5.3, 1.3.5.5, 1.3.9.3; 1.3.5.3.3, 1.3.5.3.5, 1.3.9.3.3, 1.3.9.3.5, 1.3.9.3.7. Dark nodes are input nodes (the input sequence); nodes marked n satisfy the name test n.]

1) insertion of the input sequence into a hash table: HT(1.3), HT(1.3.5.3), HT(1.3.9.3)

2) element index access for name n: 1.3.3, 1.3.5.3.5, 1.3.5.5, 1.3.7, 1.3.9.3.3, 1.3.9.3.7

3) "probing" of the parent DeweyIDs: 1.3.5.5 finds no entry, because its parent 1.3.5 is not part of the input sequence

4) result: 1.3.3, 1.3.5.3.5, 1.3.7, 1.3.9.3.3, 1.3.9.3.7

• observe level information
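
The hash-based evaluation of the child axis can be sketched with the DeweyIDs of the example. One assumption is made explicit: the parent DeweyID is obtained by dropping the last division, which only holds when every division corresponds to one level (the hint to observe level information suggests the real scheme needs more care). The function names are hypothetical.

```python
def parse(dewey):
    return tuple(int(d) for d in dewey.split("."))

def child_axis(input_sequence, element_index_entries):
    """Return the index entries whose parent belongs to the input sequence."""
    hash_table = {parse(d) for d in input_sequence}      # 1) build hash table
    result = []
    for dewey in element_index_entries:                  # 2) entries in document order
        parent = parse(dewey)[:-1]                       # assumption: one division per level
        if parent in hash_table:                         # 3) probe the parent DeweyID
            result.append(dewey)
    return result                                        # 4) result, still in document order

INPUT = ["1.3", "1.3.5.3", "1.3.9.3"]
INDEX_N = ["1.3.3", "1.3.5.3.5", "1.3.5.5", "1.3.7", "1.3.9.3.3", "1.3.9.3.7"]
result = child_axis(INPUT, INDEX_N)
print(result)
```

Since the element index delivers its entries in document order and probing only filters them, the result needs no additional sort.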

Path Processing Algorithms

[Figure: classification of path processing algorithms along three dimensions.
- axis operators: parent, child, ancestor, descendant, …
- evaluation sequence: top-down, bottom-up, starting in the middle
- processing style: navigational (node-at-a-time) and set-oriented (sequence-at-a-time)
Algorithm classes named in the figure: merge, hash, index, stack-tree; binary structural joins; holistic twig joins (branching path queries); use of DeweyIDs. A small example shows a path x/y/z and a twig pattern with the nodes a, b, c, d, e combined by "and".]

taDOM Storage Model – View of Lock Mgr

XML document

    <?xml version="1.0"?>
    <bib>
      <book year="2004" id="book1">
        <title>The Title</title>
        <author>
          <last>Lastname</last>
          <first>Firstname</first>
        </author>
        <price>49.99</price>
      </book>
    </bib>

[Figure: taDOM tree of the document. Node types: element nodes (bib, book, title, author, last, first, price), an attribute root node below book with the attribute nodes id and year, text nodes (T), and string nodes holding the values (The Title, Lastname, Firstname, 49.99, book1, 2004).]

DOM API (> 20 ops)
• Navigation: getFirst/LastChild, getNextSibling, getPreviousSibling, getAttributes/Value
• Modification: appendChild, insertBefore, removeChild
• Query: getElementById, getChildNodes
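
The node types of the taDOM tree can be made concrete with a small sketch that converts a parsed document into such a tree. The mapping is an assumption read off the figure legend: attributes hang below an attribute root node, and the values of text and attribute nodes are held by separate string nodes. The class and function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                 # element | attributeRoot | attribute | text | string
    label: str = ""
    children: list = field(default_factory=list)

def text_node(value):
    """A text node whose value is held by a separate string node."""
    return Node("text", "", [Node("string", value)])

def to_tadom(elem):
    node = Node("element", elem.tag)
    if elem.attrib:
        attr_root = Node("attributeRoot")
        for name, value in elem.attrib.items():
            attr_root.children.append(Node("attribute", name, [Node("string", value)]))
        node.children.append(attr_root)
    if elem.text and elem.text.strip():
        node.children.append(text_node(elem.text.strip()))
    for child in elem:
        node.children.append(to_tadom(child))
        if child.tail and child.tail.strip():     # text following a child element
            node.children.append(text_node(child.tail.strip()))
    return node

def count_kinds(node, counter=None):
    counter = Counter() if counter is None else counter
    counter[node.kind] += 1
    for child in node.children:
        count_kinds(child, counter)
    return counter

DOC = """<bib><book year="2004" id="book1"><title>The Title</title>
<author><last>Lastname</last><first>Firstname</first></author>
<price>49.99</price></book></bib>"""

kinds = count_kinds(to_tadom(ET.fromstring(DOC)))
print(dict(kinds))
```

For the small document, 7 elements expand to 20 nodes in this representation, which gives a lock manager separate targets for structure, attributes and values.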

Tailored Node Locks for XML – taDOM2

Node locks and compatibility matrix
• refined URIX protocol with extensions to lock a complete level in a subtree
- well known: IR/IX and R/X (here SR/SX)
• edge locks not discussed (3 modes)

Read locks

    lock                 effect
    IR                   intention read lock on a node
    NR (node read)       locks only a context node
    LR (level read)      read lock on a context node and all direct-child nodes
    SR (subtree read)    read lock on an entire subtree

Write locks

    lock                 effect
    IX (intent. excl.)   intention of a write lock on a non-direct child node
    CX (child excl.)     indicates existence of a write lock on a direct child node
    SU (update option)   read lock for intended update operations on an entire subtree
    X (exclusive)        write lock on an entire subtree

Compatibility matrix

          -   IR  NR  LR  SR  IX  CX  U   X
    IR    +   +   +   +   +   +   +   -   -
    NR    +   +   +   +   +   +   +   -   -
    LR    +   +   +   +   +   +   -   -   -
    SR    +   +   +   +   +   -   -   -   -
    IX    +   +   +   +   -   +   +   -   -
    CX    +   +   +   -   -   +   +   -   -
    SU    +   +   +   +   +   -   -   -   -
    SX    +   -   -   -   -   -   -   -   -
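
The compatibility matrix lends itself to a table-driven check. The sketch below encodes the matrix of this slide row by row; two points are assumptions: rows are read as the requested mode and columns as the mode already held, and the column labels U and X are written SU and SX to match the row labels.

```python
COLUMNS = ["-", "IR", "NR", "LR", "SR", "IX", "CX", "SU", "SX"]

# One string per row; position i refers to COLUMNS[i] ("-" = no lock held).
ROWS = {
    "IR": "+++++++--",
    "NR": "+++++++--",
    "LR": "++++++---",
    "SR": "+++++----",
    "IX": "++++-++--",
    "CX": "+++--++--",
    "SU": "+++++----",
    "SX": "+--------",
}

def compatible(requested, held="-"):
    """True if `requested` can be granted while `held` is held by another transaction."""
    return ROWS[requested][COLUMNS.index(held)] == "+"

print(compatible("NR", "CX"))   # node read next to a child-exclusive lock
print(compatible("LR", "CX"))   # level read next to a child-exclusive lock
print(compatible("SU", "IR"), compatible("IR", "SU"))   # the matrix is not symmetric
```

The LR/CX entry shows why CX exists in addition to IX: a level read covers all direct children, so it must conflict with a write lock on one of them, whereas a node read does not.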

Node Locks (1)

Node read lock (NR)
• requires IR locks on the ancestor path

Level read lock (LR)
• requested for reading the context node and all nodes located at the level below (all direct-child nodes)

Child exclusive lock (CX)
• indicates an X lock on a child
• defined, in addition to IX, to detect conflicts with LR

[Figure: taDOM tree of the bib document (bib, book, title, author, last, first, price with their text and string nodes), annotated with the locks of three transactions, listed from the root down:
- Transaction T1 is reading <price>: IR, IR, NR
- Transaction T2 is reading <book> and all direct-child nodes (<title>, <author>, and <price>): IR, LR
- Transaction T3 is modifying the book title: IX, IX, IX, CX, X]
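
The lock requests along the ancestor path can be derived from a DeweyID alone. The following sketch assumes that every division of a DeweyID corresponds to one level, so that all ancestors are prefixes; the concrete DeweyIDs and the function names are hypothetical. The rule for X (CX on the parent, IX on the remaining ancestors) follows the T3 example.

```python
def ancestors(dewey):
    """Proper ancestors of a DeweyID tuple, root first (one division per level assumed)."""
    return [dewey[:i] for i in range(1, len(dewey))]

def read_request(dewey, mode="NR"):
    """NR or LR on the context node, IR on every ancestor."""
    return [(a, "IR") for a in ancestors(dewey)] + [(dewey, mode)]

def exclusive_request(dewey):
    """X on the context node, CX on its parent, IX on the remaining ancestors."""
    anc = ancestors(dewey)
    locks = [(a, "IX") for a in anc[:-1]]
    if anc:
        locks.append((anc[-1], "CX"))
    return locks + [(dewey, "X")]

# Hypothetical labels: bib = 1, book = 1.3, title = 1.3.3, price = 1.3.7,
# text node of title = 1.3.3.3, its string node = 1.3.3.3.3.
t1 = read_request((1, 3, 7))              # T1 reads <price>
t2 = read_request((1, 3), mode="LR")      # T2 reads <book> and its direct children
t3 = exclusive_request((1, 3, 3, 3, 3))   # T3 modifies the title string
print([mode for _, mode in t1], [mode for _, mode in t2], [mode for _, mode in t3])
```

No document access is needed to compute these requests, since the ancestor labels are contained in the DeweyID itself.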

Node Locks (2)

Locking subtrees exclusively: intention exclusive lock (IX), child exclusive lock (CX), and exclusive lock (X)
• requested for updating the context node's content or deleting the context node and its entire subtree
• requires a CX lock on the parent and IX locks on the ancestors

[Figure: taDOM tree of the bib document (bib, book, title, author, last, first, price with their text and string nodes), annotated with the locks of four transactions, listed from the root down:
- Transaction T1 is deleting the <first> node and its content: IX, IX, CX, X
- Transaction T2 is deleting the <last> node and its content: IX, IX, CX, X
- Transaction T3 is reading the <author> node: IR, IR, NR
- Transaction T4 is reading all direct-child nodes of <book> (IR, LR), but is blocked when reading all child nodes of <author> (LR)]
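
The scenario with the four transactions can be replayed with a toy lock table. This is a sketch under simplifying assumptions: requests are granted all-or-nothing, there is no wait queue or lock conversion, nodes are identified by their element names, and only the modes IR, NR, LR, IX, CX and X are covered. Among these, the taDOM2 compatibility matrix marks X against everything and the pair LR/CX as incompatible.

```python
def compatible(requested, held):
    """Excerpt of the taDOM2 matrix for the modes IR, NR, LR, IX, CX, X."""
    if "X" in (requested, held):
        return False
    return {requested, held} != {"LR", "CX"}

class LockTable:
    def __init__(self):
        self.held = {}   # node -> list of (transaction, mode)

    def request(self, txn, locks):
        """Grant all (node, mode) pairs or none; return the blocking entry, if any."""
        for node, mode in locks:
            for other, held_mode in self.held.get(node, []):
                if other != txn and not compatible(mode, held_mode):
                    return (node, other, held_mode)
        for node, mode in locks:
            self.held.setdefault(node, []).append((txn, mode))
        return None

table = LockTable()
r1 = table.request("T1", [("bib", "IX"), ("book", "IX"), ("author", "CX"), ("first", "X")])
r2 = table.request("T2", [("bib", "IX"), ("book", "IX"), ("author", "CX"), ("last", "X")])
r3 = table.request("T3", [("bib", "IR"), ("book", "IR"), ("author", "NR")])
r4a = table.request("T4", [("bib", "IR"), ("book", "LR")])
r4b = table.request("T4", [("author", "LR")])
print(r1, r2, r3, r4a, r4b)
```

T1 and T2 can delete different children of author concurrently and T3 can still read author itself; only the level read of T4 on author conflicts with the CX lock held there.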

taDOM* Group – Lock Protocol Optimization

Optimization steps
• locking nodes without accessing the document
• taDOM2+: LRIX, SRIX, LRCX, SRCX
• taDOM3, taDOM3+

Example: lock conversion in taDOM3

[Table: lock conversion matrix of taDOM3 over the lock modes IR, NR, LR, SR, IX, NRIX, CX, NRCX, NU, NX, SU, SX.]

Performance Measurement

Data base (DataGuide)

Size
• ~8 MB
• 580,000 taDOM nodes

[Figure: DataGuide of the Bank data base. Element names: Bank; Customers, Accounts; Customer (id, Name with Fname and Name, Address with Street, No, Zip, City); Account (id, Owner, Balance, Credit, Standing_Orders, Protocols, Postings); Standing_Order (Day, Receiver, AccountNo, ABA_No, Amount, Disposition); Protocol; Posting.]

Performance Measurement (2)

Number of committed transactions in Banking benchmark

[Figure: number of committed transactions (y axis, 100 to 400) over lock depth (x axis, 0 to 5) for the lock protocols taDOM3+, taDOM3, taDOM2+, taDOM2, URIX, IRIX, IRIX+, RIX, RIX+, OO2PL, NO2PL and Node2PL. Curve groups labeled in the chart: taDOM3+ and taDOM2+; taDOM3 and taDOM2; URIX; RIX(+) and IRIX(+); Node2PL, NO2PL and OO2PL.]

Performance Measurement (3)

Number of aborted transactions in Banking benchmark

[Figure: number of aborted transactions (y axis, 0 to 700) over lock depth (x axis, 0 to 5) for the lock protocols taDOM3+, taDOM3, taDOM2+, taDOM2, URIX, IRIX, IRIX+, RIX, RIX+, OO2PL, NO2PL and Node2PL. Curve groups labeled in the chart: RIX(+) and IRIX(+); taDOM and URIX; Node2PL, NO2PL and OO2PL.]

Conclusions and Outlook

XTC is used as a test vehicle for empirical DB research
• DeweyIDs are the concept for efficient internal XML processing
• index structures and a toolbox with PPOs greatly support XML query evaluation
• effectiveness of XML concurrency control
- fine-granular locking on nodes and edges
- taDOM* protocols
  - multiplicity of lock modes
  - intention locks are important
  - indexed document access is frequent
  - ancestor path locking without accessing the storage engine desirable
• performance evaluation revealed
- use of tailored lock modes pays off
- influence of node numbering schemes (insertions at arbitrary positions)

Outlook
• work on logical XML algebra and selectivity estimation for flexible query optimization
• adaptivity aspects in layered DBMS architectures