XPathLog: A Declarative, Native XML Data Manipulation Language. Wolfgang May, Institut für Informatik, Universität Freiburg, Germany. IDEAS, Grenoble, 16.7.2001.

XPathLog: A Declarative, Native XML Data Manipulation Language

Wolfgang May
Institut für Informatik, Universität Freiburg, Germany
may@informatik.uni-freiburg.de

IDEAS, Grenoble, 16.7.2001

XPathLog

EXAMPLE: MONDIAL

    <mondial>
      <country car_code="B" capital="cty-Brussels"
               memberships="org-eu org-nato ...">
        <name>Belgium</name>
        <population>10170241</population>
        <city id="cty-Brussels" country="B">
          <name>Brussels</name>
          <population year="95">951580</population>
        </city>
        :
      </country>
      <country car_code="D" capital="cty-Berlin"
               memberships="org-eu org-nato ...">
        :
      </country>
      <organization id="org-eu" seat="cty-Brussels">
        <name>European Union</name> <abbrev>EU</abbrev>
        <members type="member" country="GR F E A D I B L ..."/>
        <members type="membership applicant" country="AL CZ ..."/>
      </organization>
      <organization id="org-nato" seat="cty-Brussels" ...>
        :
      </organization>
      :
    </mondial>

EXAMPLE: MONDIAL XML TREE

[Figure: the Mondial document as a tree. Each line reads "edge label: vertex", followed by the vertex's attributes and text contents.]

    document("mondial.xml")
      mondial: mondial
        country: belgium  @car_code="B" @memberships="org-eu org-nato" @capital="city-brussels"
          name: b-name  "Belgium"
          population: b-pop  10170241
          city: brussels  @id="city-brussels" @country="B"
            name: brus-name  "Brussels"
            population: brus-pop  @year=1995  951580
        country: germany  @car_code="D" @memberships="org-eu org-nato"
        organization: eu  @id="org-eu" @seat="city-brussels"
          name: eu-name  "Europ.Union"
          abbrev: eu-abbrev  "EU"
          member: eu-mem1  @type="member" @country="B D"
          member: eu-mem2  @type="mem.appl." @country="CZ ..."
        organization: nato  @id="org-nato" @seat="city-brussels"

LANGUAGES FOR XML

• two proposals for query languages: XML-QL, W3C
• W3C XML Query Requirements/Data Model/Algebra
• no XML update language ("Updating XML" @ SIGMOD 2001)

Selection
• Navigation: XPath, XQuery
• XML Patterns: XML-QL

Generation
• instantiating XML patterns: XML-QL and XQuery

Languages
• Why not use XPath for updating and generating?

DESIGN DECISIONS

• experiences with F-Logic for semi-structured data and data integration
• declarative rule-based language with bottom-up semantics
• extend XPath with variable bindings
• graph data model, overlapping trees, multiple parents

XPATHLOG: SYNTAX

Extends the XPath syntax

XPathLog reference expressions are XPath location paths:

    ReferenceExpr ::= AbsLocPath
                    | ConstLocPath
    ConstLocPath  ::= constant "/" RelLocPath
                    | variable "/" RelLocPath

Extended location steps:

    Step ::= AxisSpec NodeTest Pred
           | AxisSpec NodeTest Pred "->" Var Pred
           | AxisSpec Var Pred
           | AxisSpec Var Pred "->" Var Pred

• navigation by dereferencing IDREF attributes
• predicates over reference expressions

XPATHLOG BY EXAMPLES

Pure XPath expressions

    ?- //country[name/text() = "Belgium"]//city/name/text().
    true

Output Result Set

    ?- //country[name/text() = "Belgium"]//city/name/text() -> N.
    N/"Brussels"
    :

Additional Variables

    ?- //country[name/text() -> N1 and
                 @car_code -> C]//city/name/text() -> N2.
    N2/"Brussels"  C/"B"  N1/"Belgium"
    :

Local Variables

    ?- //country[name/text() -> N1]//city[population/text() -> P]
         /name/text() -> N2,
       P > 100000.

Dereferencing

    ?- //organization[@seat = members/@country/@capital]
         /@seat/name/text() -> N.
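To make the binding notation concrete, the following Python sketch emulates the "Local Variables" and "Dereferencing" queries with the standard library's ElementTree. It is not how XPathLog or LoPiX evaluates queries: the miniature document, the function names `city_bindings` and `seats_in_member_capitals`, and the use of `car_code` as the country key are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed miniature of the Mondial excerpt (not the full data set).
MONDIAL = """
<mondial>
  <country car_code="B" capital="cty-Brussels">
    <name>Belgium</name>
    <population>10170241</population>
    <city id="cty-Brussels" country="B">
      <name>Brussels</name>
      <population year="95">951580</population>
    </city>
  </country>
  <organization id="org-eu" seat="cty-Brussels">
    <abbrev>EU</abbrev>
    <members type="member" country="B"/>
  </organization>
</mondial>
"""

def city_bindings(root, min_population):
    """Return one dict of variable bindings (N1, C, N2, P) per answer."""
    answers = []
    for country in root.iter("country"):           # //country
        for city in country.iter("city"):          # //city below that country
            population = city.findtext("population")
            if population is None or int(population) <= min_population:
                continue                           # the comparison on P fails
            answers.append({
                "N1": country.findtext("name"),
                "C": country.get("car_code"),
                "N2": city.findtext("name"),
                "P": int(population),
            })
    return answers

def seats_in_member_capitals(root):
    """Names of seat cities that are also the capital of a member country."""
    by_id = {e.get("id"): e for e in root.iter() if e.get("id") is not None}
    # Assumption: members/@country holds car codes, which identify countries.
    by_code = {c.get("car_code"): c for c in root.iter("country")}
    names = []
    for org in root.iter("organization"):
        seat = org.get("seat")
        capitals = {
            by_code[code].get("capital")
            for m in org.findall("members")
            for code in m.get("country", "").split()
            if code in by_code
        }
        if seat in capitals and seat in by_id:     # dereference @seat
            names.append(by_id[seat].findtext("name"))
    return names

root = ET.fromstring(MONDIAL)
bindings = city_bindings(root, 100000)
seat_names = seats_in_member_capitals(root)
```

Each dictionary in `bindings` plays the role of one answer line such as N2/"Brussels" C/"B" N1/"Belgium"; the explicit index lookups stand in for what XPathLog expresses by navigating through a reference attribute.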

XPATHLOG BY EXAMPLES

Metadata: Tag Variables and Schema Querying

Navigation Variables

    ?- //Type -> X[name/text() -> "Monaco"].
    Type/country  X/country-monaco
    Type/city     X/city-monaco

Schema Querying

    ?- //city/N.
    N/name
    N/population
    :
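The two metadata queries can be imitated in a few lines of Python. This is only a sketch under assumptions: the sample document and the helper names `elements_named` and `child_names` are hypothetical, and ElementTree stands in for the XPathLog engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; only the element names follow the Mondial excerpt.
DOC = """
<mondial>
  <country car_code="MC">
    <name>Monaco</name>
    <city id="cty-Monaco"><name>Monaco</name></city>
  </country>
  <country car_code="B">
    <name>Belgium</name>
    <city id="cty-Brussels">
      <name>Brussels</name>
      <population year="95">951580</population>
    </city>
  </country>
</mondial>
"""

def elements_named(root, wanted):
    """Bind Type to the tag and X to every element with a name child equal to `wanted`."""
    return [
        (x.tag, x)
        for x in root.iter()
        if any(n.text == wanted for n in x.findall("name"))
    ]

def child_names(root, tag):
    """Schema query: every element name occurring as a child of a `tag` element."""
    return sorted({child.tag for x in root.iter(tag) for child in x})

root = ET.fromstring(DOC)
matches = elements_named(root, "Monaco")
city_children = child_names(root, "city")
```

Here `matches` pairs each tag with the matching element, mirroring the Type/X answers, and `city_children` lists the names a variable in the name-test position could be bound to.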

DATA MODEL: XTREEGRAPH

• extends DOM/XML Query Data Model
• tailored to data integration
• graph "database"
  – XML element nodes: vertices
  – subelement relationships: edges
  – text children are leaf literals
  – XML attribute nodes: literal values
  – multivalued attributes (NMTOKENS, IDREFS) are split
  – reference attributes (IDREF) are resolved
• an element may be a subelement of several vertices
• multiple overlapping trees, no global "document" order
• (names are elements of the universe)

Data Model

For every vertex x:
• A_X(child, x)
• A_X(attribute, x)
• derived axes as views

Mapping: XML instance → XTreeGraph
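A rough Python sketch of the mapping from an XML instance to the two base axes follows. It assumes that `A_X(child, x)` and `A_X(attribute, x)` can be modelled as lists of (value, name) pairs per vertex, and that the set of reference attributes (`ref_attrs`) is known in advance, for example from a DTD; the function name `build_xtreegraph` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def build_xtreegraph(root, ref_attrs=frozenset({"capital", "seat", "memberships"})):
    """Return (child, attribute): per-vertex lists of (value, name) pairs."""
    by_id = {e.get("id"): e for e in root.iter() if e.get("id") is not None}
    child, attribute = {}, {}
    for x in root.iter():
        kids = []
        # Simplification: only leading text is kept; tail text is ignored.
        if x.text and x.text.strip():
            kids.append((x.text.strip(), "text()"))
        kids.extend((k, k.tag) for k in x)
        child[x] = kids
        attrs = []
        for name, value in x.attrib.items():
            if name in ref_attrs:
                # Split multivalued references and resolve each token to its
                # target vertex; unresolved tokens stay literal strings.
                attrs.extend((by_id.get(tok, tok), name) for tok in value.split())
            else:
                attrs.append((value, name))
        attribute[x] = attrs
    return child, attribute

# Hypothetical miniature document.
root = ET.fromstring("""
<mondial>
  <country car_code="B" capital="cty-Brussels" memberships="org-eu org-nato">
    <name>Belgium</name>
    <city id="cty-Brussels" country="B"><name>Brussels</name></city>
  </country>
  <organization id="org-eu" seat="cty-Brussels"><abbrev>EU</abbrev></organization>
</mondial>
""")
child, attribute = build_xtreegraph(root)
belgium = root.find("country")
brussels = belgium.find("city")
eu = root.find("organization")
```

After the call, the `capital` entry of `attribute[belgium]` points at the city vertex itself rather than at the string "cty-Brussels", which is what makes navigation through references a plain graph step.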

SEMANTICS OF XPATHLOG

• Extends the semantics for XPath by P. Wadler (1999)

XPathLog expressions evaluate to an annotated result list:
(i) a result list, and
(ii) with every element of the result list, a list of variable bindings is associated.

Example

    expr = //organization -> O[member/@country[@car_code -> C and
                                               name/text() -> N]]
             /abbrev/text() -> A.

results in

    list(("UN", {(O/un, A/"UN", C/"AL", N/"Albania"),
                 (O/un, A/"UN", C/"GR", N/"Greece"),
                 : }),
         ("EU", {(O/eu, A/"EU", C/"D", N/"Germany"),
                 (O/eu, A/"EU", C/"F", N/"France"),
                 : }),
         : )
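The shape of an annotated result list can be reproduced with a small Python sketch. It hard-codes the example expression rather than interpreting XPathLog, binds O to the organization's id instead of the vertex, and uses a hypothetical two-country document; `annotated_results` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document following the element names of the example.
DOC = """
<mondial>
  <country car_code="D"><name>Germany</name></country>
  <country car_code="F"><name>France</name></country>
  <organization id="org-eu">
    <abbrev>EU</abbrev>
    <member type="member" country="D F"/>
  </organization>
</mondial>
"""

def annotated_results(root):
    """Return [(result, [bindings, ...]), ...] for the example expression."""
    by_code = {c.get("car_code"): c for c in root.iter("country")}
    results = []
    for org in root.iter("organization"):
        inner = []                                  # bindings from the predicate
        for member in org.findall("member"):
            for code in member.get("country", "").split():
                country = by_code.get(code)         # dereference @country
                if country is None:
                    continue
                for name in country.findall("name"):
                    inner.append({"O": org.get("id"), "C": code, "N": name.text})
        if not inner:
            continue                                # predicate not satisfied
        for abbrev in org.findall("abbrev"):
            # Every result item carries the bindings, extended by A.
            results.append((abbrev.text, [dict(b, A=abbrev.text) for b in inner]))
    return results

results = annotated_results(ET.fromstring(DOC))
```

The outer list corresponds to (i), the result list; the inner list attached to each item corresponds to (ii), the associated variable bindings.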

XPATHLOG RULES

    head(V1, ..., Vn) :- body(V1, ..., Vn)

• evaluation of rule bodies = queries
• constructive semantics for XPathLog atoms in rule heads
• definite XPathLog atoms:
  – use only the child and sibling axes
  – no negation, function applications, aggregation, or proximity position predicates

"/" and "[...]" act as constructors:
• host[property -> value] modifies host
• host/property remainder inserts a new element host/property which satisfies remainder
• property is of the form
  – child::name
  – child(i)::name
  – preceding/following-sibling::name
  – preceding/following-sibling(i)::name
  – attribute::name
• unambiguous insertions
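The interplay of rule bodies as queries and rule heads as insertions can be illustrated with a naive bottom-up fixpoint loop. This is a generic sketch, not the LoPiX evaluation strategy: the database is flattened to (host, property, value) triples, and `bottom_up`, `body`, and `head` are hypothetical names.

```python
def bottom_up(facts, rules):
    """Apply all rules repeatedly until no rule head adds a new fact."""
    facts = set(facts)
    while True:
        new = set()
        for body, head in rules:
            for binding in body(facts):            # body evaluation = query
                new |= set(head(binding)) - facts  # head = constructive insertion
        if not new:
            return facts
        facts |= new

# Illustrative rule: link a matching city below a country and make it the capital.
def body(facts):
    countries = [h for (h, p, v) in facts if p == "@car_code" and v == "BAV"]
    cities = [h for (h, p, v) in facts if p == "name" and v == "Munich"]
    return [{"C": c, "X": x} for c in countries for x in cities]

def head(binding):
    return [
        (binding["C"], "city", binding["X"]),
        (binding["C"], "@capital", binding["X"]),
    ]

start = {("bavaria", "@car_code", "BAV"), ("munich", "name", "Munich")}
result = bottom_up(start, [(body, head)])
```

The loop stops after the second round because re-evaluating the body yields only facts that are already present.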

RULE HEADS: UPDATES

Attributes

    C[@datacode -> "be"], C[@memberships -> O] :-
        //country -> C[@car_code = "B"],
        //organization -> O[abbrev/text() -> "EFTA"].

    <country datacode="be" car_code="B"
             memberships="org-eu org-un org-efta ...">
      ...
    </country>

• C: host element
• O: target of reference
• extends A_X(attribute, belgium)
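The effect of this rule on a tree representation can be approximated as follows. The sketch assumes that a reference is stored as the target's id inside a space-separated attribute, and that re-applying the rule must not duplicate entries; `add_efta_membership` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document.
DOC = """
<mondial>
  <country car_code="B" memberships="org-eu"><name>Belgium</name></country>
  <organization id="org-efta"><abbrev>EFTA</abbrev></organization>
</mondial>
"""

def add_efta_membership(root):
    """Apply the rule once; return True if any attribute was changed."""
    changed = False
    for c in root.iter("country"):                   # body: bind C
        if c.get("car_code") != "B":
            continue
        for o in root.iter("organization"):          # body: bind O
            if o.findtext("abbrev") != "EFTA":
                continue
            if c.get("datacode") != "be":            # head: C[@datacode -> "be"]
                c.set("datacode", "be")
                changed = True
            refs = c.get("memberships", "").split()
            if o.get("id") not in refs:              # head: C[@memberships -> O]
                c.set("memberships", " ".join(refs + [o.get("id")]))
                changed = True
    return changed

root = ET.fromstring(DOC)
first_pass = add_efta_membership(root)
second_pass = add_efta_membership(root)
belgium = root.find("country")
```

The second call changes nothing, which is the property a bottom-up evaluation needs in order to terminate.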

SEMANTICS OF RULE HEADS

Create "free" elements

    /country[@car_code -> "BAV"].

    <country car_code="BAV"> </country>

SEMANTICS OF RULE HEADS

Add subelement relationships

    C[@capital -> X and city -> X and city -> Y] :-
        //country -> C[@car_code -> "BAV"],
        //city -> X[name/text() = "Munich"],
        //city -> Y[name/text() = "Nürnberg"].

• city elements are linked as subelements.
• extends A_X(child, bavaria) with (munich, city) and (nürnberg, city).
• crucial for efficient in-place restructuring and integration
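Linking, as opposed to copying, can be demonstrated in Python because ElementTree elements keep no parent pointer, so the same element object can be appended below a second parent. The sketch below is an analogy for the graph model, not the XTreeGraph implementation; `link_cities`, `parents_of`, and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document: two cities below Germany, an empty Bavaria.
root = ET.fromstring("""
<mondial>
  <country car_code="D">
    <city id="city-munich"><name>Munich</name></city>
    <city id="city-nurnberg"><name>Nürnberg</name></city>
  </country>
  <country car_code="BAV"/>
</mondial>
""")

def link_cities(root, car_code, city_names):
    """Link existing city elements as additional children of one country."""
    target = next(c for c in root.iter("country") if c.get("car_code") == car_code)
    for city in list(root.iter("city")):             # snapshot before modifying
        if city.findtext("name") in city_names and city not in list(target):
            target.append(city)                      # same object, second parent
    return target

def parents_of(root, node):
    """All elements that have `node` (by identity) among their children."""
    return [p for p in root.iter() if any(k is node for k in p)]

bavaria = link_cities(root, "BAV", {"Munich", "Nürnberg"})
munich = bavaria[0]
bavaria.set("capital", munich.get("id"))
munich_parents = [p.get("car_code") for p in parents_of(root, munich)]
```

No subtree is duplicated: a later change to the Munich element is visible below both countries.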

Example: Linking

[Figure: the graph after linking. Each line reads "edge label: vertex"; nürnberg and munich are shared children of germany and bavaria.]

    root
      country: germany  @car_code="D" @capital=(reference to a city)
        city: berlin
          name: b-name  "Berlin"
          population: b-pop  3472009
        city: nürnberg
          name: n-name  "Nürnberg"
          population: n-pop  495845
        city: munich
          name: m-name  "Munich"
          population: m-pop  1244676
      country: bavaria  @car_code="BAV" @capital=(reference to a city)
        city: nürnberg  (same vertex as above)
        city: munich  (same vertex as above)

• elements may have multiple parents

SEMANTICS OF RULE HEADS

Generation of Elements by Path Expressions

    C/name[text() -> "Bavaria"] :-
        //country -> C[@car_code = "BAV"].

    <country car_code="BAV" capital="city-munich">
      <city>...</city>
      <city>...</city>
      <name>Bavaria</name>
    </country>

Atomized:

    C[name -> N], N[text() -> "Bavaria"] :-
        root[descendant::country -> C], C[@car_code = "BAV"].

• variable binding C/bavaria.
• insert bavaria[child::name -> x1]: add an (empty) name subelement (x1, name) to A_X(child, bavaria).
• bind the local variable N to x1.
• generate the text contents of x1: add ("Bavaria", text) to A_X(child, x1).
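The atomized steps above can be traced in a short Python sketch. It assumes that an existing name child with the same text already satisfies the head, so nothing is inserted twice; that reading, the function name `generate_name`, and the sample document are assumptions rather than part of the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document.
root = ET.fromstring(
    '<mondial><country car_code="BAV" capital="city-munich"/></mondial>'
)

def generate_name(root, car_code, text):
    """Create a name subelement with the given text below matching countries."""
    created = []
    for c in root.iter("country"):                   # bind C
        if c.get("car_code") != car_code:
            continue
        if any(n.text == text for n in c.findall("name")):
            continue                                 # head already satisfied
        x1 = ET.SubElement(c, "name")                # add an empty name child
        x1.text = text                               # add its text contents
        created.append(x1)                           # N is bound to x1
    return created

first = generate_name(root, "BAV", "Bavaria")
second = generate_name(root, "BAV", "Bavaria")
serialized = ET.tostring(root, encoding="unicode")
```

`first` holds the newly created element, `second` is empty, and `serialized` contains the name element inside the country.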

DATA INTEGRATION

• objects of different sources represent the same real-world object
• fusing objects, merging their properties
• synonyms
• not compatible with the XML Data Model (DOM, XML Query Data Model)
• graph vs. tree

DATA INTEGRATION: “THREE-LEVEL” MODEL

access multiple sources
• "basic" layer: source(s) provide tree structures,
• optionally with namespaces

merge data from different sources
• "internal" layer: XTreeGraph
  – overlapping trees
  – multiple parents
  – references
• fuse elements/merge subtrees
• add subelement links
• generate elements
• synonyms for properties

"export" layer: result tree views defined as projections
• root node
• signature
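What fusing elements and declaring synonyms could amount to on the internal layer is sketched below on a flattened graph of (host, property, value) triples. The slides do not specify these operations in detail, so `fuse`, `apply_synonyms`, the vertex names, and the property name `country_name` are all hypothetical.

```python
def apply_synonyms(facts, synonyms):
    """Rename properties so that synonymous names coincide."""
    return {(h, synonyms.get(p, p), v) for (h, p, v) in facts}

def fuse(facts, keep, drop):
    """Merge vertex `drop` into `keep`; their properties are unioned."""
    def rename(v):
        return keep if v == drop else v
    return {(rename(h), p, rename(v)) for (h, p, v) in facts}

# Two hypothetical sources describing the same real-world object.
source_a = {
    ("a:belgium", "name", "Belgium"),
    ("a:belgium", "population", "10170241"),
}
source_b = {
    ("b:be", "country_name", "Belgium"),
    ("b:be", "@car_code", "B"),
}

merged = apply_synonyms(source_a | source_b, {"country_name": "name"})
merged = fuse(merged, keep="a:belgium", drop="b:be")
```

Because the graph is a set, the two name facts collapse into one after renaming and fusing, leaving a single vertex that carries the properties of both sources.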

IMPLEMENTATION: LOPIX

[Figure: LoPiX architecture.
Storage: Object Manager, OM Access.
Web Access: XML Parser, DTD Parser.
Evaluation: Algebraic Evaluation (S), Algebraic Insertion, Logic Evaluation (bottom-up, T_XP).
Execution: System Commands, XPathLog Parser (programs and queries).
User Interface: Pretty Printer (bindings/XML).
Internet input: XML url, DTD url. Output: interactive output, XML output.
Legend: Internet in- and output; inserts to internal storage; internal information flow; querying internal storage.]

RESULTS

• declarative semantics for generating XML with XPath
• graph data model is suitable and necessary for integration
• first implementation of XML updates
• DOM/XML Query Data Model not suitable: unique parent; no fusion, synonyms, or linking
• updates in XML require copying trees; problems with IDREFs

CONCLUSION

• practicable approach (implementation + case study)
  – faster than Kweelt, XML-QL, Excelon, and Tamino for queries
• data model
• powerful, flexible language
  – (metadata/schema querying)
  – element creation
  – element fusion
• extension concepts (classes, signatures)
• data-driven Web access, extensible for XLinks

Contents

Example: Mondial
Example: Mondial XML Tree

Design & Basics
  Languages for XML
  Design Decisions
  XPathLog: Syntax

XPathLog
  XPathLog by Examples
  XPathLog by Examples
  Data Model: XTreeGraph
  Semantics of XPathLog
  XPathLog Rules
  Rule Heads: Updates
  Semantics of Rule Heads
  Semantics of Rule Heads
  Semantics of Rule Heads
  Data Integration
  Data Integration: "Three-level" model

Implementation
  Implementation: LoPiX

Results
  Results
  Conclusion