Haris Georgiadis, Minas Charalambides, Vasilis Vassalos
Athens University of Economics and Business

Efficient Physical Operators for a Cost-based XPath Execution Engine


Post on 23-Feb-2016


Page 1: Haris Georgiadis Minas Charalambides Vasilis Vassalos

Haris Georgiadis Minas Charalambides

Vasilis Vassalos

Athens University of Economics and Business

Efficient Physical Operators for a cost-based XPath Execution Engine

Page 2: Haris Georgiadis Minas Charalambides Vasilis Vassalos

Motivation (1)

XPath query: /s/r/*/it[mb/m/to=‘x’]//k

Three navigation alternatives (among others):

Straightforward navigation: retrieve all it elements under /s/r/*/it; keep those having at least one to element reachable via mb/m/to with text value ‘x’. For the remaining it elements, return their k descendants.

Starting from k: return all k elements with at least one it ancestor, which in turn:
• has a to element reachable via mb/m/to with text value ‘x’ and
• has the document element s as an ancestor via the relative path parent::*/parent::r/parent::s.

Starting from to: retrieve all to elements under /s/r/*/it/mb/m/to, keep only those with text value ‘x’, then go backward via parent::m/parent::mb/parent::it and, for the resulting it elements, return their k descendants.
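To make the three alternatives concrete, the following is a minimal sketch in Python using the standard library's ElementTree. The toy document, the helper names (PARENT, has_x, forward, from_k, from_to) and the use of ElementTree are illustrative assumptions, not part of GeCOEX. It only shows that the three navigation orders produce the same answer on a small input; it says nothing about their relative cost.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy document: only the first <it> has a 'to' with text 'x'.
DOC = (
    "<s><r><g>"
    "<it><mb><m><to>x</to></m></mb><d><k>k1</k></d></it>"
    "<it><mb><m><to>y</to></m></mb><k>k2</k></it>"
    "</g></r><k>k3</k></s>"
)
root = ET.fromstring(DOC)
assert root.tag == "s"
# ElementTree keeps no parent pointers, so build a child -> parent map.
PARENT = {child: parent for parent in root.iter() for child in parent}


def has_x(it):
    """Predicate [mb/m/to='x'] evaluated on one 'it' element."""
    return any(to.text == "x" for to in it.findall("mb/m/to"))


def k_descendants(its):
    # Assumes 'it' elements are not nested, so no duplicates can arise.
    return [k.text for it in its for k in it.iter("k")]


def forward():
    """Straightforward navigation: /s/r/*/it, filter, then //k."""
    return k_descendants([it for it in root.findall("r/*/it") if has_x(it)])


def from_k():
    """Start from every k and check its 'it' ancestors backwards."""
    def under_s_r_star(it):
        r = PARENT.get(PARENT.get(it))  # parent::*/parent::r
        return r is not None and r.tag == "r" and PARENT.get(r) is root

    result = []
    for k in root.iter("k"):
        node = PARENT.get(k)
        while node is not None:
            if node.tag == "it" and has_x(node) and under_s_r_star(node):
                result.append(k.text)
                break
            node = PARENT.get(node)
    return result


def from_to():
    """Start from the 'to' elements, filter on 'x', then go back up to 'it'."""
    its = []
    for to in root.findall("r/*/it/mb/m/to"):
        if to.text == "x":
            it = PARENT[PARENT[PARENT[to]]]  # parent::m/parent::mb/parent::it
            if it not in its:
                its.append(it)
    return k_descendants(its)
```

All three functions return the same list for this document; which order is cheapest on real data depends on how selective each step is, which is the motivation for a cost-based choice.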


Page 3: Haris Georgiadis Minas Charalambides Vasilis Vassalos

Motivation (2)

Many XPath processing algorithms
• PPFS+, Staircase Join, sort-merge-based structural joins, PathStack, Twig2Stack, etc.

Many physical data models and storage techniques:
• Shredding on relations: schema-based mapping vs. edge-based mapping
• Storage into disk pages preserving the XML hierarchy
• Structural encodings: region encoding vs. prefix-based encoding
• Data structures: XB-trees, F&B Index, path indexes

Page 4: Haris Georgiadis Minas Charalambides Vasilis Vassalos

Contribution I

GeCOEX: the first generic XPath cost-based execution and optimization framework
• Agnostic to the underlying XML storage system and the access methods it supports
• Independent of the techniques and algorithms available for XPath processing, which are encapsulated in operator implementations and rewriting rules
• Cost-based optimization

Page 5: Haris Georgiadis Minas Charalambides Vasilis Vassalos

Contribution II

XPalgebra: a novel XPath logical algebra
• Good fit with many XPath processing techniques

Lookup and SM: two novel and efficient families of physical operators for XPath

Multiple storage engines

Experimental evaluation: direct comparison of operator implementations

Page 6: Haris Georgiadis Minas Charalambides Vasilis Vassalos

GeCOEX System Architecture

[Figure: architecture diagram. Labels shown: XPath query, Parser, Query Optimization, Physical Plan Selector, Rewriting Rules, Descriptors, Physical Operator Descriptors, Cost Models, Database Statistics, Query Execution, Physical Plan Executor, Physical Operators, result, XPA API, XPA Driver, Primitive Access Methods, Primitive Access Method Cost Models, Data Model.]

Page 7: Haris Georgiadis Minas Charalambides Vasilis Vassalos

XPalgebra

Generic sequence-based logical algebra for a subset of XPath
• Forward and backward axes
• Non-positional predicates involving conjunctive boolean expressions

Maintains the navigational nature of XPath

Data model
• Element
• Sequence: duplicate-free list of elements in document order

Sequence operators: (mainly) navigation
• Input and output: Sequence

Boolean operators: used for filtering
• Input: Element
• Output: True or False

Page 8: Haris Georgiadis Minas Charalambides Vasilis Vassalos

XPalgebra – Sequence Operators

Both the input and the output of a Sequence operator are sequences of nodes

The input sequence is called the context sequence

BoolExpr: const | Ъ1 ∧ Ъ2 ∧ … ∧ Ъn, where each Ъi is a Boolean operator

Page 9: Haris Georgiadis Minas Charalambides Vasilis Vassalos

XPalgebra – Boolean Operators

• Applied on single nodes only
• The input element is called the context element
• Return boolean values

BoolExpr: const | Ъ1 ∧ Ъ2 ∧ … ∧ Ъn, where each Ъi is a Boolean operator

Example: the predicate …[d//c] applied to a sequence S is written f(S, Ъfp_{/d//c})

Page 10: Haris Georgiadis Minas Charalambides Vasilis Vassalos

XPalgebra - examples

/s/r/*/it[mb/m/to=‘x’]//k

d_k( f( fp_{/s/r/*/it}(root), Ъfp_{/mb/m/to}( Ъvf_{text()=x} ) ) )
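The nesting of the expression above can be mimicked with plain function composition. The sketch below is an illustration only: the curried Python functions fp, d, f, b_fp and b_vf, the virtual #document node and the use of ElementTree are assumptions made for this example and are not the XPalgebra implementation. Sequence operators map a sequence to a duplicate-free sequence in document order; Boolean operators map one element to True or False.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy document.
DOC = (
    "<s><r><g>"
    "<it><mb><m><to>x</to></m></mb><k>k1</k></it>"
    "<it><mb><m><to>y</to></m></mb><k>k2</k></it>"
    "</g></r></s>"
)
doc = ET.Element("#document")      # virtual root, so that /s is a child step
doc.append(ET.fromstring(DOC))
ORDER = {e: i for i, e in enumerate(doc.iter())}   # document (pre-order) rank


def seq(elems):
    """Build a Sequence: duplicate-free, in document order."""
    return sorted(set(elems), key=ORDER.__getitem__)


# Sequence operators: Sequence -> Sequence
def fp(path):
    return lambda S: seq(e for n in S for e in n.findall(path))


def d(tag):
    return lambda S: seq(e for n in S for e in n.iter(tag) if e is not n)


def f(bool_op):
    return lambda S: [n for n in S if bool_op(n)]


# Boolean operators: Element -> bool
def b_fp(path, inner=lambda n: True):
    return lambda n: any(inner(e) for e in n.findall(path))


def b_vf(value):
    return lambda n: n.text == value


# d_k( f( fp_{/s/r/*/it}(root), b_fp_{/mb/m/to}( b_vf_{text()=x} ) ) )
its = f(b_fp("mb/m/to", b_vf("x")))(fp("s/r/*/it")([doc]))
result = d("k")(its)
```

Here `[k.text for k in result]` gives the text of the selected k elements. A real engine would replace each of these functions with one of several physical operators chosen by the optimizer.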

Page 11: Haris Georgiadis Minas Charalambides Vasilis Vassalos

Physical Operators

• Implement the Sequence interface of the XPA API
• Access the XML data using the AccessMethods interface of the XPA API
• Example: a physical operator implementation

That’s how physical operators are agnostic to the physical data model

Page 12: Haris Georgiadis Minas Charalambides Vasilis Vassalos

Physical Operators

Large number of physical operators, divided roughly into four ‘families’:

Lookup operators (LU)
• Inspired by indexed nested loops join
• d^LU_a: for each element n from input sequence S, make a lookup using XPAAPI.Descs(n, a)

SortMerge-based operators (SM)
• Inspired by sort-merge join
• d^SM_a: scan all elements from input sequence S and all a elements (using XPAAPI.Descs(root, a)) and find ‘ancestor-descendant’ matches

Staircase Join operators [Grust 2003]

PathStack operators [Bruno 2002]
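The difference between the two descendant operators can be sketched as follows. This is a simplified illustration under stated assumptions: nodes carry pre/post ranks, a per-tag list sorted by pre rank stands in for the tag index, and `descs` is a hypothetical stand-in for the Descs access method. The LU version here sorts its output at the end for brevity, which is not the pipelined approach the slides describe.

```python
from bisect import bisect_right
from typing import NamedTuple


class Node(NamedTuple):
    name: str
    tag: str
    pre: int    # preorder rank
    post: int   # postorder rank


# Hypothetical tree: r(b1(f1, c1(f2)), b2, f3), listed in document order.
NODES = [
    Node("r", "r", 0, 6), Node("b1", "b", 1, 3), Node("f1", "f", 2, 0),
    Node("c1", "c", 3, 2), Node("f2", "f", 4, 1), Node("b2", "b", 5, 4),
    Node("f3", "f", 6, 5),
]
INDEX = {}
for node in NODES:
    INDEX.setdefault(node.tag, []).append(node)   # per-tag list, sorted by pre


def descs(n, tag):
    """Stand-in for Descs(n, tag): descendants of n with the given tag."""
    nodes = INDEX.get(tag, [])
    start = bisect_right([x.pre for x in nodes], n.pre)
    out = []
    for x in nodes[start:]:
        if x.post > n.post:   # x follows n, and so does everything after it
            break
        out.append(x)
    return out


def d_lu(S, tag):
    """Lookup style: one index lookup per context element."""
    found = {x for n in S for x in descs(n, tag)}
    return sorted(found, key=lambda x: x.pre)


def d_sm(S, tag):
    """Sort-merge style: one pass over S and over all elements with the tag."""
    right = descs(NODES[0], tag)
    result, i, max_post = [], 0, -1
    for cand in right:
        # Consume context elements that start before the candidate.
        while i < len(S) and S[i].pre < cand.pre:
            max_post = max(max_post, S[i].post)
            i += 1
        # cand has an ancestor in S iff one of them ends after cand.
        if max_post > cand.post:
            result.append(cand)
    return result


BY_NAME = {n.name: n for n in NODES}
S = [BY_NAME["b1"], BY_NAME["c1"], BY_NAME["b2"]]   # context, document order
```

With a small context sequence the lookup version touches only a few index entries, while the merge version always scans every element with the requested tag. That trade-off is what makes the choice depend on context selectivity.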


Page 13: Haris Georgiadis Minas Charalambides Vasilis Vassalos

Physical Operators

Operators by family. Columns: LU*, SM*, Staircase [Grust 2003], PathStack [Bruno 2002]. Marks are listed as they survive in the transcript:

c (child): **
d (descendant)
fp (forward path): **
p (parent): X, **
a (ancestor): **
bp (backward path): **, X
cs (cousin): X, X

**: inspired by original

Page 14: Haris Georgiadis Minas Charalambides Vasilis Vassalos

5 XML Storage Systems and their XPA drivers

[Figure: GeCOEX system architecture diagram (Parser, Physical Plan Selector, Physical Plan Executor, XPA API, XPA Driver, XML storage system).]

• The PE-basic native XML storage system: Dewey encoding, 1 B-Tree per tag name
• The RE-basic native XML storage system: Pre/Post/Level encoding, 1 B-Tree per tag name
• The PE-Path native XML storage system: Dewey encoding, 1 B-Tree per tag name, Paths B-Tree
• The RE-Path native XML storage system: Pre/Post/Level encoding, 1 B-Tree per tag name, Paths B-Tree
• The Edge-RE native XML storage system: Pre/Post/Level encoding, 1 B-Tree for all elements
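The two encoding families named above answer structural questions in different ways. The sketch below shows the standard ancestor and parent tests for each, as a generic illustration; the function names and the Region type are assumptions for this example, not the drivers' actual code.

```python
from typing import NamedTuple


def dewey_is_ancestor(anc, desc):
    """Prefix (Dewey) encoding: an ancestor's label is a proper prefix."""
    a, d = anc.split("."), desc.split(".")
    # Compare components, not raw strings: "1.1" is not a prefix of "1.10".
    return len(a) < len(d) and d[:len(a)] == a


def dewey_is_parent(anc, desc):
    return (dewey_is_ancestor(anc, desc)
            and len(anc.split(".")) + 1 == len(desc.split(".")))


class Region(NamedTuple):
    pre: int     # preorder rank
    post: int    # postorder rank
    level: int   # depth in the tree


def region_is_ancestor(anc, desc):
    """Region (Pre/Post/Level) encoding: the ancestor's region contains desc."""
    return anc.pre < desc.pre and desc.post < anc.post


def region_is_parent(anc, desc):
    return region_is_ancestor(anc, desc) and anc.level + 1 == desc.level
```

A practical consequence is that a Dewey label already contains the labels of all its ancestors, whereas a region label only allows a containment test against a candidate that has been retrieved by some other means.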

Page 15: Haris Georgiadis Minas Charalambides Vasilis Vassalos

Lookup Operators

• Novel efficient algorithms for holistically evaluating forward and backward multi-step paths
• Based on root-to-node filtering
• buffered-leaping: a new technique for pipelined duplicate elimination and document order preservation
• Search a minimum window of elements for each element in the context sequence
• window: the result of calling the method from the AccessMethods interface of the XPA API (e.g. Descs(), Ancs()) that corresponds to the XPath axis (e.g. descendant, ancestor) for a given context element
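Root-to-node filtering can be illustrated with a small regular-expression check. The sketch below is an assumption-laden illustration: the function names, the supported path syntax (child steps, descendant steps and the * wildcard) and the meaning of the depth argument (0-based depth of the context element, with the document element at depth 0) are choices made for this example and may differ from the regExprFilter used in the slides.

```python
import re


def path_to_regex(path):
    """Translate a simple forward path such as /c//f into a regex
    that is matched against a root-to-node path suffix."""
    tokens = re.findall(r"//|/|[^/]+", path)
    parts = []
    for sep, name in zip(tokens[0::2], tokens[1::2]):
        step = "[^/]+" if name == "*" else re.escape(name)
        if sep == "//":
            parts.append("(?:/[^/]+)*")   # any number of intermediate steps
        parts.append("/" + step)
    return re.compile("".join(parts))


def reg_expr_filter(rtn_path, path, context_depth):
    """True if the node with root-to-node path rtn_path is reachable via
    `path` from its ancestor at depth context_depth."""
    steps = rtn_path.strip("/").split("/")
    suffix = "/" + "/".join(steps[context_depth + 1:])
    return path_to_regex(path).fullmatch(suffix) is not None
```

For example, `reg_expr_filter("/r/b/c/f", "/c//f", 1)` checks whether the f element is reachable via /c//f from its b ancestor at depth 1. Because the test only needs the candidate's root-to-node path, no further navigation is required once a window element has been fetched.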

Page 16: Haris Georgiadis Minas Charalambides Vasilis Vassalos

The size of chain at any time is very small and upper bounded by the depth of the XML document

Example: fpLU, path /c//f

[Figure: example XML tree rooted at r, containing elements b1 to b9 and f1 to f17 together with c, d and e elements.]

Trace (state columns: rootAnc, contextEl, chain):

• rootAnc = b1. The next context element, b2, is not a descendant of b1. window = XPAPI.Descs(b1,‘f’);
  regExprFilter(f1.getRTNPath(), /c//f, 1) = true: next() returns f1
  regExprFilter(f2.getRTNPath(), /c//f, 1) = false
  regExprFilter(f3.getRTNPath(), /c//f, 1) = true: next() returns f3
• rootAnc = b2. The next context element, b3, is not a descendant of b2. window = XPAPI.Descs(b2,‘f’);
  regExprFilter(f4.getRTNPath(), /c//f, 1) = false
  regExprFilter(f5.getRTNPath(), /c//f, 1) = true: next() returns f5
• rootAnc = b3. b5 is a descendant of b3 and b7 is a descendant of b3, so chain holds b5 and b7; b9 is not a descendant of b3. window = XPAPI.Descs(b3,‘f’);
  f6: descendant of b3 and regExprFilter(f6.getRTNPath(), /c//f, 1) = false; descendant of b5 and regExprFilter(f6.getRTNPath(), /c//f, 3) = false; not a descendant of b7
  f7: descendant of b3 and regExprFilter(f7.getRTNPath(), /c//f, 1) = false; descendant of b5 and regExprFilter(f7.getRTNPath(), /c//f, 3) = true: next() returns f7
  f8: descendant of b3 and regExprFilter(f8.getRTNPath(), /c//f, 1) = false; not a descendant of b5; not a descendant of b7
  f9, f10, f11: again not reachable from any of b3, b5, b7 via /c//f
  f12 is reachable from b7 via /c//f: next() returns f12
  f13 is reachable from b7 via /c//f: next() returns f13
• rootAnc = b9. The context sequence is exhausted (contextEl = null). window = XPAPI.Descs(b9,‘f’);
  f16 is not reachable from b9 via /c//f
  f17 is reachable from b9 via /c//f: next() returns f17

Page 17: Haris Georgiadis Minas Charalambides Vasilis Vassalos

Example: bpLU, path parent::c/ancestor::b

[Figure: example XML tree rooted at r, containing elements b1 to b9 and f1 to f17 together with c, d and e elements.]

reverseOf(parent::c/ancestor::b) = /c//f
V: verification with the reversed path, e.g. regExprFilter(f3.getRTNPath(), /c//f, 1) = true

Windows computed for the context elements:
window = XPAPI.Ancs(f2,‘b’);
window = XPAPI.Ancs(f3,‘b’);
window = XPAPI.Ancs(f5,‘b’);
window = XPAPI.Ancs(f6,‘b’);
window = XPAPI.Ancs(f8,‘b’);
window = XPAPI.Ancs(f11,‘b’);

Cheap implementation of Ancs() in the PE-Path driver:
• Dewey(f2) = 1.1.2.1.1
• RTN(f2) = /r/b/c/f ⇒ there is a ‘b’ ancestor b’ at level 2
• ⇒ Dewey(b’) = substr(Dewey(f2), …) = 1.1 and RTN(b’) = substr(RTN(f2), …) = /r/b
• Ancs() outputs n without actually retrieving b1 from the database. n is the virtual representation of b1, denoted as #b1

Trace (state columns: contextEl, sortedElements; animation steps in slide order):
• contextEl = f2; sortedElements: b1#
• contextEl = f3; f3 is a descendant of b1; V
• contextEl = f5; next() returns b1; sortedElements: b2#; V; f5 not a descendant of b1
• contextEl = f6; f6 not a descendant of b2; next() returns b2
• next(); sortedElements: b3#, b4#, b5#
• contextEl = f8; V; f8 is a descendant of b3
• contextEl = f11; f11 is a descendant of b3; sortedElements: b7#
• contextEl = null; b4
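The cheap Ancs() idea described above can be sketched in a few lines: with a prefix encoding and the root-to-node path at hand, ancestor labels are computed from the context element alone, without touching the database. The function name `virtual_ancs` and the convention of exactly one Dewey component per path step are assumptions of this sketch; the labels on the slide may follow a different convention.

```python
def virtual_ancs(dewey, rtn_path, tag):
    """Return (dewey, rtn_path) pairs for every proper ancestor with the
    given tag, derived purely from the context element's own labels."""
    labels = dewey.split(".")
    steps = rtn_path.strip("/").split("/")
    if len(labels) != len(steps):
        raise ValueError("expected one Dewey component per path step")
    ancestors = []
    # Prefixes of length 1 .. len-1 identify the proper ancestors.
    for length in range(1, len(steps)):
        if steps[length - 1] == tag:
            ancestors.append((".".join(labels[:length]),
                              "/" + "/".join(steps[:length])))
    return ancestors
```

For a context element labelled 1.1.2.1 with root-to-node path /r/b/c/f, the only ‘b’ ancestor is the virtual element with label 1.1 and path /r/b. The output is already in document order because shorter prefixes denote higher ancestors.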

Page 18: Haris Georgiadis Minas Charalambides Vasilis Vassalos

SM Operators

• Inspired by sort-merge join algorithms
• Traverse two sequences of elements, left and right
  left: the context sequence (the input sequence)
  right: always consists of all the elements of the requested tag name
• Keeping track of the current elements on left and right, try to find matching pairs according to the appropriate navigation axis and condition
• Novel techniques for holistic SM-based forward path and backward path operators with guaranteed low memory requirements
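As an illustration of matching pairs "according to the appropriate navigation axis", the sketch below merges left and right for the ancestor axis. It is a simplified single-step example under assumptions (pre/post ranks, both inputs in document order, the hypothetical name `a_sm`), not the holistic multi-step SM operators the slides refer to.

```python
from typing import NamedTuple


class Node(NamedTuple):
    name: str
    pre: int    # preorder rank
    post: int   # postorder rank


def a_sm(left, right):
    """Return the elements of `right` that have a descendant in `left`.
    Both inputs must be sorted by pre rank (document order)."""
    result, i = [], 0
    for cand in right:
        # Skip context elements that start at or before the candidate;
        # they cannot be descendants of it or of any later candidate.
        while i < len(left) and left[i].pre <= cand.pre:
            i += 1
        # The first context element after cand is inside cand's region
        # iff cand has any descendant in the context sequence.
        if i < len(left) and left[i].post < cand.post:
            result.append(cand)
    return result


# Hypothetical tree: r(b1(f1, c1(f2)), b2, f3)
b1, b2 = Node("b1", 1, 3), Node("b2", 5, 4)
f1, f2, f3 = Node("f1", 2, 0), Node("f2", 4, 1), Node("f3", 6, 5)
```

Each input is traversed once and only a single cursor is kept, so the output is duplicate-free and in document order without sorting.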

Page 19: Haris Georgiadis Minas Charalambides Vasilis Vassalos

Performance Comparison

Page 20: Haris Georgiadis Minas Charalambides Vasilis Vassalos

Performance Comparison

Page 21: Haris Georgiadis Minas Charalambides Vasilis Vassalos

Sensitivity to context selectivity

[Figure panels: descendant, ancestor, forward path]

Page 22: Haris Georgiadis Minas Charalambides Vasilis Vassalos

Conclusions I

• Novel techniques for evaluating forward and backward multi-step paths
• Pipelined duplicate elimination and document order preservation
• Lookup fp, Lookup bp, Lookup cs, SM fp, SM bp, SM cs
• Fast backwards navigation that fully exploits the capabilities of the underlying storage system
• Algorithms perform well across a variety of different physical storage models
• First steps towards building cost models for XPath

Page 23: Haris Georgiadis Minas Charalambides Vasilis Vassalos

Conclusions II

• Operator-based XPath processing provides significant optimization opportunities
• Different implementations of logical operators can provide benefits in different circumstances, e.g. context selectivity
• Query plans can be much more efficient than (existing) monolithic (twig) techniques in most circumstances

Page 24: Haris Georgiadis Minas Charalambides Vasilis Vassalos

Thank you!
