on the minimization of xpath queries

43
On the Minimization of XPath Queries Paper by S. Flesca, F. Furfaro, E. Masciari CmpE 521 Presentation Emre Yurtsever – 2002701372 Muammer Yüzügüldü – 2003700183

Upload: kisha

Post on 31-Jan-2016

31 views

Category:

Documents


0 download

DESCRIPTION

On the Minimization of XPath Queries. Paper by S. Flesca, F. Furfaro, E. Masciari CmpE 521 Presentation Emre Yurtsever – 2002701372 Muammer Yüzügüldü – 2003700183. Outline. Introduction Trees & Tree Patterns Problem Statement A Framework for minimizing X P ath queries - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: On the Minimization of XPath Queries

On the Minimization of XPath Queries

Paper byS. Flesca, F. Furfaro, E. Masciari

CmpE 521 Presentation

Emre Yurtsever – 2002701372Muammer Yüzügüldü – 2003700183

Page 2: On the Minimization of XPath Queries

2

Outline Introduction Trees & Tree Patterns Problem Statement A Framework for minimizing XPath

queries Complexity Results Tractability Results Conclusions & Future Works

Page 3: On the Minimization of XPath Queries

3

Introduction XML Queries are usually expressed

by means of XPath expressions.

XPath expressions.

A way of navigating an XML tree to return the set of nodes through the paths specified by the expression.

Page 4: On the Minimization of XPath Queries

4

Introduction – cont’d

An XPath expression can be represented graphically as a tree pattern.

<author>

<genre> <editor><title>

<authors>

<author>

<genre> <editor><title>

<bib>

<book><book>

An XML Tree...

Page 5: On the Minimization of XPath Queries

5

Introduction – cont’d

For example;“find the titles of all the books for which at least one author is known.”

XPath Expression:bib/book[//author]/title

A tree pattern

<bib>

<book>

<author> <title>

Output Node

Descendant Edge

Page 6: On the Minimization of XPath Queries

6

Introduction – cont’d

Efficiency of XPath expression depends on size.

Optimization ~ Minimization

We should minimize the expression.

Page 7: On the Minimization of XPath Queries

7

Introduction – cont’d

Example Query : “Retrieve the editors that published thrillers

and whose authors have written a thriller.”

“query containment”.

Reduced Query : “Retrieve the editors that published thriller.”

Page 8: On the Minimization of XPath Queries

8

Introduction – cont’d

Minimization problem for XPath fragments can be efficiently solved as:

1. It can be reduced to solve a number of instances of containment between pairs of tree patterns.

2. For these fragments, it can be reduced to find a homomorphism between them.

Page 9: On the Minimization of XPath Queries

9

Trees & Tree Patterns

A tree t is a tuple (rt, Nt, Et, t) where; Nt ℕ, set of nodes.

t : Nt is a node labelling function.

rt Nt is the distinguished root of t.

Et Nt x Nt, set of edges.

Page 10: On the Minimization of XPath Queries

10

Trees & Tree Patterns – cont’d

Given a tree t = (rt, Nt, Et, t)

Tree t’ = (rt’, Nt’, Et’, t’) is the subtree if : Nt’ Nt;

The edge (ni, nj) belongs to Et’ iff ni Nt’,

nj Nt’ and (ni, nj) Et.

Page 11: On the Minimization of XPath Queries

11

Trees & Tree Patterns – cont’d

Definition : A tree pattern p is a pair (tp, op), where:

• tp = (rp, Np, Ep, p) is a tree.

• Ep is partitioned into the two disjoint sets Cp and Dp denoting, respectively, the child and descendent branches;

• op Np is a distinguished output node.

Page 12: On the Minimization of XPath Queries

12

Trees & Tree Patterns – cont’d

Grammar for XPath expressions : exp exp | exp/exp | exp//exp | exp[exp] | | * | .

where is a symbol in , and the symbol ‘.’ stands for the current node.

Given XPath expression; a[b/*//c]//d

a

b d

c

*

Page 13: On the Minimization of XPath Queries

13

Trees & Tree Patterns – cont’d

Given a tree t and a tree pattern p, an embedding e of p into t is a total function e : Np Nt, such that:

1. e(rp) = rt,

2. (x; y) Cp, e(y) is a child of e(x) in t,

3. (x; y) Dp, e(y) is a descendant of e(x) in t, and

4. x Np, if p(x) = a (where a *) then t(e(x)) = a.

Page 14: On the Minimization of XPath Queries

14

Trees & Tree Patterns – cont’d

Models and Canonical Models of Tree Patterns The models of a tree pattern p defined over

the alphabet are the trees of T which can be embedded by p. The set of models of p is Mod(p) = {t T | p(t) }

Canonical models of a tree pattern p are models having the same shape as p. That is, a canonical

Page 15: On the Minimization of XPath Queries

15

Trees & Tree Patterns – cont’d

Model and Canonical Model of a tree pattern

Page 16: On the Minimization of XPath Queries

16

Trees & Tree Patterns – cont’d

Given two tree patterns p1, p2, we say that p1 is contained in p2 (p1 p2) iff t p1(t) p2(t).

We say that p1 and p2 are equivalent (p1 p2) iff p1 p2 and p2 p1 (i.e. t p1(t) = p2(t)).

The set of patterns which are equivalent to a given pattern p will be denoted as Eq(p).

Page 17: On the Minimization of XPath Queries

17

Trees & Tree Patterns – cont’d

Notations on tree patterns.

A pattern p and its subpatterns spb, spd, spa

Page 18: On the Minimization of XPath Queries

18

Trees & Tree Patterns – cont’d

Tree pattern p whose root has 2 children

Subpattern examples

Page 19: On the Minimization of XPath Queries

19

Problem Statement

“Given a tree pattern p, construct a tree pattern pmin which is equivalent to p and having minimum size (i.e. size(pmin) = minsize(p))”

Page 20: On the Minimization of XPath Queries

20

Problem Statement – cont’d

1. a minimum size tree pattern equivalent to p can be found among the subpatterns of p;

2. the containment between two tree patterns p, q (p q) is equivalent to the problem of finding a homomorphism from q to p. A homomorphism h from a pattern q to a pattern p is a total mapping from the nodes of q to the nodes of p such that:

h preserves node types (i.e. u Nq q(u) `*' ) q(u)

= p(h(u))); h preserves structural relationships (i.e.whenever v is

a child (resp. descendant) of u in q, h(v) is a child (resp. descendant) of h(u) in p).

Page 21: On the Minimization of XPath Queries

21

Problem Statement – cont’d

A homomorphism between two tree patterns

Page 22: On the Minimization of XPath Queries

22

Problem Statement – cont’d

Two tree patterns not related with homomorphism

Page 23: On the Minimization of XPath Queries

23

A framework for minimizing XPath Queries

Two fundamental contribution Proving that property 1 holds for

XP{/, //, [], *}

An algorithm for minimizing a tree pattern query

Page 24: On the Minimization of XPath Queries

24

Proving that Property holds for XP{/, //, [], *}

Various lemmas are introduced

Lemma 1 : Let p and q be two patterns with root r, such that p contained in q. Then, for each subpattern Qj element of P(q) there exists a subpattern Pi element of P(p) s.t Pi contained in Qi.

Page 25: On the Minimization of XPath Queries

25

Example for Lemma 1

Page 26: On the Minimization of XPath Queries

26

Proving that Property holds for XP{/, //, [], *}

Lemma 2 : Let p and be two patterns rooted in r s.t p=q and let m and n, with m>n, be the number of children of r in p and, respectively, q. Then, there exist a set S subset of SP(p) consisting of m-n subpatterns spi, such that p-S = p

Page 27: On the Minimization of XPath Queries

27

Example for Lemma 2

Page 28: On the Minimization of XPath Queries

28

Proving that Property holds for XP{/, //, [], *} Lemma 3 : Let p and q be two

equivalent patterns rooted in r having the same number of child and descendant nodes of r, and let q be of minimum size. Then, there not exists a subpattern spk element of SP(p) such that p – spk = p

Page 29: On the Minimization of XPath Queries

29

Proving that Property holds for XP{/, //, [], *} Lemma 4 : Let p and q be two

eqivalent patterns whose roots have the same number of child and descendant nodes, and let q be of minimum size. For each subpattern Pi element of P(p) there exists a unique subpattern Qj element of P(q) directly connected to rq s.t piqj

Page 30: On the Minimization of XPath Queries

30

Proving that Property holds for XP{/, //, [], *} Lemma 5 : A pattern p in XP {[], /,

//, *} is not of minimum size iff at least one of the following conditions hold:

1. there exists a pair of subpatterns pi, pj s.t pi contained in pj;

2. there exists a subpattern pi of p which is not of minimum size.

Page 31: On the Minimization of XPath Queries

31

Proving that Property holds for XP{/, //, [], *} Theorem 1 : Given a pattern p in

XP{/, //, [], *} if minsize(p) = k then there exists a subpattern pmin of p such that p= pmin and size(pmin)=k

Page 32: On the Minimization of XPath Queries

32

An Algorithm for tree pattern minimization Function Minimize(p:a tree pattern):pmin a minimum tree pattern

equivalent to pBegin

pmin = p;For each pi element of P(pmin) do

if (pmin -spi contained in pmin) pmin = pmin – spi;

SPnew = 0;For each spi element of SP(pmin) do

SPnew = SPnew + Minimize(spi);pmin = assemble (pmin, SPnew);return pmin;

End

Page 33: On the Minimization of XPath Queries

33

Upper Bound Algorithm 1 works in O(b*r*(|p|

^2)*((w+1)^(d+1))) |p| is the size of p d is the number descendant edges in p w is one the longest chain of ‘*’ in p b is the number branches of p as b r is the maximum degree of any node of p

Page 34: On the Minimization of XPath Queries

34

Complexity Results

In XPath{/, //, [], *} it is not possible to define an algorithm performing much better than Algorithm 1

Lemmma 6: Let p be a pattern in XP{/, //, [], *} and k is possitive integer. The problem of testing if minsize(p)>k is NP-complete problem

Page 35: On the Minimization of XPath Queries

35

Complexity Results

Theorem 2 : Let p be a pattern in XP{/, //, [], *} and k a positive integer. The problem of testing if there exists a pattern p’ equivalent to p such that size(p’)<= k is coN-complete

Page 36: On the Minimization of XPath Queries

36

Tractability Results Definition : A limited branched tree

pattern p is a tree pattern in XP{/, //, [], *} such that:

1. Every non leaf node of p may have any number of children;

2. If a node n has k children n1...nk, then at least k-1 of the patterns spn, (where i element [1...k]) are linear.

Page 37: On the Minimization of XPath Queries

37

Example

b

a

b d

b

a

c

*

*

r

b

Page 38: On the Minimization of XPath Queries

38

Tractability Results Theorem : Let p a limited branched tree

pattern. A minimum pattern pmin equivalent to p can be found in polynomial time. (w.r.t. The size of p)

Linear patterns have minimum size The containment between pairs of linear

patterns can be decided in polynomial time.

Page 39: On the Minimization of XPath Queries

39

Tractability Results Algorithm 2Function Minimize(p:a boundend branched tree pattern):pmin a minimum tree pattern equivalent to pBegin

pmin = p;B = {b1, ...., bm};while(B != 0)

b = deepest(B);q = spb;Redq = 0;For each pi element of P(q) do

For each qj of P(q) doif ((i!=j) ^ (qi is linear) ^ (qj not element of Redq) ^(qj

contained in qi))Redq = Redq + qi;

q = q – Redq;pmin = replace(pmin, sqb, q);

end whilereturn pmin;

End

Page 40: On the Minimization of XPath Queries

40

Conclusion It has been proved the global minimality

property, a minimum tree pattern equivalent to a given tree pattern p can be found amoung the subpatterns of p, and thus obtained by prunning “redundant” branches from p.

It has been characerized the complexity of the minimization problem, showing that the corresponding decisional problem is coNP-complete.

Page 41: On the Minimization of XPath Queries

41

Conclusion It has been studied a “tractable”

form of tree pattern which can be minimized in polynomial time.

It has been provided by some algorithms proposed in the paper.

Page 42: On the Minimization of XPath Queries

42

Future Works Extending minimization framework

to deal with XPath queries that must satisfy some constraints such as join conditions on tree pattern nodes.

The introduction of these constraints makes the minimization problem harder, and global minimality property does not hold.

Page 43: On the Minimization of XPath Queries

43

Questions?

Thank you...