1 rainbow xml-query processing revisited: the incomplete story (part ii) xin zhang


Post on 22-Dec-2015


Rainbow XML-Query Processing Revisited: The Incomplete Story (Part II)

Xin Zhang


Outline

XAT Decorrelation

Optimization: XAT Computation Pushdown, XAT Data Model Cleanup, XAT Cutting

Conclusion & Future Works


XAT Decorrelation

XQuery queries are correlated queries.

Decorrelation is required for optimization: XAT Computation Pushdown, XAT Data Model Cleanup, and XAT Cutting.


Three Kinds of Decorrelation

Simple Decorrelation: no additional sources and no aggregate functions.

Complex Decorrelation with Additional Sources.

Complex Decorrelation with Aggregate Functions.


<!ELEMENT prices (book*)>
<!ELEMENT book (title, price)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT source (#PCDATA)>
<!ELEMENT price (#PCDATA)>

<prices>
  <book>
    <title> TCP/IP Illustrated </title>
    <price>65.95</price>
  </book>
  <book>
    <title> TCP/IP Illustrated </title>
    <price>65.95</price>
  </book>
  <book>
    <title>Data on the Web</title>
    <price>34.95</price>
  </book>
  <book>
    <title>Data on the Web</title>
    <price>39.95</price>
  </book>
</prices>

Example* from the XML Query Use Cases.


Simple Query Example

[Figure: XAT with operators T(<results>[col1]</results>):col0; distinct(col2):$t; S("prices.xml"):R1; U(R1, /book/title):col2; FOR($t); Agg(); T(<minprice>[$t]</minprice>):col1]

<results> {
  for $t in distinct(document("prices.xml")/book/title)
  return
    <minprice> $t </minprice>
} </results>

In the document "prices.xml", find each distinct book title and wrap it in a "minprice" element.


Simple Decorrelation

Linearize the tree: T[FOR(CB, T2[])[T1[S1]]] → T[T2[T1[S1]]]

[Figure, before: T(<results>[col1]</results>):col0; distinct(col2):$t; S("prices.xml"):R1; U(R1, /book/title):col2; FOR($t); Agg(); T(<minprice>[$t]</minprice>):col1]

[Figure, after: T(<results>[col1]</results>):col0; distinct(col2):$t; S("prices.xml"):R1; U(R1, /book/title):col2; Agg(); T(<minprice>[$t]</minprice>):col1]


Is Simple Decorrelation Right?

Every operator except Groupby has "for each" semantics: it processes each tuple of its input table independently.

Hence, the FOR operator can be omitted in the simple decorrelation scenario.
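The argument can be checked on a toy table. The Python sketch below is a hypothetical model, not Rainbow's code: a table is a list of dicts, `for_op` runs an inner plan once per outer tuple, and `tagger` stands in for any per-tuple operator. Dropping FOR leaves the result unchanged for the per-tuple operator, while a whole-table operator such as a count (the Groupby case) gives a different answer.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Table = List[Row]

def tagger(row: Row) -> Row:
    """Hypothetical per-tuple Tagger: builds <minprice> from the $t column into col1."""
    return {**row, "col1": f"<minprice>{row['$t']}</minprice>"}

def apply_per_tuple(op: Callable[[Row], Row], table: Table) -> Table:
    """'For each' semantics: the operator sees one input tuple at a time."""
    return [op(row) for row in table]

def for_op(inner: Callable[[Table], Table], table: Table) -> Table:
    """FOR: evaluate the inner plan once per outer tuple and concatenate the results."""
    out: Table = []
    for row in table:
        out.extend(inner([row]))
    return out

def count_all(table: Table) -> Table:
    """Groupby-like operator: it looks at the whole input table at once."""
    return [{"count": len(table)}]

titles: Table = [{"$t": "TCP/IP Illustrated"}, {"$t": "Data on the Web"}]

# Per-tuple operator: FOR adds nothing, so it can be dropped.
correlated = for_op(lambda t: apply_per_tuple(tagger, t), titles)
decorrelated = apply_per_tuple(tagger, titles)
print(correlated == decorrelated)  # True

# Whole-table operator: FOR changes the result (two counts of 1 vs one count of 2).
print(for_op(count_all, titles) == count_all(titles))  # False
```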


Two Types of Navigate

Navigate Unnesting: U

Unnests the parent-children relationship and duplicates the parent's values for each child.

Navigate Collection: C

Nests the parent-children relationship: creates a collection of the children but keeps a single parent.


Where to Use the Two Types

Navigate Unnesting (U): FOR binding.

Navigate Collection (C): LET binding.
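A minimal sketch of the two navigates, assuming tuples are Python dicts and XML values are `xml.etree.ElementTree` elements. `navigate_unnest` and `navigate_collect` are illustrative names, not Rainbow's API, and the data is the slide's prices.xml with the spaces inside the titles trimmed. A FOR binding yields one tuple per book; a LET binding yields one tuple holding all books.

```python
import xml.etree.ElementTree as ET

# The slide's prices.xml, with the spaces inside the <title> elements trimmed.
PRICES = """<prices>
  <book><title>TCP/IP Illustrated</title><price>65.95</price></book>
  <book><title>TCP/IP Illustrated</title><price>65.95</price></book>
  <book><title>Data on the Web</title><price>34.95</price></book>
  <book><title>Data on the Web</title><price>39.95</price></book>
</prices>"""

def navigate_unnest(rows, col, path, out):
    """U: one output tuple per child; the parent's values are copied into each."""
    return [{**row, out: child} for row in rows for child in row[col].findall(path)]

def navigate_collect(rows, col, path, out):
    """C: one output tuple per parent, holding the collection of its children."""
    return [{**row, out: row[col].findall(path)} for row in rows]

# ElementTree paths are relative to the <prices> element returned by fromstring.
source = [{"R1": ET.fromstring(PRICES)}]                 # S("prices.xml"):R1
for_rows = navigate_unnest(source, "R1", "book", "$b")   # for $b in .../book
let_rows = navigate_collect(source, "R1", "book", "$b")  # let $b := .../book
print(len(for_rows), len(let_rows), len(let_rows[0]["$b"]))  # 4 1 4
```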


Complex Query Example

[Figure: XAT with operators T(<results>[col1]</results>):col0; distinct(col2):$t; S("prices.xml"):R1; U(R1, /book/title):col2; FOR($t); Agg(); T(<minprice>[$t], [col4]</minprice>):col1; S("prices.xml"):R2; C(R2, /book):$b; σ(col3=$t); C($b, title):col3; C($b, price):col4]

<results> {
  for $t in distinct(document("prices.xml")/book/title)
  let $b := document("prices.xml")/book[title = $t]
  return
    <minprice> $t, $b/price </minprice>
} </results>

In the document "prices.xml", find each distinct book title and its prices.


Complex Decorrelation with Additional Source

Rule: T[FOR(CB, T2[S2])[T1[S1]]] → T[T2[⋈[T1[S1], S2]]]

[Figure, before: C($b, price):col4; T(<results>[col1]</results>):col0; distinct(col2):$t; S("prices.xml"):R1; U(R1, /book/title):col2; FOR($t); Agg(); T(<minprice>[$t], [col4]</minprice>):col1; S("prices.xml"):R2; C(R2, /book):$b; σ(col3=$t); C($b, title):col3]

[Figure, after: C($b, price):col4; T(<minprice>[$t], [col4]</minprice>):col1; S("prices.xml"):R2; C(R2, /book):$b; σ(col3=$t); C($b, title):col3; T(<results>[col1]</results>):col0; distinct(col2):$t; S("prices.xml"):R1; U(R1, /book/title):col2; Agg()]
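One way to see this rewrite is to compare the two evaluation strategies on the prices data. This Python sketch is an illustration under simplifying assumptions: tables are lists of tuples, the join is a plain Cartesian product, and all function names are hypothetical. The correlated plan re-reads S2 for every binding of $t, while the decorrelated plan joins T1[S1] with S2 once and lets the selection on $t run tuple by tuple.

```python
# (title, price) tuples from the slide's prices.xml, whitespace trimmed.
BOOKS = [
    ("TCP/IP Illustrated", 65.95),
    ("TCP/IP Illustrated", 65.95),
    ("Data on the Web", 34.95),
    ("Data on the Web", 39.95),
]

def distinct_titles(books):
    """T1[S1]: distinct titles, keeping first-seen order."""
    return list(dict.fromkeys(title for title, _ in books))

def inner_plan(t, books):
    """T2[S2] with $t bound: keep the books whose title equals $t."""
    return [(t, price) for title, price in books if title == t]

def correlated(books):
    """FOR($t): the inner plan, including the scan of S2, runs once per $t."""
    out = []
    for t in distinct_titles(books):
        out.extend(inner_plan(t, books))
    return out

def join(left, right):
    """Join with no predicate (Cartesian product) of two tables of tuples."""
    return [left_row + right_row for left_row in left for right_row in right]

def decorrelated(books):
    """T2[join[T1[S1], S2]]: join once; the selection title = $t runs per joined tuple."""
    joined = join([(t,) for t in distinct_titles(books)], books)  # rows: (t, title, price)
    return [(t, price) for t, title, price in joined if title == t]

print(correlated(BOOKS) == decorrelated(BOOKS))  # True
```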


Full Query Example

[Figure: XAT with operators T(<results>[col1]</results>):col0; distinct(col2):$t; S("prices.xml"):R1; U(R1, /book/title):col2; FOR($t); Agg(); T(<minprice>[$t], <price>[col5]</price></minprice>):col1; S("prices.xml"):R2; C(R2, /book):$b; σ(col3=$t); C($b, title):col3; C($b, price/text()):col4; min(col4):col5]

<results> {
  for $t in distinct(document("prices.xml")/book/title)
  let $b := document("prices.xml")/book[title = $t]
  return
    <minprice> $t, <price>min($b/price/text())</price> </minprice>
} </results>

In the document "prices.xml", find the minimum price for each book, in the form of a "minprice" element.


Complex Query Decorrelation with One Aggregate Function

T[FOR(CB, T2[Agg(T3[])])[T1[S1]]] → T[⋈(DM(T1))[T1, T2[Groupby(DM(T1), Agg(T3[⋈[Distinct(T1[S1]), S2]]))]]]

DM(T1) is the data model (the set of columns) computed from T1.

[Figure, before: T; T2; FOR($rate); Agg(); T3; S2; T1; S1]

[Figure, after: T; T2; Groupby(DM(T1), Agg()); T3; S2; Distinct; T1; S1]
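The same comparison with an aggregate shows why the Groupby on DM(T1) is needed. In this hypothetical Python sketch, DM(T1) is just the column $t, the aggregate is min over prices as in the Full Query Example, and the final dictionary lookup stands in for the join back with T1 (an inner join, so a title with no matching books would be dropped).

```python
from collections import defaultdict

BOOKS = [  # (title, price) from the slide's prices.xml, whitespace trimmed
    ("TCP/IP Illustrated", 65.95),
    ("TCP/IP Illustrated", 65.95),
    ("Data on the Web", 34.95),
    ("Data on the Web", 39.95),
]

def correlated_min(books):
    """FOR($t) with Agg inside: min over the books whose title = $t, once per $t."""
    titles = list(dict.fromkeys(title for title, _ in books))  # T1[S1]
    return {t: min(p for title, p in books if title == t) for t in titles}

def decorrelated_min(books):
    """Join Distinct(T1[S1]) with S2 once, then Groupby DM(T1) = {$t} with min."""
    titles = list(dict.fromkeys(title for title, _ in books))  # T1[S1]
    groups = defaultdict(list)
    for t in dict.fromkeys(titles):          # Distinct(T1[S1])
        for title, price in books:           # joined with S2
            if title == t:                   # selection title = $t inside T3
                groups[t].append(price)
    mins = {t: min(prices) for t, prices in groups.items()}  # Groupby(DM(T1), min)
    # Join back with T1 on DM(T1); an inner join drops titles with no group.
    return {t: mins[t] for t in titles if t in mins}

print(correlated_min(BOOKS))  # {'TCP/IP Illustrated': 65.95, 'Data on the Web': 34.95}
```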


The Query after Decorrelation

[Figure, before: T(<results>[col1]</results>):col0; distinct(col2):$t; S("prices.xml"):R1; U(R1, /book/title):col2; FOR($t); Agg(); T(<minprice>[$t], <price>[col5]</price></minprice>):col1; S("prices.xml"):R2; C(R2, /book):$b; σ(col3=$t); C($b, title):col3; C($b, price/text()):col4; min(col4):col5]

[Figure, after: C($b, price/text()):col4; T(<minprice>[$t], [col4]</minprice>):col1; S("prices.xml"):R2; C(R2, /book):$b; σ(col3=$t); C($b, title):col3; T(<results>[col1]</results>):col0; distinct(col2):$t; S("prices.xml"):R1; U(R1, /book/title):col2; Agg(); GB(DM, min(col4):col5)]


Where are we?

XAT Decorrelation

Optimization: XAT Computation Pushdown, XAT Data Model Cleanup, XAT Cutting

Conclusion & Future Works


XAT Computation Pushdown

Goal: push the execution into the relational database.

Steps: push Navigation down; cancel out Navigation and Tagger; generate SQL statements.


Navigation Pushdown

Basically, a Navigation can be pushed through all the operators until it depends on its child operator.

Example rewriting rules:

(x1, path):x2[(y1, path):y2[T]] → (y1, path):y2[(x1, path):x2[T]]   (if x1 != y2)

(x1, path):x2[σ(c)[T]] → σ(c)[(x1, path):x2[T]]

(x1, path):x2[⋈[T1, T2]] → ⋈[T1, (x1, path):x2[T2]]   (if x1 in DM(T2))

(x1, path):x2[⋈[T1, T2]] → ⋈[(x1, path):x2[T1], T2]   (if x1 in DM(T1))
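The two join rules can be sketched as a small tree rewrite. The classes and `push_nav` below are hypothetical Python stand-ins for XAT operators (one `Nav` class covers both U and C), the data model of a subtree is approximated as the set of columns its operators output, and only the join rules are implemented; the navigate-over-navigate and navigate-over-selection rules would be added the same way.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union

@dataclass(frozen=True)
class Source:
    out: str  # S(...):out

@dataclass(frozen=True)
class Nav:
    src: str       # x1, the column being navigated
    path: str
    out: str       # x2, the column produced
    child: "Node"

@dataclass(frozen=True)
class Join:
    left: "Node"
    right: "Node"

Node = Union[Source, Nav, Join]

def dm(node: Node) -> FrozenSet[str]:
    """Approximate DM(T): every column output by some operator in the subtree."""
    if isinstance(node, Source):
        return frozenset({node.out})
    if isinstance(node, Nav):
        return dm(node.child) | {node.out}
    return dm(node.left) | dm(node.right)

def push_nav(node: Node) -> Node:
    """Push a navigation below a join, into the input whose DM contains x1."""
    if isinstance(node, Nav) and isinstance(node.child, Join):
        j = node.child
        if node.src in dm(j.right):  # x1 in DM(T2)
            return Join(j.left, push_nav(Nav(node.src, node.path, node.out, j.right)))
        if node.src in dm(j.left):   # x1 in DM(T1)
            return Join(push_nav(Nav(node.src, node.path, node.out, j.left)), j.right)
    return node  # x1 depends on both sides (or neither): stop here

# Simplified shape of the example: C($b, price/text()):col4 above the join.
plan = Nav("$b", "price/text()", "col4",
           Join(Source("R1"), Nav("R2", "/book", "$b", Source("R2"))))
pushed = push_nav(plan)
print(pushed)
```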


Navigation Pushdown Example

[Figure, before: C($b, price/text()):col4; T(<minprice>[$t], [col4]</minprice>):col1; S("prices.xml"):R2; C(R2, /book):$b; σ(col3=$t); C($b, title):col3; T(<results>[col1]</results>):col0; distinct(col2):$t; S("prices.xml"):R1; U(R1, /book/title):col2; Agg(); GB(DM, min(col4):col5)]

[Figure, after: the same operators; only the position of C($b, price/text()):col4 in the tree changes]


Navigation/Tagger Cancel Out

Used to simplify a composite XAT tree.

Transformation rule: (x, /):y[T(<tag>[z]</tag>):x[s]] → s

Note: type analysis is also used for the cancel out.


View Query Example

<DB>
  <book>
    <row>
      <title> TCP/IP Illustrated </title>
      <price>65.95</price>
    </row>
    <row>
      <title> TCP/IP Illustrated </title>
      <price>65.95</price>
    </row>
    <row>
      <title>Data on the Web</title>
      <price>34.95</price>
    </row>
    <row>
      <title>Data on the Web</title>
      <price>39.95</price>
    </row>
  </book>
</DB>

<prices> {
  for $row in distinct(DXV/book/row)
  return
    <book> $row/title, $row/price </book>
} </prices>

[Figure: XAT of the view query with operators T(<prices>[col6]</prices>):col5; T(<book>[col7],[col8]</book>):col6; S(DXV):R3; U(R3, /book/row):$row; Agg(); U($row, title):col7; U($row, price):col8]


Cancel Out Example (1)

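[Figure, before: C($b, price/text()):col4; S("prices.xml"):R2; C(R2, /book):$b; C($b, title):col3; ...; view XAT: T(<prices>[col6]</prices>):col5; T(<book>[col7],[col8]</book>):col6; S(DXV):R3; U(R3, /book/row):$row; Agg(); U($row, title):col7; U($row, price):col8]

[Figure, after: C($b, price/text()):col4; C(R2, /book):$b; C($b, title):col3; ...; T(<prices>[col6]</prices>):R2; T(<book>[col7],[col8]</book>):col6; S(DXV):R3; U(R3, /book/row):$row; Agg(); U($row, title):col7; U($row, price):col8]

Rule: (x, y)[op():x[s]] → op():y[s], i.e., renaming column x to y on top of the operator that produces x is absorbed by that operator. Here the source S("prices.xml"):R2 is replaced by the view XAT, and its top output col5 becomes R2.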


Cancel Out Example (2)

[Figure, before: C($b, price/text()):col4; C(R2, /book):$b; C($b, title):col3; ...; T(<prices>[col6]</prices>):R2; T(<book>[col7],[col8]</book>):col6; S(DXV):R3; U(R3, /book/row):$row; Agg(); U($row, title):col7; U($row, price):col8]

[Figure, after: C($b, price/text()):col4; C($b, title):col3; ...; T(<book>[col7],[col8]</book>):$b; S(DXV):R3; U(R3, /book/row):$row; U($row, title):col7; U($row, price):col8]


Cancel Out Example (3)

[Figure, before: C($b, price/text()):col4; C($b, title):col3; ...; T(<book>[col7],[col8]</book>):$b; S(DXV):R3; U(R3, /book/row):$row; U($row, title):col7; U($row, price):col8]

[Figure, after: C($b, price/text()):col4; ...; T(<book>[col7],[col8]</book>):$b; S(DXV):R3; U(R3, /book/row):$row; U($row, title):col3; U($row, price):col8]


Cancel Out Example (4)

[Figure, before: C($b, price):temp1; ...; T(<book>[col7],[col8]</book>):$b; S(DXV):R3; U(R3, /book/row):$row; U($row, title):col3; U($row, price):col8; C(temp1, text()):col4]

[Figure, after: ...; S(DXV):R3; U(R3, /book/row):$row; U($row, title):col3; U($row, price):temp1; C(temp1, text()):col4]
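The cancel-out steps in Examples (1) to (4) can be sketched as path resolution over the view's Taggers. The encoding is an assumption for illustration: `CONTENT` maps a Tagger's output column to the columns it wraps, `TAG_OF` gives the element name each column produces, and a navigation step cancels only when exactly one wrapped column has that name (the slides note that type analysis is also used). Whatever cannot be cancelled, such as text(), stays as a residual navigation.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of the view XAT, after col5 has been renamed to R2.
CONTENT: Dict[str, List[str]] = {
    "R2": ["col6"],            # T(<prices>[col6]</prices>):R2
    "col6": ["col7", "col8"],  # T(<book>[col7],[col8]</book>):col6
}
TAG_OF: Dict[str, str] = {
    "col6": "book",   # produced by the <book> Tagger
    "col7": "title",  # U($row, title):col7
    "col8": "price",  # U($row, price):col8
}

def cancel_out(col: str, path: str) -> Tuple[str, List[str]]:
    """Resolve a navigation over Tagger output to the column feeding it.

    Returns the resolved column and the path steps that could not be cancelled.
    """
    steps = [s for s in path.split("/") if s]
    for i, step in enumerate(steps):
        matches = [c for c in CONTENT.get(col, []) if TAG_OF.get(c) == step]
        if len(matches) != 1:
            return col, steps[i:]  # stop: keep the rest as a residual navigation
        col = matches[0]
    return col, []

print(cancel_out("R2", "/book"))             # ('col6', [])
print(cancel_out("col6", "title"))           # ('col7', [])
print(cancel_out("col6", "price/text()"))    # ('col8', ['text()'])
```

The three calls mirror the examples: C(R2, /book):$b cancels to the <book> Tagger, C($b, title):col3 cancels to the title navigation, and C($b, price/text()):col4 cancels only down to the price column, leaving C(temp1, text()):col4.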


SQL Generation

Find a pattern in the XAT, then translate that pattern into an SQL operator that accesses the relational database.


SQL Generation Example

[Figure, before: ...; S(DXV):R3; U(R3, /book/row):$row; U($row, title):col3; U($row, price):temp1; C(temp1, text()):col4]

[Figure, after: ...; SQL(select title as col3, price as temp1 from book):{col3, temp1}; C(temp1, text()):col4]
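A minimal sketch of the pattern match and translation, assuming the operators are listed bottom-up as tuples `(kind, input, argument, output)`; this encoding and `to_sql` are hypothetical. It recognizes only the shape in the example, S(DXV) followed by a row navigation and column navigations on $row, and leaves C(temp1, text()):col4 to run above the SQL operator.

```python
from typing import List, Optional, Tuple

# Bottom-up operators from the example, in a hypothetical (kind, input, argument, output) form.
PLAN: List[tuple] = [
    ("S", "DXV", "R3"),
    ("U", "R3", "/book/row", "$row"),
    ("U", "$row", "title", "col3"),
    ("U", "$row", "price", "temp1"),
]

def to_sql(ops: List[tuple]) -> Optional[Tuple[str, List[str]]]:
    """Match S(DXV):R, U(R, /table/row):$row, U($row, column):out, ... and emit one SELECT.

    Returns (sql, output columns), or None when the operators do not fit the pattern.
    """
    if len(ops) < 3 or ops[0][0] != "S" or ops[1][0] != "U" or ops[1][1] != ops[0][-1]:
        return None
    row_var = ops[1][-1]
    table = ops[1][2].strip("/").split("/")[0]  # "/book/row" -> "book"
    items, outputs = [], []
    for kind, src, column, out in ops[2:]:
        if kind != "U" or src != row_var:
            return None  # only row-level column navigations are covered
        items.append(f"{column} as {out}")
        outputs.append(out)
    return f"select {', '.join(items)} from {table}", outputs

print(to_sql(PLAN))
# ('select title as col3, price as temp1 from book', ['col3', 'temp1'])
```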


Where are we?

XAT Decorrelation

Optimization: XAT Computation Pushdown, XAT Data Model Cleanup, XAT Cutting

Conclusion & Future Works


XAT Data Model Cleanup

By default, each operator appends one additional column to the data model. Cleanup is used to help:

Execution: optimize the data storage during execution.

Cutting: get rid of the unused operators in the XQuery.

Equation for data model cleanup, which keeps only the columns required by ancestors:

DM := (DM_p − P_p) ∪ C_p ∪ (P − C)

where P and C are the columns an operator produces and consumes, and the subscript p denotes its parent.


Data Model Example

for $b in document("prices.xml")/book
let $prices := $b/price
return $b

Node | Operator | Produce (P) | Consume (C) | DM before cleanup | DM after cleanup
1 | Agg() | {} | {} | {$prices, R1, $b, col1} | {}
2 | U($b,):col1 | {col1} | {$b} | {$prices, R1, $b, col1} | {col1}
3 | C($b, price):$prices | {$prices} | {$b} | {$prices, R1, $b} | {$b, $prices}
4 | U(R1, /book):$b | {$b} | {R1} | {R1, $b} | {$b}
5 | S("prices.xml"):R1 | {R1} | {} | {R1} | {R1}

DM := (DM_p − P_p) ∪ C_p ∪ (P − C)
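The "DM after cleanup" column can be reproduced mechanically. The sketch below assumes the plan is a chain listed from the root down, as in this example, and applies the equation with empty parent sets for the root; the operator labels are kept only for readability and the function name `cleanup` is hypothetical.

```python
from typing import List, Set, Tuple

# (operator, P = produced columns, C = consumed columns), listed from the root down.
PLAN: List[Tuple[str, Set[str], Set[str]]] = [
    ("Agg()",                set(),       set()),
    ("U($b,):col1",          {"col1"},    {"$b"}),
    ("C($b, price):$prices", {"$prices"}, {"$b"}),
    ("U(R1, /book):$b",      {"$b"},      {"R1"}),
    ('S("prices.xml"):R1',   {"R1"},      set()),
]

def cleanup(plan: List[Tuple[str, Set[str], Set[str]]]) -> List[Set[str]]:
    """DM := (DM_p - P_p) | C_p | (P - C), evaluated top-down along the chain."""
    result: List[Set[str]] = []
    dm_p: Set[str] = set()  # the root has no parent, so all parent terms start empty
    p_p: Set[str] = set()
    c_p: Set[str] = set()
    for _op, produce, consume in plan:
        dm = (dm_p - p_p) | c_p | (produce - consume)
        result.append(dm)
        dm_p, p_p, c_p = dm, produce, consume
    return result

for (op, _, _), dm in zip(PLAN, cleanup(PLAN)):
    print(op, sorted(dm))
```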


Where are we?

XAT Decorrelation

Optimization: XAT Computation Pushdown, XAT Data Model Cleanup, XAT Cutting

Conclusion & Future Works


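XAT Cutting

General idea: get rid of the operators that produce useless data.

Equations:

R := (R_p − P) ∪ C

An operator can be cut when (P ∪ M) ∩ (R_p ∪ M_p) = ∅,

where R is the set of columns an operator requires from its input, P the columns it produces, C the columns it consumes, M the columns it modifies, and the subscript p denotes its parent.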


XAT Cutting Example

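R := (R_p − P) ∪ C; cut when (P ∪ M) ∩ (R_p ∪ M_p) = ∅

for $b in document("prices.xml")/book
let $prices := $b/price
return $b

Node | Operator | Produce (P) | Consume (C) | Modified (M) | Required (R) | Cut? (P ∪ M) ∩ (R_p ∪ M_p)
1 | Agg() | {} | {} | {*} | {} | N/A
2 | U($b,):col1 | {col1} | {$b} | {} | {$b} | {col1}
3 | C($b, price):$prices | {$prices} | {$b} | {} | {$b} | {}
4 | U(R1, /book):$b | {$b} | {R1} | {} | {R1} | {$b}
5 | S("prices.xml"):R1 | {R1} | {} | {} | {} | {R1}

An empty intersection means the operator can be cut: only node 3, the unused LET binding $prices, is cut.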
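A sketch of the cutting test on the same chain, again listed root first. Two assumptions are made explicit in the code: `*` (Agg() modifies every column) is treated as a wildcard in the intersection, and when an operator is cut, its child is checked against the cut operator's parent. The names `meet` and `cut` are hypothetical.

```python
from typing import List, Optional, Set, Tuple

# (operator, P, C, M), listed from the root down; "*" means "modifies every column".
PLAN: List[Tuple[str, Set[str], Set[str], Set[str]]] = [
    ("Agg()",                set(),       set(),  {"*"}),
    ("U($b,):col1",          {"col1"},    {"$b"}, set()),
    ("C($b, price):$prices", {"$prices"}, {"$b"}, set()),
    ("U(R1, /book):$b",      {"$b"},      {"R1"}, set()),
    ('S("prices.xml"):R1',   {"R1"},      set(),  set()),
]

def meet(a: Set[str], b: Set[str]) -> Set[str]:
    """Set intersection in which '*' overlaps every column."""
    if "*" in a and "*" in b:
        return {"*"}
    if "*" in b:
        return set(a)
    if "*" in a:
        return set(b)
    return a & b

def cut(plan):
    """Compute R := (R_p - P) | C and the test (P | M) & (R_p | M_p) per operator."""
    rows = []
    r_p: Set[str] = set()  # the root has no parent
    m_p: Set[str] = set()
    for i, (op, p, c, m) in enumerate(plan):
        r = (r_p - p) | c
        test: Optional[Set[str]] = None if i == 0 else meet(p | m, r_p | m_p)
        is_cut = test is not None and not test
        rows.append((op, r, test, is_cut))
        if not is_cut:  # a cut operator's child is checked against the cut one's parent
            r_p, m_p = r, m
    return rows

for op, r, test, is_cut in cut(PLAN):
    print(op, sorted(r), test, "cut" if is_cut else "")
```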


Conclusions

XQuery queries are heavily correlated, and hence need to be decorrelated for better optimization.

After decorrelation, more optimization techniques can be applied: Computation Pushdown, Data Model Cleanup, and Cutting.


Future Works

Write a TR (technical report) to formalize the XAT.

Compare with ORDB and ODB operators, and also with XQA operators.

Wrap up:

Finalize the uncertain operators that deal with collections: Union, Navigate.

Formalize the pushdown rewriting rules by type (regular expression type) analysis.

Finalize the XAT rewriting rules for order handling and update propagation.

Translation from XAT back to a query.

Next step:

Generate the search space and optimization algorithm for XAT, ready for schema generation.