1 rainbow xml-query processing revisited: the incomplete story (part ii) xin zhang
Post on 22-Dec-2015
1
Rainbow XML-Query Processing Revisited: The Incomplete Story (Part II)
Xin Zhang
2
Outline
XAT Decorrelation
Optimization
  XAT Computation Pushdown
  XAT Data Model Cleanup
  XAT Cutting
Conclusion & Future Work
3
XAT Decorrelation
XQuery queries are correlated queries. Decorrelation is required for the optimizations that follow: XAT computation pushdown, XAT data model cleanup, and XAT cutting.
4
Three Kinds of Decorrelation
Simple decorrelation: no additional sources and no aggregate functions
Complex decorrelation with additional sources
Complex decorrelation with aggregate functions
5
<!ELEMENT prices (book*)>
<!ELEMENT book (title, price)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT source (#PCDATA)>
<!ELEMENT price (#PCDATA)>

<prices>
  <book>
    <title> TCP/IP Illustrated </title>
    <price>65.95</price>
  </book>
  <book>
    <title> TCP/IP Illustrated </title>
    <price>65.95</price>
  </book>
  <book>
    <title>Data on the Web</title>
    <price>34.95</price>
  </book>
  <book>
    <title>Data on the Web</title>
    <price>39.95</price>
  </book>
</prices>
Example adapted from the W3C XML Query Use Cases.
6
Simple Query Example
T(<results>[col1]</results>):col0
distinct(col2):$t
S(“prices.xml”):R1
(R1, /book/title):col2
FOR($t)
Agg()
T (<minprice>[$t]</minprice>):col1
<results> {
  for $t in distinct(document("prices.xml")/book/title)
  return
    <minprice>{ $t }</minprice>
} </results>
In the document "prices.xml", find each distinct book title and wrap it in a "minprice" element.
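As a rough illustration of what this query returns, here is a minimal Python sketch that uses the standard `xml.etree.ElementTree` module instead of an XQuery engine. It assumes `distinct` compares title text by value and wraps the text (not the `<title>` element itself) in `<minprice>`; the helper name `simple_query` is hypothetical. Comments map each step to the XAT operators above.

```python
import xml.etree.ElementTree as ET

# The prices.xml example data, inlined so the sketch is self-contained.
PRICES_XML = """<prices>
  <book><title> TCP/IP Illustrated </title><price>65.95</price></book>
  <book><title> TCP/IP Illustrated </title><price>65.95</price></book>
  <book><title>Data on the Web</title><price>34.95</price></book>
  <book><title>Data on the Web</title><price>39.95</price></book>
</prices>"""


def simple_query(xml_text: str) -> str:
    """Hypothetical helper: evaluates the simple query over an XML string."""
    root = ET.fromstring(xml_text)  # S("prices.xml"):R1
    # (R1, /book/title):col2 : one entry per <title> element
    titles = [t.text for t in root.findall("book/title")]
    # distinct(col2):$t : value-based, keeping first-occurrence order
    distinct_titles = list(dict.fromkeys(titles))
    results = ET.Element("results")  # T(<results>[col1]</results>):col0
    for t in distinct_titles:  # FOR($t)
        # T(<minprice>[$t]</minprice>):col1
        ET.SubElement(results, "minprice").text = t
    return ET.tostring(results, encoding="unicode")


print(simple_query(PRICES_XML))
```

The two duplicate "TCP/IP Illustrated" books collapse into one `<minprice>` element because `distinct` runs before the tagger.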
7
Simple Decorrelation
Linearize the tree: T[FOR(CB, T2[])[T1[S1]]] → T[T2[T1[S1]]]
Before:
T(<results>[col1]</results>):col0
distinct(col2):$t
S(“prices.xml”):R1
(R1, /book/title):col2
FOR($t)
Agg()
T (<minprice>[$t]</minprice>):col1
After:
T(<results>[col1]</results>):col0
distinct(col2):$t
S(“prices.xml”):R1
(R1, /book/title):col2
Agg()
T (<minprice>[$t]</minprice>):col1
8
Is Simple Decorrelation Right?
Every operator except Groupby has "for each tuple" semantics over its input table.
Hence the FOR operator can be omitted in the simple decorrelation scenario.
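The argument can be checked on a toy table. The Python sketch below is an illustration under stated assumptions, not the Rainbow implementation: tables are lists of dicts, `tagger` stands in for a per-tuple operator such as T(<minprice>[$t]</minprice>), `count_all` stands in for a Groupby-style operator, and all helper names are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Table = List[Row]
Operator = Callable[[Table], Table]


def tagger(tag: str, col: str, out: str) -> Operator:
    """Per-tuple operator: wraps column `col` in <tag>...</tag> as new column `out`."""
    def op(table: Table) -> Table:
        return [{**row, out: f"<{tag}>{row[col]}</{tag}>"} for row in table]
    return op


def count_all(out: str) -> Operator:
    """Groupby-style operator: collapses its whole input into a single tuple."""
    def op(table: Table) -> Table:
        return [{out: str(len(table))}]
    return op


def for_each(bindings: Table, inner: Operator) -> Table:
    """FOR: evaluate `inner` once per binding tuple and concatenate the results."""
    result: Table = []
    for row in bindings:
        result.extend(inner([row]))
    return result


titles: Table = [{"$t": "TCP/IP Illustrated"}, {"$t": "Data on the Web"}]
minprice = tagger("minprice", "$t", "col1")

# Per-tuple operator: dropping FOR does not change the result.
assert for_each(titles, minprice) == minprice(titles)
# Groupby-style operator: dropping FOR changes the result.
assert for_each(titles, count_all("n")) != count_all("n")(titles)
```

Because the tagger looks at one tuple at a time, running it once per FOR binding or once over the whole table yields the same rows; the count does not, which is why Groupby is the exception.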
9
Two Types of Navigate
Navigate Unnesting (U): unnests the parent-child relationship, duplicating the parent values for each child.
Navigate Collection (C): nests the parent-child relationship, creating a collection of the children while keeping a single parent.
10
Where to Use Each Type
Navigate Unnesting (U): FOR bindings.
Navigate Collection (C): LET bindings.
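To make the U/C distinction concrete, here is a minimal Python sketch over an `xml.etree.ElementTree` tree, assuming a table is a list of dicts whose values may be XML elements. The function names `navigate_unnest` and `navigate_collect` are hypothetical stand-ins for the U and C operators.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, List

Row = Dict[str, Any]

PRICES_XML = """<prices>
  <book><title>TCP/IP Illustrated</title><price>65.95</price></book>
  <book><title>TCP/IP Illustrated</title><price>65.95</price></book>
  <book><title>Data on the Web</title><price>34.95</price></book>
  <book><title>Data on the Web</title><price>39.95</price></book>
</prices>"""


def navigate_unnest(table: List[Row], src: str, path: str, out: str) -> List[Row]:
    """U (FOR binding): one output tuple per child; parent columns are duplicated."""
    return [{**row, out: child} for row in table for child in row[src].findall(path)]


def navigate_collect(table: List[Row], src: str, path: str, out: str) -> List[Row]:
    """C (LET binding): one output tuple per parent; children kept as a collection."""
    return [{**row, out: row[src].findall(path)} for row in table]


source = [{"R1": ET.fromstring(PRICES_XML)}]  # S("prices.xml"):R1
for_rows = navigate_unnest(source, "R1", "book", "$b")   # for $b in .../book
let_rows = navigate_collect(source, "R1", "book", "$b")  # let $b := .../book
print(len(for_rows), len(let_rows), len(let_rows[0]["$b"]))
```

Unnesting yields four tuples that all repeat the same R1 value; collecting yields a single tuple whose $b column holds all four books.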
11
Complex Query Example
C($b, price):col4
T(<results>[col1]</results>):col0
distinct(col2):$t
S(“prices.xml”):R1
(R1, /book/title):col2
FOR($t)
Agg()
T (<minprice> [$t], [col4]</minprice>):col1
<results> {
  for $t in distinct(document("prices.xml")/book/title)
  let $b := document("prices.xml")/book[title = $t]
  return
    <minprice>{ $t, $b/price }</minprice>
} </results>
In the document "prices.xml", find the book title and its prices.
S(“prices.xml”):R2
C(R2, /book):$b
(col3=$t)
C($b, title):col3
12
Complex Decorrelation with Additional Source: T[FOR(CB, T2[S2])[T1[S1]]] → T[T2[⋈[T1[S1], S2]]]
Before:
C($b, price):col4
T(<results>[col1]</results>):col0
distinct(col2):$t
S(“prices.xml”):R1
(R1, /book/title):col2
FOR($t)
Agg()
T (<minprice> [$t], [col4]</minprice>):col1
S(“prices.xml”):R2
C(R2, /book):$b
(col3=$t)
C($b, title):col3
After:
C($b, price):col4
T (<minprice> [$t], [col4]</minprice>):col1
S(“prices.xml”):R2
C(R2, /book):$b
(col3=$t)
C($b, title):col3
T(<results>[col1]</results>):col0
distinct(col2):$t
S(“prices.xml”):R1
(R1, /book/title):col2
Agg()
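The effect of this rule can be sketched in Python under simplifying assumptions: the additional source S2 is modeled as a flat list of hypothetical (title, price) tuples rather than an XML document, and the join is a one-pass hash lookup. The sketch only checks that re-scanning S2 per $t and scanning it once give the same result; it does not reproduce the exact XAT operators.

```python
from typing import Dict, List, Tuple

# Hypothetical flat view of prices.xml: one (title, price) tuple per <book>.
Book = Tuple[str, float]
BOOKS: List[Book] = [
    ("TCP/IP Illustrated", 65.95),
    ("TCP/IP Illustrated", 65.95),
    ("Data on the Web", 34.95),
    ("Data on the Web", 39.95),
]


def distinct_titles(books: List[Book]) -> List[str]:
    """T1[S1]: distinct titles in first-occurrence order."""
    return list(dict.fromkeys(title for title, _ in books))


def correlated(books: List[Book]) -> Dict[str, List[float]]:
    """FOR($t): the inner source S2 is re-scanned once per binding of $t."""
    out: Dict[str, List[float]] = {}
    for t in distinct_titles(books):
        out[t] = [price for title, price in books if title == t]  # (col3 = $t)
    return out


def decorrelated(books: List[Book]) -> Dict[str, List[float]]:
    """S2 is scanned once and joined with the distinct titles on title = $t."""
    out: Dict[str, List[float]] = {t: [] for t in distinct_titles(books)}
    for title, price in books:  # single pass over S2 (hash join on title)
        if title in out:
            out[title].append(price)
    return out


assert correlated(BOOKS) == decorrelated(BOOKS)
```

The decorrelated version seeds the result with every $t before the scan, so a title with no matching book would still get an empty collection, as the correlated LET does; a plain inner join would drop it.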
13
Full Query Example
C($b, price/text()):col4
T(<results>[col1]</results>):col0
distinct(col2):$t
S(“prices.xml”):R1
(R1, /book/title):col2
FOR($t)
Agg()
T (<minprice> [$t], <price>[col5]</price></minprice>):col1
<results> {
  for $t in distinct(document("prices.xml")/book/title)
  let $b := document("prices.xml")/book[title = $t]
  return
    <minprice>{ $t, <price>{ min($b/price/text()) }</price> }</minprice>
} </results>
In the document "prices.xml", find the minimum price for each book, in the form of a "minprice" element.
S(“prices.xml”):R2
C(R2, /book):$b
(col3=$t)
C($b, title):col3
min(col4):col5
14
Complex Query Decorrelation with One Aggregate Function
T[FOR(CB, T2[Agg(T3[S2])])[T1[S1]]] → T[⋈(DM(T1))[T1, T2[Groupby(DM(T1), Agg(T3[⋈[Distinct(T1[S1]), S2]]))]]]
where DM(T1) is the data model computed from T1.
S2
Agg()
T1
S1
T3
FOR($rate)
T2
T
S1
Groupby(DM(T1), Agg())
S2
T3
T
T2
T1
Distinct
15
The Query after Decorrelation
Before:
C($b, price/text()):col4
T(<results>[col1]</results>):col0
distinct(col2):$t
S(“prices.xml”):R1
(R1, /book/title):col2
FOR($t)
Agg()
T (<minprice> [$t], <price>[col5]</price></minprice>):col1
S(“prices.xml”):R2
C(R2, /book):$b
(col3=$t)
C($b, title):col3
min(col4):col5
After:
C($b, price/text()):col4
T (<minprice> [$t], <price>[col5]</price></minprice>):col1
S(“prices.xml”):R2
C(R2, /book):$b
(col3=$t)
C($b, title):col3
T(<results>[col1]</results>):col0
distinct(col2):$t
S(“prices.xml”):R1
(R1, /book/title):col2
Agg()
GB(DM, min(col4):col5)
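A minimal Python sketch of the same idea for min(), using hypothetical (title, price) tuples in place of prices.xml: the correlated form recomputes min for each $t, while the decorrelated form groups once on $t (standing in for DM(T1)) and joins the result back to the distinct titles. Function names are illustrative only.

```python
from typing import Dict, List, Tuple

Book = Tuple[str, float]  # hypothetical (title, price) view of prices.xml
BOOKS: List[Book] = [
    ("TCP/IP Illustrated", 65.95),
    ("TCP/IP Illustrated", 65.95),
    ("Data on the Web", 34.95),
    ("Data on the Web", 39.95),
]


def correlated_min(books: List[Book]) -> Dict[str, float]:
    """FOR($t): min(col4) is recomputed over S2 once per title."""
    result: Dict[str, float] = {}
    for t in dict.fromkeys(title for title, _ in books):
        result[t] = min(price for title, price in books if title == t)
    return result


def decorrelated_min(books: List[Book]) -> Dict[str, float]:
    """Groupby on $t over the join of Distinct(T1[S1]) with S2, then join back."""
    titles = list(dict.fromkeys(title for title, _ in books))  # Distinct(T1[S1])
    wanted = set(titles)
    mins: Dict[str, float] = {}
    for title, price in books:  # one pass: GB(DM, min(col4):col5)
        if title in wanted:
            mins[title] = price if title not in mins else min(mins[title], price)
    # Join the grouped rows back to T1 on DM(T1) = {$t}
    return {t: mins[t] for t in titles if t in mins}


assert correlated_min(BOOKS) == decorrelated_min(BOOKS)
```

The grouped plan touches S2 once regardless of how many distinct titles there are, which is the point of replacing FOR with Groupby.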
16
Where are we?
XAT Decorrelation
Optimization
  XAT Computation Pushdown
  XAT Data Model Cleanup
  XAT Cutting
Conclusion & Future Work
17
XAT Computation Pushdown
Goal: push execution into the relational database.
Steps:
  Push navigation down.
  Cancel out navigation and tagger.
  Generate SQL statements.
18
Navigation Pushdown
Navigation can be pushed through all operators until it depends on the output of its child operator.
Example rewriting rules:
(x1, path):x2[(y1, path):y2[T]] → (y1, path):y2[(x1, path):x2[T]]   (if x1 ≠ y2)
(x1, path):x2[(c)[T]] → (c)[(x1, path):x2[T]]
(x1, path):x2[⋈[T1, T2]] → ⋈[T1, (x1, path):x2[T2]]   (if x1 ∈ DM(T2))
(x1, path):x2[⋈[T1, T2]] → ⋈[(x1, path):x2[T1], T2]   (if x1 ∈ DM(T1))
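The first two rules can be sketched as a tree rewrite in Python. The plan encoding below (frozen dataclasses `Source`, `Nav`, `Select`) is a hypothetical illustration, not the Rainbow data structures; the select condition is kept opaque and the join rules are omitted.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Source:
    out: str  # e.g. S("prices.xml"):R1


@dataclass(frozen=True)
class Nav:
    src: str  # x1: input column
    path: str
    out: str  # x2: output column
    child: "Node"


@dataclass(frozen=True)
class Select:
    cond: str  # selection condition c, kept opaque in this sketch
    child: "Node"


Node = Union[Source, Nav, Select]


def push_nav_down(node: Node) -> Node:
    """Apply one navigation pushdown rule at the root of `node`, if one matches."""
    if isinstance(node, Nav):
        c = node.child
        # (x1, path):x2[(y1, path):y2[T]] -> (y1, path):y2[(x1, path):x2[T]]  if x1 != y2
        if isinstance(c, Nav) and node.src != c.out:
            return Nav(c.src, c.path, c.out, Nav(node.src, node.path, node.out, c.child))
        # (x1, path):x2[(c)[T]] -> (c)[(x1, path):x2[T]]
        if isinstance(c, Select):
            return Select(c.cond, Nav(node.src, node.path, node.out, c.child))
    return node  # no rule applies, e.g. the navigation reads its child's output


# C($b, title):col3 over C(R2, /book):$b reads $b, so it cannot move below.
dependent = Nav("$b", "title", "col3", Nav("R2", "/book", "$b", Source("R2")))
assert push_nav_down(dependent) == dependent
```

Applying the function repeatedly, bottom-up, would push each navigation as far down as its dependencies allow; that driver loop is left out here.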
19
Navigation Pushdown Example
C($b, price/text()):col4
T (<minprice> [$t], <price>[col5]</price></minprice>):col1
S(“prices.xml”):R2
C(R2, /book):$b
(col3=$t)
C($b, title):col3
T(<results>[col1]</results>):col0
distinct(col2):$t
S(“prices.xml”):R1
(R1, /book/title):col2
Agg()
GB(DM, min(col4):col5)
C($b, price/text()):col4
T (<minprice> [$t], <price>[col5]</price></minprice>):col1
S(“prices.xml”):R2
C(R2, /book):$b
(col3=$t)
C($b, title):col3
T(<results>[col1]</results>):col0
distinct(col2):$t
S(“prices.xml”):R1
(R1, /book/title):col2
Agg()
GB(DM, min(col4):col5)
20
Navigation/Tagger Cancel Out
Used to simplify a composite XAT tree.
Transformation rule:
(x, /):y[T(<tag>[z]</tag>):x[s]] → s
Note: type analysis is also used to decide when navigation and tagger cancel out.
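A hedged Python sketch of the cancel-out check as it plays out in Cancel Out Examples (3) and (4), where C($b, title) over T(<book>[col7],[col8]</book>):$b becomes a rename of col7. It assumes type analysis has already produced a mapping from each tagger input column to the element tag it holds (`col_types`); that mapping and the function name `cancel_out` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def cancel_out(nav_src: str, nav_child: str, nav_out: str,
               tagger_out: str, tagger_cols: List[str],
               col_types: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (old_col, new_col) if nav-over-tagger can become a rename, else None.

    col_types: hypothetical type-analysis result, column -> element tag it holds.
    """
    if nav_src != tagger_out:
        return None  # the navigation does not read the tagger's output
    matches = [c for c in tagger_cols if col_types.get(c) == nav_child]
    if len(matches) != 1:
        return None  # no match or ambiguous: keep both operators
    return matches[0], nav_out


# C($b, title):col3 over T(<book>[col7],[col8]</book>):$b
types = {"col7": "title", "col8": "price"}
print(cancel_out("$b", "title", "col3", "$b", ["col7", "col8"], types))
```

When the check succeeds, the navigation and tagger are removed and the producing operator is simply renamed, as ($row, title):col7 becomes ($row, title):col3 in Example (3).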
21
View Query Example
<DB>
  <book>
    <row>
      <title> TCP/IP Illustrated </title>
      <price>65.95</price>
    </row>
    <row>
      <title> TCP/IP Illustrated </title>
      <price>65.95</price>
    </row>
    <row>
      <title>Data on the Web</title>
      <price>34.95</price>
    </row>
    <row>
      <title>Data on the Web</title>
      <price>39.95</price>
    </row>
  </book>
</DB>
<prices> {
  for $row in distinct(DXV/book/row)
  return
    <book>{ $row/title, $row/price }</book>
} </prices>
T(<prices>[col6]</prices>):col5
T(<book>[col7],[col8]</book>):col6
S(DXV):R3
(R3, /book/row):$row
Agg()
($row, title):col7
($row, price):col8
22
Cancel Out Example (1)
C($b, price/text()):col4
S(“prices.xml”):R2
C(R2, /book):$b
C($b, title):col3
...
T(<prices>[col6]</prices>):col5
T(<book>[col7],[col8]</book>):col6
S(DXV):R3
(R3, /book/row):$row
Agg()
($row, title):col7
($row, price):col8
C($b, price/text()):col4
C(R2, /book):$b
C($b, title):col3
...
T(<prices>[col6]</prices>):R2
T(<book>[col7],[col8]</book>):col6
S(DXV):R3
(R3, /book/row):$row
Agg()
($row, title):col7
($row, price):col8
(x, y)[op():x[s]] → op():y[s]
23
Cancel Out Example (2)
C($b, price/text()):col4
C(R2, /book):$b
C($b, title):col3
...
T(<prices>[col6]</prices>):R2
T(<book>[col7],[col8]</book>):col6
S(DXV):R3
(R3, /book/row):$row
Agg()
($row, title):col7
($row, price):col8
C($b, price/text()):col4
C($b, title):col3
...
T(<book>[col7],[col8]</book>):$b
S(DXV):R3
(R3, /book/row):$row
($row, title):col7
($row, price):col8
24
Cancel Out Example (3)
C($b, price/text()):col4
C($b, title):col3
...
T(<book>[col7],[col8]</book>):$b
S(DXV):R3
(R3, /book/row):$row
($row, title):col7
($row, price):col8
C($b, price/text()):col4
...
T(<book>[col7],[col8]</book>):$b
S(DXV):R3
(R3, /book/row):$row
($row, title):col3
($row, price):col8
25
Cancel Out Example (4)
C($b, price):temp1
...
T(<book>[col7],[col8]</book>):$b
S(DXV):R3
(R3, /book/row):$row
($row, title):col3
($row, price):col8
C(temp1, text()):col4
...
S(DXV):R3
(R3, /book/row):$row
($row, title):col3
($row, price):temp1
C(temp1, text()):col4
26
SQL Generation
Find a pattern in the XAT, such as a source over DXV followed by navigations into its rows and columns, and translate that pattern into a SQL operator that accesses the relational database.
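A minimal Python sketch of this pattern match, assuming the pattern is a DXV source, a navigation to /<table>/row, and one ($row, column) navigation per relational column, as in the example that follows. The function name `generate_sql` and its argument encoding are hypothetical; a real generator would also need identifier quoting and type handling.

```python
from typing import List, Tuple


def generate_sql(row_path: str, navs: List[Tuple[str, str]]) -> Tuple[str, List[str]]:
    """Translate S(DXV) + (R, /<table>/row):$row + ($row, <column>):<out> into SQL.

    row_path: navigation path from the view root, e.g. "/book/row".
    navs: (relational column, XAT output column) pairs.
    """
    parts = row_path.strip("/").split("/")
    if len(parts) != 2 or parts[1] != "row":
        raise ValueError(f"pattern not recognised: {row_path}")
    table = parts[0]
    select_list = ", ".join(f"{col} as {out}" for col, out in navs)
    return f"select {select_list} from {table}", [out for _, out in navs]


sql, columns = generate_sql("/book/row", [("title", "col3"), ("price", "temp1")])
print(sql)
print(columns)
```

The remaining C(temp1, text()):col4 stays in the XAT above the generated SQL operator, because it works on the XML value of the column rather than on the relational row.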
27
SQL Generation Example...
S(DXV):R3
(R3, /book/row):$row
($row, title):col3
($row, price):temp1
C(temp1, text()):col4
...
SQL(select title as col3,
price as temp1 from book):{col3,temp1}
C(temp1, text()):col4
28
Where are we?
XAT Decorrelation
Optimization
  XAT Computation Pushdown
  XAT Data Model Cleanup
  XAT Cutting
Conclusion & Future Work
29
XAT Data Model Cleanup
By default, each operator appends one additional column to the data model. Cleaning up the data model helps:
  Execution: optimizes data storage during execution.
  Cutting: removes unused operators from the XQuery plan.
Equation for data model cleanup (keep only the columns required by ancestors):
$DM := (DM_p - P_p) \cup C_p \cup (P - C)$
where P and C are the columns an operator produces and consumes, and the subscript p denotes its parent.
30
Data Model Example
for $b in document("prices.xml")/book
let $prices := $b/price
return $b

XAT nodes (numbers used in the table):
1: Agg()
2: ($b,):col1
3: C($b, price):$prices
4: (R1, /book):$b
5: S(“prices.xml”):R1

Node | Produce | Consume | DM before | DM after
1 | {} | {} | {$prices, R1, $b, col1} | {}
2 | {col1} | {$b} | {$prices, R1, $b, col1} | {col1}
3 | {$prices} | {$b} | {$prices, R1, $b} | {$b, $prices}
4 | {$b} | {R1} | {R1, $b} | {$b}
5 | {R1} | {} | {R1} | {R1}

$DM := (DM_p - P_p) \cup C_p \cup (P - C)$
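The cleanup equation can be checked against this table with a short Python sketch. The plan encoding (node → produce, consume, parent) is hypothetical; nodes are visited top-down so each parent's DM is known before its child's, and the root is treated as having an empty parent.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical encoding of the example plan: node -> (produce, consume, parent).
Plan = Dict[int, Tuple[Set[str], Set[str], Optional[int]]]

PLAN: Plan = {
    1: (set(), set(), None),        # Agg()
    2: ({"col1"}, {"$b"}, 1),       # ($b,):col1
    3: ({"$prices"}, {"$b"}, 2),    # C($b, price):$prices
    4: ({"$b"}, {"R1"}, 3),         # (R1, /book):$b
    5: ({"R1"}, set(), 4),          # S("prices.xml"):R1
}


def cleanup(plan: Plan) -> Dict[int, Set[str]]:
    """DM := (DM_p - P_p) | C_p | (P - C), evaluated from the root downward."""
    dm: Dict[int, Set[str]] = {}
    for node in sorted(plan):  # parents carry smaller numbers, so this is top-down
        produce, consume, parent = plan[node]
        if parent is None:
            dm_p, p_p, c_p = set(), set(), set()  # the root has no parent
        else:
            dm_p = dm[parent]
            p_p, c_p, _ = plan[parent]
        dm[node] = (dm_p - p_p) | c_p | (produce - consume)
    return dm


for node, cols in cleanup(PLAN).items():
    print(node, sorted(cols))
```

Each computed set matches the "DM after" column, so only the columns some ancestor still needs are kept at each operator.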
31
Where are we?
XAT Decorrelation
Optimization
  XAT Computation Pushdown
  XAT Data Model Cleanup
  XAT Cutting
Conclusion & Future Work
32
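XAT Cutting
General idea: get rid of the operators that produce useless data.
Equations, where R is the set of required columns, M the set of modified columns, and the subscript p denotes the parent:
$R := (R_p - P) \cup C$
An operator is cut when $(P \cup M) \cap (R_p \cup M_p) = \emptyset$.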
33
XAT Cutting Example
$R := (R_p - P) \cup C$
$(P \cup M) \cap (R_p \cup M_p) = \emptyset$

for $b in document("prices.xml")/book
let $prices := $b/price
return $b

XAT nodes (numbers used in the table):
1: Agg()
2: ($b,):col1
3: C($b, price):$prices
4: (R1, /book):$b
5: S(“prices.xml”):R1

Node | Produce | Consume | Modified | Required | Cut? (empty means cut)
1 | {} | {} | {*} | {} | N/A
2 | {col1} | {$b} | {} | {$b} | {col1}
3 | {$prices} | {$b} | {} | {$b} | {}
4 | {$b} | {R1} | {} | {R1} | {$b}
5 | {R1} | {} | {} | {} | {R1}
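A Python sketch that recomputes the Required and Cut? columns from the two equations, assuming a hypothetical plan encoding (node → produce, consume, modified, parent) and treating "*" as "all columns". It reads the cut condition as (P ∪ M) ∩ (R_p ∪ M_p) = ∅, which reproduces the table.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical encoding: node -> (produce, consume, modified, parent); "*" = all columns.
Plan = Dict[int, Tuple[Set[str], Set[str], Set[str], Optional[int]]]

PLAN: Plan = {
    1: (set(), set(), {"*"}, None),       # Agg()
    2: ({"col1"}, {"$b"}, set(), 1),      # ($b,):col1
    3: ({"$prices"}, {"$b"}, set(), 2),   # C($b, price):$prices
    4: ({"$b"}, {"R1"}, set(), 3),        # (R1, /book):$b
    5: ({"R1"}, set(), set(), 4),         # S("prices.xml"):R1
}


def meet(a: Set[str], b: Set[str]) -> Set[str]:
    """Set intersection where "*" (all columns) absorbs the other operand."""
    if "*" in b:
        return set(a)
    if "*" in a:
        return set(b)
    return a & b


def cutting(plan: Plan):
    required: Dict[int, Set[str]] = {}
    cut_test: Dict[int, Optional[Set[str]]] = {}
    for node in sorted(plan):  # parents carry smaller numbers, so this is top-down
        produce, consume, modified, parent = plan[node]
        if parent is None:
            required[node] = set(consume)  # R := (empty - P) | C at the root
            cut_test[node] = None          # the root is never cut (N/A)
            continue
        r_p, m_p = required[parent], plan[parent][2]
        required[node] = (r_p - produce) | consume
        cut_test[node] = meet(produce | modified, r_p | m_p)
    return required, cut_test


required, cut_test = cutting(PLAN)
print([n for n, s in cut_test.items() if s == set()])
```

Only node 3, the unused LET binding C($b, price):$prices, has an empty test set, so it is the one operator cut from the plan.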
34
Conclusions
XQuery queries are heavily correlated, and hence need to be decorrelated for better optimization.
After decorrelation, more optimization techniques can be applied: computation pushdown, data model cleanup, and cutting.
35
Future Work
Write a TR to formalize the XAT.
Compare with ORDB, ODB, and XQA operators.
Wrap up:
  Finalize the uncertain operators that deal with collections: Union, Navigate.
  Formalize the pushdown rewriting rules by type (regular expression type) analysis.
  Finalize the XAT rewriting rules for order handling and update propagation.
  Translation from XAT back to a query.
Next step:
  Generate the search space and optimization algorithm for XAT, ready for schema generation.