Honey, I Shrunk the XQuery! An XML Algebra Optimization Approach

Xin Zhang, Bradford Pielech and Elke A. Rundensteiner. WIDM 2002, DSRG, Worcester Polytechnic Institute.



XML and Relational

XML

Flexible and powerful way to:

1) Represent data on the web

2) Exchange data between applications

Relational Database

1) Widely used to store business data

2) Efficient, reliable, secure

3) Provides standard querying (SQL)

XML + RDB: the look and feel of an XML query system combined with the maturity and technology support of relational databases.


Architecture

[Figure: system architecture. Components: XAT Generator, XAT Decorrelator, XAT Merger, XAT Optimizer, SQL Generator, RDBMS, XAT Executor. Inputs are a User XQuery and a View XQuery, translated into a User XAT and a View XAT; SQL is sent to the RDBMS, which returns tuples; the output is the user query results in XML. The queried documents are virtual XML documents defined over an XML document.]

XAT: XML Algebra Tree


Goal: XQuery-level optimization


Running Example

Relational tables:

book
Bid | Title
001 | TCP/IP Illustrated
002 | Data on the Web

prices
Bid | Price
001 | 65.95
002 | 34.95

Default XML view of the tables (dxv.xml):

<dxv>
  <book>
    <row><bid>001</bid><title>TCP/IP Illustrated</title></row>
    <row><bid>002</bid><title>Data on the Web</title></row>
  </book>
  <prices>
    <row><bid>001</bid><price>65.95</price></row>
    <row><bid>002</bid><price>34.95</price></row>
  </prices>
</dxv>

View query (defines prices.xml):

<prices>FOR $book IN document("dxv.xml")/book/row,
            $prices IN document("dxv.xml")/prices/row
        WHERE $book/bid = $prices/bid
        RETURN <book>$book/title, $prices/price</book>
</prices>

View result (prices.xml):

<prices>
  <book><title>TCP/IP Illustrated</title><price>65.95</price></book>
  <book><title>Data on the Web</title><price>34.95</price></book>
</prices>

User query over the view:

<result>FOR $t IN document("prices.xml")/book/title RETURN $t</result>

Result:

<result><title>TCP/IP Illustrated</title><title>Data on the Web</title></result>
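To make the semantics of the running example concrete, the Python sketch below (standard library only) builds the default XML view from the two tables, evaluates the view query with a nested loop, and then runs the user query over its result. The function names are hypothetical, and paths are evaluated below the root element, mirroring the slides' document("dxv.xml")/book/row convention. This is only a reference evaluation of the example; as the architecture shows, Rainbow itself compiles the queries into an XAT and SQL.

```python
import copy
import xml.etree.ElementTree as ET

# Relational tables from the running example: (bid, value) tuples.
BOOK = [("001", "TCP/IP Illustrated"), ("002", "Data on the Web")]
PRICES = [("001", "65.95"), ("002", "34.95")]


def default_xml_view(book, prices):
    """Build the default XML view (dxv.xml): one <row> per tuple."""
    dxv = ET.Element("dxv")
    for table, rows, col in (("book", book, "title"), ("prices", prices, "price")):
        table_elem = ET.SubElement(dxv, table)
        for bid, value in rows:
            row = ET.SubElement(table_elem, "row")
            ET.SubElement(row, "bid").text = bid
            ET.SubElement(row, col).text = value
    return dxv


def view_query(dxv):
    """Nested-loop evaluation of the view query: join book and prices rows on bid."""
    out = ET.Element("prices")
    for b in dxv.findall("book/row"):
        for p in dxv.findall("prices/row"):
            if b.findtext("bid") == p.findtext("bid"):  # WHERE $book/bid = $prices/bid
                book = ET.SubElement(out, "book")
                book.append(copy.deepcopy(b.find("title")))
                book.append(copy.deepcopy(p.find("price")))
    return out


def user_query(prices):
    """FOR $t IN .../book/title RETURN $t, wrapped in <result>."""
    result = ET.Element("result")
    for t in prices.findall("book/title"):
        result.append(copy.deepcopy(t))
    return result


result = user_query(view_query(default_xml_view(BOOK, PRICES)))
print(ET.tostring(result, encoding="unicode"))
# <result><title>TCP/IP Illustrated</title><title>Data on the Web</title></result>
```

Note that the user query only needs titles, yet a naive evaluation still builds the whole joined view; the optimizations in the rest of the talk target exactly this redundancy.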


User Query

<result>FOR $t IN document("prices.xml")/book/title RETURN $t</result>

User XML Algebra Tree (XAT), from the root down:

  1: col3
  2: T <result>$t</result> → col3
  3: Agg
  6: R0, book/title → $t
  7: S "prices.xml" → R0


View Query

<prices>FOR $book IN document("dxv.xml")/book/row,
            $prices IN document("dxv.xml")/prices/row
        WHERE $book/bid = $prices/bid
        RETURN <book>$book/title, $prices/price</book>
</prices>

View XML Algebra Tree (XAT), operators as labeled on the slide (node: label):

  col4
  11: T <prices>col5</prices> → col4
  12: Agg
  22: T <book>[col10][col12]</book> → col5
  23: $book, title → col10
  25: $prices, price → col12
  26: col6=col7
  27: $book, bid → col6
  28: $prices, bid → col7
  14: R1, /book/row → $book
  15: S "dxv.xml" → R1
  20: R3, /prices/row → $prices
  21: S "dxv.xml" → R3
  31: (label not legible)


Merged XML Algebra Tree (XAT): the user query tree placed on top of the view query tree (node: label).

User query part: 1: col3; 2: T <result>$t</result> → col3; 3: Agg; 6: R0, book/title → $t; 7: col4 → R0

View query part: 11: T <prices>col5</prices> → col4; 12: Agg; 22: T <book>[col10][col12]</book> → col5; 23: $book, title → col10; 25: $prices, price → col12; 26: col6=col7; 27: $book, bid → col6; 28: $prices, bid → col7; 14: R1, /book/row → $book; 15: S "dxv.xml" → R1; 20: R3, /prices/row → $prices; 21: S "dxv.xml" → R3; 31 (label not legible)


Outline

1) XAT Optimization: XAT Rewrite, XAT Cleanup

2) Preliminary Evaluation

3) Related Work

4) Summary


XAT Rewrite

Query optimization at the logical level.

Goal: redundancy elimination and computation pushdown.

Technique: equivalence rewrite rules.

Heuristics:

1) Push down navigations

2) Remove the construction of intermediate results

3) Combine multiple operators
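The "combine multiple operators" heuristic shows up later in the walkthrough, where the join condition col6=col7 ends up as a single JOIN operator. As an illustration of one such equivalence rule, the sketch below assumes (the slides do not say so) that the WHERE clause is first translated into a selection over a Cartesian product, and rewrites Select[c](Product(A, B)) into Join[c](A, B). The Op class and operator names are hypothetical, not Rainbow's API.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """Hypothetical XAT node: operator kind, an opaque argument, and inputs."""
    kind: str
    arg: str = ""
    children: list = field(default_factory=list)


def combine_select_product(op: Op) -> Op:
    """Bottom-up rewrite: Select[c](Product(A, B)) => Join[c](A, B)."""
    children = [combine_select_product(c) for c in op.children]
    if op.kind == "Select" and len(children) == 1 and children[0].kind == "Product":
        return Op("Join", op.arg, children[0].children)
    return Op(op.kind, op.arg, children)


left = Op("Navigate", "$book, bid -> col6")
right = Op("Navigate", "$prices, bid -> col7")
tree = Op("Agg", "", [Op("Select", "col6=col7", [Op("Product", "", [left, right])])])

rewritten = combine_select_product(tree)
print(rewritten.children[0].kind, rewritten.children[0].arg)  # Join col6=col7
```

The rewrite returns new nodes rather than mutating the input tree, so the original plan stays available for comparison.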


Before Navigation Pushdown

[Figure: the merged XAT (user query over view query) before navigation pushdown.]


After Navigation Pushdown (user query over view query), node: label

User query part: 1: col3; 2: T <result>$t</result> → col3; 3: Agg; 6: R0, book/title → $t

View query part: 11: T <prices>col5</prices> → R0; 12: Agg; 22: T <book>[col10][col12]</book> → col5; 26: col6=col7; 31 (label not legible)

  $book branch: 23: $book, title → col10; 27: $book, bid → col6; 14: R1, /book/row → $book; 15: S "dxv.xml" → R1

  $prices branch: 25: $prices, price → col12; 28: $prices, bid → col7; 20: R3, /prices/row → $prices; 21: S "dxv.xml" → R3
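A navigation can move below a join when every column it reads comes from one join input; that is how node 23 ($book, title) ends up in the $book branch. The sketch below checks only that column-availability condition and is an assumption about how such a rule can be stated, not Rainbow's implementation; the Op class, the helper names, and the omission of the bid navigations are all simplifications for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """Hypothetical XAT node: kind, display label, consumed/produced columns."""
    kind: str
    label: str = ""
    consumes: frozenset = frozenset()
    produces: frozenset = frozenset()
    children: list = field(default_factory=list)


def output_columns(op):
    """All columns available in op's output (bottom-up union)."""
    cols = set(op.produces)
    for child in op.children:
        cols |= output_columns(child)
    return cols


def push_navigate_below_join(nav_op):
    """If nav_op sits directly on a Join and all columns it consumes come from
    one join input, re-create it on that input; otherwise return it unchanged."""
    if nav_op.kind != "Navigate" or len(nav_op.children) != 1:
        return nav_op
    join = nav_op.children[0]
    if join.kind != "Join":
        return nav_op
    inputs, pushed = [], False
    for side in join.children:
        if not pushed and nav_op.consumes <= output_columns(side):
            inputs.append(Op(nav_op.kind, nav_op.label, nav_op.consumes,
                             nav_op.produces, [side]))
            pushed = True
        else:
            inputs.append(side)
    if not pushed:
        return nav_op
    return Op(join.kind, join.label, join.consumes, join.produces, inputs)


def navigate(label, consumes, produces, child):
    """Hypothetical helper that builds a Navigate node."""
    return Op("Navigate", label, frozenset(consumes), frozenset(produces), [child])


src1 = Op("Source", 'S "dxv.xml" -> R1', produces=frozenset({"R1"}))
src3 = Op("Source", 'S "dxv.xml" -> R3', produces=frozenset({"R3"}))
book = navigate("R1, /book/row -> $book", {"R1"}, {"$book"}, src1)
prices = navigate("R3, /prices/row -> $prices", {"R3"}, {"$prices"}, src3)
join = Op("Join", "col6=col7", children=[book, prices])
title_nav = navigate("$book, title -> col10", {"$book"}, {"col10"}, join)

result = push_navigate_below_join(title_nav)
print(result.kind, [c.label for c in result.children])
# Join ['$book, title -> col10', 'R3, /prices/row -> $prices']
```

Pushing navigations toward the sources keeps per-branch work below the join, which is also what later lets the cleanup phase see that the $prices branch contributes nothing to $t.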


After Tagger Cancel Out (node: label)

  1: col3
  2: T <result>$t</result> → col3
  3: Agg
  31: JOIN col6=col7
    $book branch: 23: $book, title → $t; 27: $book, bid → col6; 14: R1, /book/row → $book; 15: S "dxv.xml" → R1
    $prices branch: 25: $prices, price → col12; 28: $prices, bid → col7; 20: R3, /prices/row → $prices; 21: S "dxv.xml" → R3
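In this step the user's navigation book/title over the view's constructed <book> elements disappears, and $t is produced directly by the navigation $book, title. The core check can be sketched as follows, assuming the outer <prices> Tagger and the Agg have already been seen through and that it is known which element name each column inserted by the Tagger carries. The function cancel_tagger is hypothetical, handles a single constructor level, and ignores cardinality details.

```python
def cancel_tagger(path, tagger_root, slot_tags):
    """Return the column that makes a navigation over a Tagger redundant.

    path        -- navigation steps, e.g. ["book", "title"]
    tagger_root -- element the Tagger constructs, e.g. "book"
    slot_tags   -- column inserted by the Tagger -> element name its values
                   carry, e.g. {"col10": "title", "col12": "price"}
    Returns None when the navigation cannot be cancelled this simply.
    """
    if len(path) != 2 or path[0] != tagger_root:
        return None
    matches = [col for col, tag in slot_tags.items() if tag == path[1]]
    return matches[0] if len(matches) == 1 else None


# Tagger <book>[col10][col12]</book>: col10 holds <title>, col12 holds <price>.
slots = {"col10": "title", "col12": "price"}
print(cancel_tagger(["book", "title"], "book", slots))  # col10
print(cancel_tagger(["book", "isbn"], "book", slots))   # None
```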


XAT Cleanup

Why: the SQL engine cannot reduce redundancy in XQuery.

How:

1) Data redundancy, by schema cleanup: each operator produces, consumes and modifies some columns; the minimum schema is then computed.

2) Tree redundancy, by unused operator cutting: cutting matrix generation, required columns analysis, operator cutting.


XAT Operator Properties

Produced: new columns generated by the operator. Examples include S (Source) and T (Tagger).

Consumed: columns required by the operator.

Modified: columns modified by the operator.


Schema Computation

Node | Parent | Produced | Consumed | Old schema
-----|--------|----------|----------|-----------
21 | 20 | {R3} | {} | {R3}
20 | 28 | {$prices} | {R3} | {R3, $prices}
28 | 25 | {col7} | {$prices} | {R3, $prices, col7}
25 | 31 | {col12} | {$prices} | {R3, $prices, col7, col12}
15 | 14 | {R1} | {} | {R1}
14 | 27 | {$book} | {R1} | {R1, $book}
27 | 23 | {col6} | {$book} | {R1, $book, col6}
23 | 31 | {$t} | {$book} | {R1, $book, col6, $t}
31 | 3 | {} | {col6, col7} | {R1, $book, col6, $t, R3, $prices, col7, col12}
3 | 2 | {} | {} | {R1, $book, col6, $t, R3, $prices, col7, col12}
2 | 1 | {col3} | {$t} | {col3, R1, $book, col6, $t, R3, $prices, col7, col12}
1 | - | {} | {col3} | {col3}


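Schema Computation

Columns tracked: col3, $t, col6, col7, $book, R1, col12, $prices, R3 (P = produced, C = consumed).

Node | Parent | Entries | New schema
-----|--------|---------|-----------
21 | 20 | P: R3 | {R3}
20 | 28 | P: $prices; C: R3 | {$prices}
28 | 25 | P: col7; C: $prices | {$prices, col7}
25 | 31 | P: col12; C: $prices | {col7, col12}
15 | 14 | P: R1 | {R1}
14 | 27 | P: $book; C: R1 | {$book}
27 | 23 | P: col6; C: $book | {$book, col6}
23 | 31 | P: $t; C: $book | {col6, $t}
31* | 3 | C: col6, col7 | {$t}
3 | 2 | - | {$t}
2 | 1 | P: col3; C: $t | {col3}
1 | - | C: col3 | {col3}

*We assume the Join does not modify $t; otherwise, only node 25 would be deleted.

Intuition: don't keep anything that is not used later.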


Schema Cleanup Result

Node | Original schema | Minimum schema
-----|-----------------|---------------
1 | {col3, R1, $book, col6, $t, R3, $prices, col7, col12} | {col3}
2 | {col3, R1, $book, col6, $t, R3, $prices, col7, col12} | {col3}
3 | {R1, $book, col6, $t, R3, $prices, col7, col12} | {$t}
31 | {R1, $book, col6, $t, R3, $prices, col7, col12} | {$t}
23 | {R1, $book, col6, $t} | {col6, $t}
27 | {R1, $book, col6} | {$book, col6}
14 | {R1, $book} | {$book}
15 | {R1} | {R1}
25 | {R3, $prices, col7, col12} | {col7, col12}
28 | {R3, $prices, col7} | {$prices, col7}
20 | {R3, $prices} | {$prices}
21 | {R3} | {R3}
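The slides give the resulting schemas but not the formulas. The Python sketch below is one formulation that reproduces every row of this table: the original schema is computed bottom-up as the node's produced columns plus its inputs' schemas; the minimum schema is computed top-down from what the parent still needs (the parent's minimum schema minus what the parent produces, plus what it consumes), restricted to the node's original schema, plus the node's own produced columns. The tree data are transcribed from the Parent/Produced/Consumed columns of the schema computation tables; the function names are hypothetical, and the paper's own equations may be stated differently. (For node 1 the schema computation slide lists the old schema as {col3}, while this table lists the full set; the sketch follows this table.)

```python
# Node -> (parent, produced columns, consumed columns).
XAT = {
    1: (None, set(), {"col3"}),
    2: (1, {"col3"}, {"$t"}),
    3: (2, set(), set()),
    31: (3, set(), {"col6", "col7"}),
    23: (31, {"$t"}, {"$book"}),
    27: (23, {"col6"}, {"$book"}),
    14: (27, {"$book"}, {"R1"}),
    15: (14, {"R1"}, set()),
    25: (31, {"col12"}, {"$prices"}),
    28: (25, {"col7"}, {"$prices"}),
    20: (28, {"$prices"}, {"R3"}),
    21: (20, {"R3"}, set()),
}


def children(tree, node):
    return [n for n, (parent, _, _) in tree.items() if parent == node]


def old_schema(tree, node):
    """Bottom-up: everything the inputs output, plus what this node produces."""
    cols = set(tree[node][1])
    for child in children(tree, node):
        cols |= old_schema(tree, child)
    return cols


def min_schema(tree, node):
    """Top-down: columns the parent still passes up or consumes (restricted to
    this node's old schema), plus the columns this node produces."""
    parent, produced, consumed = tree[node]
    if parent is None:
        return set(produced) | set(consumed)
    _, p_produced, p_consumed = tree[parent]
    needed = (min_schema(tree, parent) - p_produced) | p_consumed
    return (needed & old_schema(tree, node)) | set(produced)


for node in XAT:
    print(node, sorted(min_schema(XAT, node)))
```

Keeping each node's produced columns (such as col12 at node 25) matches the table; whether those operators are needed at all is decided later by operator cutting.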


XAT Cleanup

1) Schema cleanup: each operator produces, consumes and modifies some columns; the minimum schema is then computed.

2) Unused operator cutting: cutting matrix generation, required columns analysis, operator cutting.


Cutting Matrix

Purpose: get rid of unused operators.

Equations: propagation of modified columns; propagation of required columns.

Identify cuttable nodes.
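The propagation equations themselves are not legible in this transcript. As a hedged reconstruction, the sketch below reproduces the final cutting matrix and its footnote: starting from the columns the root consumes, a node is kept if it produces or modifies a required column; every column a kept node consumes becomes required; the propagation repeats until nothing changes, and all other nodes are cuttable. The Produced/Consumed/Modified entries come from the matrices (the Agg at node 3 modifies $t). Whether the join modifies $t is a parameter so that both cases of the footnote can be checked. The function names are hypothetical.

```python
def build_xat(join_modifies_t=False):
    """Node -> (produced, consumed, modified) for the XAT after tagger cancel-out."""
    return {
        1: (set(), {"col3"}, set()),
        2: ({"col3"}, {"$t"}, set()),
        3: (set(), set(), {"$t"}),  # Agg modifies $t
        31: (set(), {"col6", "col7"}, {"$t"} if join_modifies_t else set()),
        23: ({"$t"}, {"$book"}, set()),
        27: ({"col6"}, {"$book"}, set()),
        14: ({"$book"}, {"R1"}, set()),
        15: ({"R1"}, set(), set()),
        25: ({"col12"}, {"$prices"}, set()),
        28: ({"col7"}, {"$prices"}, set()),
        20: ({"$prices"}, {"R3"}, set()),
        21: ({"R3"}, set(), set()),
    }


def cutting_analysis(tree, root):
    """Propagate required columns from the root to a fixpoint.

    Returns (cuttable nodes, required columns).
    """
    kept = {root}
    required = set(tree[root][1])  # columns the root consumes
    changed = True
    while changed:
        changed = False
        for node, (produced, consumed, modified) in tree.items():
            if node not in kept and (produced | modified) & required:
                kept.add(node)
                required |= consumed
                changed = True
    return set(tree) - kept, required


cut, required = cutting_analysis(build_xat(), root=1)
print(sorted(cut), sorted(required))  # [20, 21, 25, 27, 28, 31] ['$book', '$t', 'R1', 'col3']

cut_if_join_modifies, _ = cutting_analysis(build_xat(join_modifies_t=True), root=1)
print(sorted(cut_if_join_modifies))  # [25]
```

The second run matches the footnote: if the join did affect $t, it would stay, its consumed columns col6 and col7 would become required, and only node 25 (which produces the unused col12) would be cut.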


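Matrix Computation

Columns tracked: col3, $t, col6, col7, $book, R1, col12, $prices, R3 (P = produced, C = consumed, - = no entry).

Node | Parent | Entries | Cut?
-----|--------|---------|-----
1 | - | C: col3 |
2 | 1 | P: col3; C: $t |
3 | 2 | - |
31* | 3 | C: col6, col7 |
23 | 31 | P: $t; C: $book |
27 | 23 | P: col6; C: $book |
14 | 27 | P: $book; C: R1 |
15 | 14 | P: R1 |
25 | 31 | P: col12; C: $prices |
28 | 25 | P: col7; C: $prices |
20 | 28 | P: $prices; C: R3 |
21 | 20 | P: R3 |

*We assume the Join does not modify $t; otherwise, only node 25 would be deleted.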


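Matrix Computation (Cont. 1)

Columns tracked: col3, $t, col6, col7, $book, R1, col12, $prices, R3 (P = produced, C = consumed, M = modified, R = required).

Node | Parent | Entries
-----|--------|--------
21 | 20 | P: R3
20 | 28 | P: $prices; C: R3
28 | 25 | P: col7; C: $prices
25 | 31 | P: col12; C: $prices
15 | 14 | P: R1
14 | 27 | P: $book; C: R1
27 | 23 | P: col6; C: $book
23 | 31 | P: $t; C: $book
31* | 3 | C: col6, col7
3 | 2 | M: $t
2 | 1 | P: col3; C: $t
1 | - | R: col3, $t, $book, R1

*We assume the Join does not modify $t; otherwise, only node 25 would be deleted.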


Intuition: Give me only the required columns in order to get the final result.


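Matrix Computation (Cont. 2)

Columns tracked: col3, $t, col6, col7, $book, R1, col12, $prices, R3 (P = produced, C = consumed, M = modified, R = required, X = cut).

Node | Parent | Entries | Cut?
-----|--------|---------|-----
1 | - | R: col3, $t, $book, R1 |
2 | 1 | P: col3; C: $t |
3 | 2 | M: $t |
31* | 3 | C: col6, col7 | X
23 | 31 | P: $t; C: $book |
27 | 23 | P: col6; C: $book | X
14 | 27 | P: $book; C: R1 |
15 | 14 | P: R1 |
25 | 31 | P: col12; C: $prices | X
28 | 25 | P: col7; C: $prices | X
20 | 28 | P: $prices; C: R3 | X
21 | 20 | P: R3 | X

*We assume the Join does not modify $t; otherwise, only node 25 would be deleted.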


XAT after Cutting

The XAT after tagger cancel-out (nodes 1, 2, 3, 31, 23, 27, 14, 15, 25, 28, 20, 21, with JOIN col6=col7 at node 31) is reduced to:

  1: col3
  2: T <result>$t</result> → col3
  3: Agg
  23: $book, title → $t
  14: R1, /book/row → $book
  15: S "dxv.xml" → R1
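Cutting the operators then amounts to reconnecting each surviving node to its nearest surviving ancestor. A minimal sketch, with a hypothetical splice_out function, the parent links transcribed from the matrices, and the cut set taken from the last matrix:

```python
# Parent of each node before cutting (root 1 has no parent).
PARENT = {2: 1, 3: 2, 31: 3, 23: 31, 27: 23, 14: 27, 15: 14,
          25: 31, 28: 25, 20: 28, 21: 20}
CUT = {31, 27, 25, 28, 20, 21}


def splice_out(parent, cut):
    """Drop cut nodes and reattach each kept node to its nearest kept ancestor."""
    new_parent = {}
    for node, p in parent.items():
        if node in cut:
            continue
        while p in cut:
            p = parent[p]
        new_parent[node] = p
    return new_parent


print(splice_out(PARENT, CUT))  # {2: 1, 3: 2, 23: 3, 14: 23, 15: 14}
```

Simple reattachment is valid here because the cut join keeps only one of its inputs (the $book branch). If a cut binary operator had surviving nodes under both inputs, hanging both under one parent would not preserve the query's meaning, so a real implementation would need to handle that case separately.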


SQL Generated

For the XAT before cutting:

SELECT "$book".title AS "$t", "$book".bid AS "col6", "$prices".price AS "col12", "$prices".bid AS "col7"
FROM book "$book", prices "$prices"
WHERE "$book".bid = "$prices".bid

For the XAT after cutting (nodes 1, 2, 3, 23, 14, 15):

SELECT "$book".title AS "$t" FROM book "$book"
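The two SQL statements can be compared on the running example's data with Python's built-in sqlite3 module. This is a sketch, not the Rainbow SQL generator: the table definitions (column types) are assumptions, and the extra unpriced book is hypothetical data added only to show why the footnote's assumption about the Join matters.

```python
import sqlite3

BEFORE_CLEANUP = '''
    SELECT "$book".title AS "$t", "$book".bid AS "col6",
           "$prices".price AS "col12", "$prices".bid AS "col7"
    FROM book "$book", prices "$prices"
    WHERE "$book".bid = "$prices".bid
'''
AFTER_CLEANUP = 'SELECT "$book".title AS "$t" FROM book "$book"'


def titles(conn, sql):
    """Run a query and return the sorted values of its first column ($t)."""
    return sorted(row[0] for row in conn.execute(sql))


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (bid TEXT, title TEXT);
    CREATE TABLE prices (bid TEXT, price REAL);
    INSERT INTO book VALUES ('001', 'TCP/IP Illustrated'), ('002', 'Data on the Web');
    INSERT INTO prices VALUES ('001', 65.95), ('002', 34.95);
""")
same_on_example = titles(conn, BEFORE_CLEANUP) == titles(conn, AFTER_CLEANUP)
print(same_on_example)  # True

# Hypothetical extra row: a book with no price. The join filters it out, so the
# two queries now differ; the cut is only safe if the Join does not affect $t.
conn.execute("INSERT INTO book VALUES ('003', 'Unpriced Book')")
same_with_unpriced = titles(conn, BEFORE_CLEANUP) == titles(conn, AFTER_CLEANUP)
print(same_with_unpriced)  # False
```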


Preliminary Evaluation

Experiment setup: XQuery over the Kweelt parser; Pentium III 800 MHz, 256 MB RAM, Windows 2000 Professional.

Data setup: synthetic data, synthetic queries.

Query execution: native XML engine.


Performance Gain in Execution

[Figure: execution time in ms (log scale, 100 to 100,000,000) versus number of elements in the XML dataset (10 to 10,000), for four configurations: None, Rewrite, Cleanup, Rewrite+Cleanup.]


Query Engine Overhead

[Figure: pie chart of query engine time by phase: Generation, Rewrite, Decorrelation and Cleanup (ms); slice labels 1%, 42%, 2% and 55%; total 32,522 ms.]


Related Work

Rainbow: optimizes on the XAT (static analysis); algebra-level rewriting.

SQL optimization: algebra-based optimization; static analysis.

XQuery by views, optimized in SQL:

1) XPERANTO [VLDBJ 2000]: XQGM vs. XAT; extension by UDFs for XML features.

2) SilkRoute [IEEE 2001 (24:2)]: generates SQL efficiently.

3) AGORA [VLDB 2000]: syntax-level rewriting.


Summary

Efficient XQuery processing:

1) XML Algebra Tree (XAT)

2) XAT optimization: rewrite using equivalence rules; cleanup (schema cleanup, operator cutting)

3) Prototype system implementation


Questions? (Futures!)

http://davis.wpi.edu/dsrg/rainbow

https://sourceforge.net/projects/rainbow-engine/

Special thanks: Brian Murphy, Luping Ding, DSRG group.


Schema Computation

Node | Parent | Produced | Consumed | Minimum schema
-----|--------|----------|----------|---------------
1 | - | {} | {col3} | {col3}
2 | 1 | {col3} | {$t} | {col3}
3 | 2 | {} | {} | {$t}
31 | 3 | {} | {col6, col7} | {$t}
23 | 31 | {$t} | {$book} | {col6, $t}
27 | 23 | {col6} | {$book} | {$book, col6}
14 | 27 | {$book} | {R1} | {$book}
15 | 14 | {R1} | {} | {R1}
25 | 31 | {col12} | {$prices} | {col7, col12}
28 | 25 | {col7} | {$prices} | {$prices, col7}
20 | 28 | {$prices} | {R3} | {$prices}
21 | 20 | {R3} | {} | {R3}
