
Post on 22-Dec-2015


1

Rainbow XML-Query Processing Revisited: The Complete Story (Part I)

Xin Zhang

2

Motivation
Experience from Past Imp.
Solid Foundation for Research:
    Order-sensitive Query Processing (Brian)
    Update Computation Pushdown (Mukesh)
    Query Optimization (Brian & Brad)
    Cost-based XML Storage (Xin)

3

What are we going to do?
XAT Data Model
XAT Operators
XAT Generation

4

<!ELEMENT prices (book*)>
<!ELEMENT book (title, source, price)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT source (#PCDATA)>
<!ELEMENT price (#PCDATA)>

<prices>
    <book>
        <title> TCP/IP Illustrated </title>
        <source>www.amazon.com</source>
        <price>65.95</price>
    </book>
    <book>
        <title> TCP/IP Illustrated </title>
        <source>www.bn.com</source>
        <price>69.95</price>
    </book>
    <book>
        <title>Data on the Web</title>
        <source>www.amazon.com</source>
        <price>34.95</price>
    </book>
    <book>
        <title>Data on the Web</title>
        <source>www.bn.com</source>
        <price>39.95</price>
    </book>
</prices>

Example* of XML Use Cases.

5

Example XQuery

<results> {
    for $t in distinct(document("prices.xml")//book/title)
    let $p := document("prices.xml")//book[title = $t]/price
    return
        <minprice title= $t/text()>
            <price> min($p/text()) </price>
        </minprice>
} </results>

In the document "prices.xml", find the minimum price for each book, in the form of a "minprice" element with the book title as its title attribute.

<results>
    <minprice title="TCP/IP Illustrated">
        <price>65.95</price>
    </minprice>
    <minprice title="Data on the Web">
        <price>34.95</price>
    </minprice>
</results>
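As a point of comparison for the query above, here is a minimal Python sketch that computes the same result directly with the standard library. It is not part of Rainbow or XAT; the names `PRICES_XML` and `min_prices` are hypothetical, and titles are whitespace-trimmed as an assumption so that " TCP/IP Illustrated " groups under one key.

```python
import xml.etree.ElementTree as ET

# Inline copy of the prices.xml example data (titles trimmed for brevity).
PRICES_XML = """<prices>
  <book><title>TCP/IP Illustrated</title><source>www.amazon.com</source><price>65.95</price></book>
  <book><title>TCP/IP Illustrated</title><source>www.bn.com</source><price>69.95</price></book>
  <book><title>Data on the Web</title><source>www.amazon.com</source><price>34.95</price></book>
  <book><title>Data on the Web</title><source>www.bn.com</source><price>39.95</price></book>
</prices>"""


def min_prices(xml_text):
    """Build a <results> element holding one <minprice> per distinct title."""
    root = ET.fromstring(xml_text)
    lowest = {}  # dicts keep first-seen key order, so titles stay in document order
    for book in root.iter("book"):
        title = book.findtext("title").strip()
        price = float(book.findtext("price"))
        lowest[title] = min(price, lowest.get(title, price))
    results = ET.Element("results")
    for title, price in lowest.items():
        minprice = ET.SubElement(results, "minprice", title=title)
        ET.SubElement(minprice, "price").text = f"{price:.2f}"
    return results


print(ET.tostring(min_prices(PRICES_XML), encoding="unicode"))
```

The XAT plan developed later in the deck reaches the same answer through Navigate, Select, Function and Tagger operators instead of a hand-written loop.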

6

Four Kinds of Data Models
Object-relational storage.
Special data type: sequence.
Two kinds of tables:
    Flat table: requires ID assignment.
    Nested table: complicated operators.
Two kinds of cells:
    References to DOM trees: require dereferencing.
    Values: waste space.
They are all interchangeable.

7

Data Model
An order-sensitive table.
Every cell has its own domain, e.g.:
    SQL domains.
    XML node.
    A collection.
Every column denotes one variable ($v) or an internal variable (col n, R m).
Comparisons are done by deep equality.

8

Do we need Schema Order?
So far, none of the operators requires any schema order.
Hence, we will not consider the schema order in the data model.

9

Definition of Collection
A collection has to contain at least 2 objects.
An attempt to generate a collection of one object reduces the collection to that object, and no collection is generated.
Collections cannot be nested!
A collection is an unnamed XML node.
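The rules on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: `Collection` and `make_collection` are hypothetical names, and returning `None` for an empty input is a choice made here because the slide does not cover that case.

```python
class Collection:
    """An unnamed, ordered, flat group of at least two objects."""

    def __init__(self, items):
        self.items = tuple(items)

    def __eq__(self, other):
        return isinstance(other, Collection) and self.items == other.items

    def __repr__(self):
        return "{" + ", ".join(repr(i) for i in self.items) + "}"


def make_collection(values):
    """Apply the collection rules: flatten nesting, collapse singletons."""
    flat = []
    for value in values:
        if isinstance(value, Collection):
            flat.extend(value.items)  # collections cannot be nested
        else:
            flat.append(value)
    if not flat:
        return None  # assumption: the slide does not define the empty case
    if len(flat) == 1:
        return flat[0]  # a one-object collection is reduced to the object
    return Collection(flat)


print(make_collection(["<price>34.95</price>", "<price>39.95</price>"]))
print(make_collection(["<price>65.95</price>"]))
```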

10

Data Model Examples
Table of XML fragments.
Table types:
    Regular relations, e.g. columns: title, price
    Table with XML nodes, e.g. column price: <price>65.95</price>
    Table with a collection of XML nodes, e.g. column prices: {<price>34.95</price>, <price>39.95</price>, ...}

11

Column Names
A relation column name: "price".
A generated column name: "col1", "r1".
A variable binding: "$var".

12

Where are we?
XAT Data Model
XAT Operators
XAT Generation

13

XML Operators (5+2)

Operator | Sym. | Prms. | Output | Data | Description
Tagger | T | p | col | s | Tagging s according to list pattern p.
Navigate | | col1, path | col2 | s | Navigate from column col1 of s through an XPath.
Aggregate | Agg | N/A | N/A | s | Make a collection for each column.
Composer | C | p | col | s | Construct an XML document from one s according to DOM pattern p.
XML Union | X | col+ | col | s | Union multiple columns into one.
XML Intersect | X | col+ | col | s | Intersect multiple columns into one.
XML Difference | -X | col+ | col | s | Difference multiple columns into one.

14

Special Operators (7)

Operator | Sym. | Prms. | Output | Data | Description
SQL | SQL | stmt | col+ | N/A | One SQL query statement stmt over multiple s.
Function | {F} | param+ | col | s? | XML or user-defined function over zero or one data source with a list of parameters.
Source | S | desc | col+ | N/A | Identify a data source by description desc. It could be an XML fragment, an XML document, or a relational table.
Name | | col1, col2 | | s | Rename column col1 of source s into name col2.
Name | | ns | | s | Name s into ns.
FOR | FOR | col+ | | s, sq | The FOR operator iterates over s and executes subquery sq with variable binding columns col1..n.
IF_THEN_ELSE | IF | c | | sq1, sq2 | If condition c is true, then execute subquery sq1, else execute subquery sq2.
Merge | M | | | s+ | Merge multiple tables into one table.

15

SQL Operators (11)

Operator | Sym. | Prms. | Output | Data | Description
Project | | col+ | N/A | s | Project out multiple columns from subquery s.
Select | | c | N/A | s | Filter subquery s by condition c.
Cartesian Product | | N/A | N/A | s1, s2 | Cartesian product of the results of two sources, s1 and s2.
Theta Join | | c | N/A | ls, rs | Join two sources ls and rs under condition c.
Outer Join | | c | N/A | ls, rs | Left (right) outer join two sources ls and rs by condition c.
Groupby | | col+ | N/A | s, sqg | Make temporary groups by multiple columns from source s, then evaluate subquery sqg for each group, then merge the evaluated results back.
Orderby | | col+ | N/A | s | Sort source s by multiple columns.
Union | | N/A | N/A | s+ | Union multiple sources together.
Outer Union | O | N/A | N/A | s+ | Outer union multiple sources together.
Difference | | N/A | N/A | ls, rs | Difference between two sources.
Intersect | | N/A | N/A | s+ | Intersect multiple sources.
COp | COp | col+ | N/A | s, sq | Correlated Operator on columns col+. It will execute sq for each tuple in source s.

16

Functions (Examples)
Ref: http://www.w3.org/TR/xquery-operators/

Type | Examples
String | concat, contains, lowercase, name, starts-with, subst, trim, uppercase ...
Aggregation | avg, count, max, min, sum, ...
Sequence | exists ...
Date and Time | date ...
Context | last, position ...
Node | shallow ...
... | ...
User Defined | The new function defined in the XQuery.

17

Expression
Used in Select and Join operators.
Arithmetic: negative, +, -, *, /, %.
Boolean: NOT, OR, AND, >, =, <, >=, <=, <>.
Terminals: String and Double, Column Name.

Class diagram (interfaces):
    Expression (Visitable): +eval: Object
    BinExpression: left: Expression, right: Expression
    BinArithExpression: +PLUS: int, +MINUS: int, +MULTIPLE: int, +DIVIDE: int, +MOD: int
    BinBoolExpression
    BinANDExpression
    BinORExpression
    BinCOMPExpression: +LT: int, +GT: int, +LEQ: int, +GEQ: int, +EQ: int, +NEQ: int
    UniExpression: expression: Expression
    UniMinusExpression
    UniNOTExpression
    TerminalExpression: type: int; +STRING: int, +DOUBLE: int, +NAME: int
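To make the expression tree concrete, here is a small Python sketch of how such expressions could be evaluated against one tuple. It is an assumption-laden illustration, not the Rainbow code: the classes `Terminal`, `Binary` and `Unary` collapse the interfaces listed above into three hypothetical classes, and a tuple is modelled as a plain dict from column name to value.

```python
import operator

# Operator tables for the arithmetic and comparison symbols listed on the slide.
ARITH = {"+": operator.add, "-": operator.sub, "*": operator.mul,
         "/": operator.truediv, "%": operator.mod}
COMP = {"<": operator.lt, ">": operator.gt, "<=": operator.le,
        ">=": operator.ge, "=": operator.eq, "<>": operator.ne}


class Terminal:
    """A STRING or DOUBLE constant, or a column NAME looked up in the tuple."""

    def __init__(self, kind, value):
        self.kind, self.value = kind, value

    def eval(self, row):
        return row[self.value] if self.kind == "NAME" else self.value


class Binary:
    """Arithmetic, comparison, AND or OR over two sub-expressions."""

    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def eval(self, row):
        left = self.left.eval(row)
        if self.op == "AND":
            return bool(left) and bool(self.right.eval(row))
        if self.op == "OR":
            return bool(left) or bool(self.right.eval(row))
        fn = ARITH.get(self.op) or COMP[self.op]
        return fn(left, self.right.eval(row))


class Unary:
    """NOT or arithmetic negation (MINUS) of one sub-expression."""

    def __init__(self, op, operand):
        self.op, self.operand = op, operand

    def eval(self, row):
        value = self.operand.eval(row)
        return (not value) if self.op == "NOT" else -value


# price < 40 AND NOT (source = "www.bn.com")
cond = Binary("AND",
              Binary("<", Terminal("NAME", "price"), Terminal("DOUBLE", 40.0)),
              Unary("NOT", Binary("=", Terminal("NAME", "source"),
                                  Terminal("STRING", "www.bn.com"))))
print(cond.eval({"price": 34.95, "source": "www.amazon.com"}))
```

A Select or Join operator would call `eval` once per tuple (or tuple pair) and keep the tuple when the result is true.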

18

Pattern for Tagger
List pattern only contains Strings and Column Names.
DOM pattern is a tree.

Class diagram (interfaces):
    DOMPatternNode (Visitable): children: DOMPatternNode[], parent: DOMPatternNode, tagName: String, canceledOut: boolean; +addChild: void, +deleteChild: void, +getChild: DOMPatternNode, +setChild: void, +getTagValue: Object, +setTagValue: void
    AttributeNode: tagValue: NavigationStep[]
    ColumnNameNode: tagValue: edu.wpi.cs.dsrg.xmldb.xat.common.operator.xmloperator.NavigationStep[]
    RootNode
    TagNode
    TextNode

19

Where are we?
XAT Data Model
XAT Operators
    XML (5): Tagger, Composer, Navigate, Aggregate, XML Union.
XAT Generation

20

Tagger T p col (s)
Consume: columns used in the pattern p.
Produce: the new column col.
Logic:
    One additional column is added with tagged information.
    Need to work with operator to create nested structure.
Order Handling:
    The tagged column is added to the end.
    The tuple order of the output table is the same as table s.
Requirement: The columns used in pattern p should be in table s.

21

Example: T<price>[col1]</price>col2

Col1

65.95

34.95

Col1

<price>65.95</price>

<price>34.95</price>
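The Tagger example above can be reproduced with a short Python sketch. This is a hypothetical illustration: a table is modelled as a list of dicts, `tagger` is an invented helper, and the `[colname]` placeholder syntax follows the list pattern shown on the slide.

```python
import re

# Matches the [column] placeholders used in a list pattern.
PLACEHOLDER = re.compile(r"\[(\w+)\]")


def tagger(pattern, new_col, table):
    """Append new_col to every tuple, filled by substituting columns into pattern."""
    needed = PLACEHOLDER.findall(pattern)
    out = []
    for row in table:
        missing = [c for c in needed if c not in row]
        if missing:
            # Requirement from the slide: pattern columns must exist in table s.
            raise KeyError(f"pattern uses columns not in the table: {missing}")
        tagged = PLACEHOLDER.sub(lambda m: str(row[m.group(1)]), pattern)
        out.append({**row, new_col: tagged})  # new column goes to the end
    return out


table = [{"col1": "65.95"}, {"col1": "34.95"}]
for row in tagger("<price>[col1]</price>", "col2", table):
    print(row)
```

The input tuple order is kept, and the tagged column is added as the last column, as the Order Handling notes require.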

22

Composer C p col (s)
Consume: columns used in the pattern p.
Produce: the new column col with nested structure.
Logic:
    Does not require another operator to create the nested structure.
Order Handling:
    Tuple order is the same as the input.
Requirement: Requires a special schema for the input subquery s: (id[1..n], type, att[1..m], value)

23

Navigate col, path col’ (s)
Consume: column col.
Produce: new column col’.
Logic:
    One additional column is added with navigation information.
    Tuples are multiplied if the navigation yields more than one result.
    If the navigation result is empty, that tuple is dropped.
Order Handling:
    The navigation column is added to the end.
    The tuple order of the output table follows the order of table s and the navigation order.
Requirement: N/A

24

Two types of Navigates
Navigate Unnesting:
    Unnests the parent-children relationship and duplicates the parent value for each child.
Navigate Collection:
    Nests the parent-children relationship: creates a collection of the children, but keeps the single parent.

25

Where to use two types
Navigate Unnesting: FOR binding.
Navigate Collection: LET binding.
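The difference between the two Navigate variants can be sketched in Python with `xml.etree.ElementTree`. The helper names `navigate_unnest` and `navigate_collect` are hypothetical, a table is a list of dicts, a collection is modelled as a Python list, and applying the "drop the tuple on an empty result" rule to the collection variant as well is an assumption.

```python
import xml.etree.ElementTree as ET


def navigate_unnest(table, col, path, new_col):
    """FOR-style navigate: one output tuple per navigation result."""
    out = []
    for row in table:
        for node in row[col].findall(path):  # no results means no output tuple
            out.append({**row, new_col: node})
    return out


def navigate_collect(table, col, path, new_col):
    """LET-style navigate: one output tuple holding all results as a collection."""
    out = []
    for row in table:
        nodes = row[col].findall(path)
        if not nodes:
            continue  # assumption: empty results drop the tuple here too
        # A single result is reduced to the object itself, not a collection.
        out.append({**row, new_col: nodes[0] if len(nodes) == 1 else nodes})
    return out


doc = ET.fromstring(
    "<prices><book><title>A</title></book><book><title>B</title></book></prices>")
table = [{"R1": doc}]
print(len(navigate_unnest(table, "R1", "book", "col1")))   # one tuple per book
print(len(navigate_collect(table, "R1", "book", "col1")))  # one tuple in total
```

Both variants append the navigation column at the end and keep the parent column, so the parent value is repeated per child only in the unnesting case.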

26

Collection Issues in Navigate
1) What happens if there is already a collection in the input table?
    It depends on the input table. If navigating from the collection, see issue 2. If not, the collection stays the same as the original.
2) What happens when navigating from a collection in the input table?
    Another collection is generated, but no nested collections.

27

Navigation Steps in the Navigate operator.
Attribute: @
Children: //, /child
Text: text()
Column Name: col1

28

Navigation Use Cases
a(<a>...</a>) -> NULL
b(<a><b>...</b></a>) -> <b>...</b>
a(<a><a>...</a></a>) -> <a>...</a>
text()(<a>text()</a>) -> text()
a({<a/>,<b/>}) -> <a/>

29

Example of R1, bookcol1

R1 Col1<prices>...</prices>

<book> <title> TCP/IP Illustrated </title> <source>www.amazon.com</source> <price>65.95</price>

</book>

<prices>...</prices>

<book> <title> TCP/IP Illustrated </title> <source>www.bn.com</source> <price>69.95</price>

</book>

<prices>...</prices>

<book> <title>Data on the Web</title> <source>www.amazon.com</source> <price>34.95</price>

</book>

<prices>...</prices>

<book> <title>Data on the Web</title> <source>www.bn.com</source> <price>39.95</price>

</book>

R1<prices>

<book> ...

</book>

<book> ...

</book>

<book> ...

</book>

<book> ...

</book> </prices>

30

Example of R1, bookcol1

R1 Col1<prices>

<book> ...

</book> <book>

...</book> <book>

...</book> <book>

...</book>

</prices>

{<book> <title> TCP/IP Illustrated </title> <source>www.amazon.com</source> <price>65.95</price>

</book> ,<book>

<title> TCP/IP Illustrated </title> <source>www.bn.com</source> <price>69.95</price>

</book> ,<book>

<title>Data on the Web</title> <source>www.amazon.com</source> <price>34.95</price>

</book> ,<book>

<title>Data on the Web</title> <source>www.bn.com</source> <price>39.95</price>

</book>}

R1<prices>

<book> ...

</book>

<book> ...

</book>

<book> ...

</book>

<book> ...

</book> </prices>

31

Example of col1, book col2

Input:
Col1
{<book> <title> TCP/IP Illustrated </title> <source>www.amazon.com</source> <price>65.95</price> </book>,
 <book> <title> TCP/IP Illustrated </title> <source>www.bn.com</source> <price>69.95</price> </book>,
 <book> <title>Data on the Web</title> <source>www.amazon.com</source> <price>34.95</price> </book>,
 <book> <title>Data on the Web</title> <source>www.bn.com</source> <price>39.95</price> </book>}

Output:
Col1 | col2
{...} | <book> <title> TCP/IP Illustrated </title> <source>www.amazon.com</source> <price>65.95</price> </book>
{...} | <book> <title> TCP/IP Illustrated </title> <source>www.bn.com</source> <price>69.95</price> </book>
{...} | <book> <title>Data on the Web</title> <source>www.amazon.com</source> <price>34.95</price> </book>
{...} | <book> <title>Data on the Web</title> <source>www.bn.com</source> <price>39.95</price> </book>

32

Example of col1, title col2

Input:
Col1
{<book> <title> TCP/IP Illustrated </title> <source>www.amazon.com</source> <price>65.95</price> </book>,
 <book> <title> TCP/IP Illustrated </title> <source>www.bn.com</source> <price>69.95</price> </book>,
 <book> <title>Data on the Web</title> <source>www.amazon.com</source> <price>34.95</price> </book>,
 <book> <title>Data on the Web</title> <source>www.bn.com</source> <price>39.95</price> </book>}

Output:
Col1 | col2
{<book> ... </book>, <book> ... </book>, <book> ... </book>, <book> ... </book>} | {<title> TCP/IP Illustrated </title>, <title> TCP/IP Illustrated </title>, <title>Data on the Web</title>, <title>Data on the Web</title>}

33

Aggregate Agg(s)
Consume: nothing.
Produce: nothing.
Logic: Create a collection for each column.
Order Handling: There is only one tuple.
Requirement: N/A

34

Example of Agg(s)

Col1<book>

<title> TCP/IP Illustrated </title> <source>www.amazon.com</source> <price>65.95</price>

</book>

<book> <title> TCP/IP Illustrated </title> <source>www.bn.com</source> <price>69.95</price>

</book>

<book> <title>Data on the Web</title> <source>www.amazon.com</source> <price>34.95</price>

</book>

<book> <title>Data on the Web</title> <source>www.bn.com</source> <price>39.95</price>

</book>

Col1

{<book> <title> TCP/IP Illustrated </title> <source>www.amazon.com</source> <price>65.95</price>

</book> ,<book>

<title> TCP/IP Illustrated </title> <source>www.bn.com</source> <price>69.95</price>

</book> ,<book>

<title>Data on the Web</title> <source>www.amazon.com</source> <price>34.95</price>

</book> ,<book>

<title>Data on the Web</title> <source>www.bn.com</source> <price>39.95</price>

</book>}
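A minimal Python sketch of the Aggregate step shown above, assuming a table is a list of dicts and a collection is a Python list. The function name `aggregate` is hypothetical, and returning an empty table for an empty input is an assumption the slides do not address.

```python
def aggregate(table):
    """Collapse a table into one tuple with one collection per column."""
    if not table:
        return []  # assumption: an empty input yields an empty output
    result = {}
    for col in table[0]:
        values = [row[col] for row in table]  # tuple order becomes collection order
        # A one-object collection is reduced to the object itself.
        result[col] = values[0] if len(values) == 1 else values
    return [result]


books = [{"Col1": "<book>1</book>"}, {"Col1": "<book>2</book>"},
         {"Col1": "<book>3</book>"}, {"Col1": "<book>4</book>"}]
print(aggregate(books))
```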

35

XML Union X col[1..n] col (s)
Consume: columns col[1..n].
Produce: new column col.
Logic: For every tuple with col[1..n], merge their results into one collection and put it into the new column col.
Order Handling: N/A
Requirement: N/A

36

Example: X title, price result (s)

title | price | Result
<title> TCP/IP Illustrated </title> | <price>65.95</price> | {<title> TCP/IP Illustrated </title>, <price>65.95</price>}
<title> TCP/IP Illustrated </title> | <price>69.95</price> | {<title> TCP/IP Illustrated </title>, <price>69.95</price>}
<title>Data on the Web</title> | <price>34.95</price> | {<title>Data on the Web</title>, <price>34.95</price>}
<title>Data on the Web</title> | <price>39.95</price> | {<title>Data on the Web</title>, <price>39.95</price>}
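The XML Union example can be sketched in Python as follows. This is a hypothetical helper (`xml_union`), with tables as lists of dicts and collections as Python lists; flattening an input cell that is already a collection follows the "collections cannot be nested" rule stated earlier.

```python
def xml_union(table, cols, new_col):
    """Per tuple, merge the values of cols into one collection stored in new_col."""
    out = []
    for row in table:
        merged = []
        for col in cols:
            value = row[col]
            # Flatten existing collections so the result is never nested.
            merged.extend(value if isinstance(value, list) else [value])
        out.append({**row, new_col: merged[0] if len(merged) == 1 else merged})
    return out


table = [
    {"title": "<title>TCP/IP Illustrated</title>", "price": "<price>65.95</price>"},
    {"title": "<title>Data on the Web</title>", "price": "<price>34.95</price>"},
]
for row in xml_union(table, ["title", "price"], "result"):
    print(row["result"])
```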

37

Where are we?
XAT Data Model
XAT Operators
    XML (5)
    Special (7): SQL, Function, Source, Name, FOR, IF, Merge.
XAT Generation

38

SQL SQL stmt col[1..m]
Consume: depends on the stmt.
Produce: depends on the stmt.
Logic:
    Execute stmt over the multiple tables and output the result.
    It is assumed to be executed by an RDB engine.
    Usually, it is the operator right above the source (e.g., table) operator.
Order Handling:
    The tuple order is undetermined. It can be re-established by an additional orderby node.
Requirement: N/A.

39

Function F param[1..m] col (s?)
Consume: columns used in the param[1..m].
Produce: new column col.
Logic:
    Execute an XML or user-defined function on the data sources.
    Or used to represent a recursive query.
Order Handling:
    The tuple order can be re-established by orderby nodes.
Requirement: N/A.

40

Source S desc col[1..n]
Consume: nothing.
Produce: new column col for XML sources; multiple columns for a table source.
Logic:
    Identify the following sources: view, XML document, XML fragment, or a table.
    Col[1..n] depends on the source description. It will be one new column if the input is an XML source; otherwise, it will be a list of columns from the table source.
Order Handling:
    Depends on the implementation. Keep the original tuple order as much as possible.
Requirement: N/A.

41

Example of S “prices.xml” R1

R1<prices>

<book> <title> TCP/IP Illustrated </title> <source>www.amazon.com</source> <price>65.95</price>

</book> <book>

<title> TCP/IP Illustrated </title> <source>www.bn.com</source> <price>69.95</price>

</book> <book>

<title>Data on the Web</title> <source>www.amazon.com</source> <price>34.95</price>

</book> <book>

<title>Data on the Web</title> <source>www.bn.com</source> <price>39.95</price>

</book> </prices>

42

Name Column col1, col2 (s)
Consume: Column col1.
Produce: Column col2.
Logic: Rename col1 in table s into col2.
Order Handling: Keep all the schema and tuple orders.
Requirement: col1 in table s.

43

Name ns (s)
Consume: Nothing.
Produce: Nothing.
Logic: Name s to ns.
Order Handling: Keep all the schema and tuple orders.
Requirement: N/A.

44

FOR FOR col[1..n] (s, sq)
Consume: Nothing.
Produce: Nothing.
Logic:
    It is a FOR iteration operator. For each value in the columns col[1..n] of table s, evaluate the sub-query sq.
    Very important for query decorrelation.
Order Handling:
    Schema order is decided by sq.
    Tuple order is similar to the join operator without the left part.
Requirement: N/A.

45

Merge M (s[1..n])
Consume: Nothing.
Produce: Nothing.
Logic:
    Merge multiple tables into one table.
    Tuple order is very important in this operator.
Order Handling:
    Tuple order is the same as the input.
Requirement:
    s[1..n] have the same number of tuples.

46

Example: M (title, price)

title price<title> TCP/IP Illustrated </title>

<price>65.95</price>

<title> TCP/IP Illustrated </title>

<price>69.95</price>

<title>Data on the Web</title>

<price>34.95</price>

<title>Data on the Web</title>

<price>39.95</price>

title<title> TCP/IP Illustrated </title>

<title> TCP/IP Illustrated </title>

<title>Data on the Web</title>

<title>Data on the Web</title>

price<price>65.95</price>

<price>69.95</price>

<price>34.95</price>

<price>39.95</price>
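The Merge example above pairs tuples by position, which is why tuple order matters. A minimal Python sketch, assuming tables are lists of dicts; the name `merge` is hypothetical, and letting a later table win when column names clash is an assumption the slides do not cover.

```python
def merge(*tables):
    """Combine tables side by side: the i-th tuples of all inputs form one tuple."""
    if len({len(t) for t in tables}) > 1:
        # Requirement from the slide: all inputs have the same number of tuples.
        raise ValueError("all tables must have the same number of tuples")
    out = []
    for rows in zip(*tables):
        combined = {}
        for row in rows:
            combined.update(row)  # assumption: later tables win on a name clash
        out.append(combined)
    return out


titles = [{"title": "<title>TCP/IP Illustrated</title>"},
          {"title": "<title>Data on the Web</title>"}]
prices = [{"price": "<price>65.95</price>"},
          {"price": "<price>34.95</price>"}]
for row in merge(titles, prices):
    print(row)
```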

47

Where are we?
XAT Data Model
XAT Operators
    XML Operators (5).
    Special Operators (7).
    SQL Operators (11): Project, Select, Cartesian Product, Join (Theta, Outer), Groupby, Orderby, Union (Node, Outer), COp, Intersect, Difference.
XAT Generation

48

Project col[1..n] (s)
Consume: All columns in DM(s).
Produce: nothing.
Logic: Keep only columns col[1..n] in DM(s).
Order Handling: Keep the original tuple order; the schema order follows col[1..n] as listed in the project operator.
Requirement: The col[1..n] should be in source s.

49

Select c (s)
Consume: columns used in condition expression c.
Produce: nothing.
Logic: Keep tuples in s when c is true.
Order Handling: Keep original tuple order, keep original schema order.
Requirement: Condition c should only reference the source s.

50

Theta Join c (ls, rs)
Consume: columns in the condition c.
Produce: nothing.
Logic: Join ls and rs together under condition c.
Order Handling:
    The tuple order of the output table is the iteration of tuples in rs over the iteration of tuples in ls, e.g., {<l1, r1>, <l1, r2>, <l2, r1>, <l2, r2>}
Requirement: Condition c should relate to both tables ls and rs.
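The order rule for the theta join corresponds to a nested loop with ls on the outside. A minimal Python sketch, assuming tables are lists of dicts and the condition is a Python callable; `theta_join` is a hypothetical name.

```python
def theta_join(ls, rs, cond):
    """Nested-loop join: for each tuple of ls, scan rs in order."""
    return [{**l, **r} for l in ls for r in rs if cond(l, r)]


ls = [{"l": "l1"}, {"l": "l2"}]
rs = [{"r": "r1"}, {"r": "r2"}]
# With a condition that always holds, the output order is
# <l1, r1>, <l1, r2>, <l2, r1>, <l2, r2>.
for row in theta_join(ls, rs, lambda l, r: True):
    print(row["l"], row["r"])
```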

51

Left Outer Join c (ls, rs)
Consume: Columns in the condition c.
Produce: Nothing.
Logic: Join but keep all the tuples in ls.
Order Handling:
    The tuple order of the output table is the iteration of tuples in rs over the iteration of tuples in ls, e.g., {<l1, r1>, <l1, r2>, <l2, null>, <l3, r1>, <l3, r3>}
Requirement: Condition c should relate to both tables ls and rs.
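The left outer join order from the example can be reproduced with a small Python sketch. It assumes tables are lists of dicts and uses `None` for null; `left_outer_join` and its `rs_cols` parameter (the right-hand column names to pad with nulls) are hypothetical.

```python
def left_outer_join(ls, rs, cond, rs_cols):
    """Nested-loop join that keeps every ls tuple, padding with None when unmatched."""
    out = []
    for l in ls:
        matches = [r for r in rs if cond(l, r)]
        if not matches:
            matches = [dict.fromkeys(rs_cols)]  # one all-null right-hand tuple
        out.extend({**l, **r} for r in matches)
    return out


# Matching pairs chosen to mirror the example on the slide.
pairs = {("l1", "r1"), ("l1", "r2"), ("l3", "r1"), ("l3", "r3")}
ls = [{"l": "l1"}, {"l": "l2"}, {"l": "l3"}]
rs = [{"r": "r1"}, {"r": "r2"}, {"r": "r3"}]
for row in left_outer_join(ls, rs, lambda l, r: (l["l"], r["r"]) in pairs, ["r"]):
    print(row["l"], row["r"])
```

The unmatched tuple l2 appears in its ls position with a null right side, which is the order the slide's example lists.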

52

Right Outer Join c (ls, rs)
Consume: Columns in the condition c.
Produce: Nothing.
Logic: Join but keep all the tuples in rs.
Order Handling:
    The tuple order of the output table is the iteration of tuples in ls over the iteration of tuples in rs, e.g., {<null, r1>, <null, r2>, <l1, r1>, <l1, r2>, <l2, r1>, <l2, r3>}; "null" is at the beginning of the output.
Requirement: Condition c should relate to both tables ls and rs.

53

Left Semi Join c (ls, rs)
Consume: Columns in condition c.
Produce: nothing.
Logic: Join but only keep the columns in ls.
Order Handling:
    The tuple order of the output table is the same as table ls.
Requirement: Condition c should relate to both tables ls and rs.

54

Right Semi Join c (ls, rs)
Consume: Columns used in condition c.
Produce: nothing.
Logic: Join but only keep the columns in rs.
Order Handling:
    The tuple order of the output table is the same as table rs.
Requirement: Condition c should relate to both tables ls and rs.

55

Groupby col[1..n] (s, sq)
Consume: col[1..n].
Produce: nothing.
Logic:
    Group the DM(s) by col[1..n], then apply sq on each group.
    If the sq generates a table instead of one single value, the generated table will be treated as a collection.
Order Handling:
    The tuple order of the output table is the same as table s.
Requirement: Col[1..n] should be in table s.
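A minimal Python sketch of the Groupby logic, under stated assumptions: tables are lists of dicts, the sub-query sq is a Python callable applied to each group, `groupby` and its `result_col` parameter are hypothetical, and "same order as table s" is interpreted as groups appearing in order of their first tuple.

```python
def groupby(table, cols, sq, result_col):
    """Group tuples by cols, apply sq to each group, emit one tuple per group."""
    groups = {}  # dicts keep insertion order, so groups follow first appearance in s
    for row in table:
        key = tuple(row[c] for c in cols)
        groups.setdefault(key, []).append(row)
    out = []
    for key, rows in groups.items():
        # If sq returns a list (a table), it is kept as a collection value.
        out.append({**dict(zip(cols, key)), result_col: sq(rows)})
    return out


books = [
    {"title": "TCP/IP Illustrated", "price": 65.95},
    {"title": "TCP/IP Illustrated", "price": 69.95},
    {"title": "Data on the Web", "price": 34.95},
    {"title": "Data on the Web", "price": 39.95},
]
for row in groupby(books, ["title"], lambda g: min(r["price"] for r in g), "minprice"):
    print(row)
```

Passing a sub-query that returns several values, such as `lambda g: [r["price"] for r in g]`, yields one collection per group instead of a single value.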

56

Orderby col[1..n] (s)
Consume: col[1..n].
Produce: nothing.
Logic: Order s by col[1..n].
Order Handling:
    The tuple order of the output table is as specified.
Requirement: Col[1..n] should be in table s.

57

Union (s[1..n])
Consume: nothing.
Produce: nothing.
Logic: Same as SQL.
Order Handling:
    The tuple order of the output table is in the order of table s[1..n].
Requirement: All tables s[1..n] have the same schema.

58

Outer Union O (s[1..n])
Consume: nothing.
Produce: nothing.
Logic: Same as SQL.
Order Handling:
    The tuple order of the output table is in the order of table s[1..n].
Requirement: N/A.

59

Intersect (s[1..n])
Consume: nothing.
Produce: nothing.
Logic: Same as SQL.
Order Handling:
    The tuple order of the output table is in the order of table s[1..n].
Requirement: All tables s[1..n] have the same schema.

60

Difference (ls, rs)
Consume: nothing.
Produce: nothing.
Logic: Same as SQL.
Order Handling:
    The tuple order of the output table is in the order of table ls.
Requirement: Tables ls and rs have the same schema.

61

Full Set of Operators
XML (5): T, C, , Agg(), X
Special (7): SQL, F, S, , FOR, IF, M
SQL (11): , , , , , , , , , O, COp, ,
Syntax:
    Op<params><column_name>(<sub_queries>)
    <column_name>:=Op(<params>) [<sub_queries>]

62

Where are we? XAT Data Model XAT Operators XAT Generation

63

Operators Used in the Generation
XML: T, , , Agg()
Special: F, S, FOR, IF
SQL: , , , , X, ,

64

How to translate FOR binding?
XQuery:
    FOR $x IN for-binding
    Inner-query use $x
XAT:
    FOR($x)
        $x IN For-binding
        Inner-query use $x

65

How to translate LET binding?

LET $x := let-binding

Rest-of-query use $x

$x := let-binding

Rest-of-query use $x

66

What is the difference between FOR and LET bindings?
XQuery:
    FOR $x IN document(“x.xml”)/x
    LET $x := document(“x.xml”)/x
XAT:
    For-binding: R1, x $x (S “x.xml” R1)
    Let-binding: C R1, x col1 (S “x.xml” R1)

67

XML Parser Tree

QuiltQuery(
  ElementConstruct(<Results>,FLWRExpression(
    Binding(ForBinding($t, distinct, Nav(
          FunDocument(“prices.xml”),Steps(
            LocationStep(//), LocationStep(book), LocationStep(title))),
      LetBinding($p, Nav(FunDocument(“prices.xml”),Steps(
            LocationStep(//), LocationStep(book,
              BinOpComp(=,Nav(CurrentNode,
                  Steps(LocationStep(title))),
                Nav(Var($t), Steps(Text())))),LocationStep(price))))),
    ElementConstruct(<minprice>,AttributeExpression(@title,
        Nav(Var($t), Steps(Text()))),ElementConstruct(<price>,
        FunMin(Nav(Var($p, Steps(Text())))))))))

<results> {
    for $t in distinct(document("prices.xml")//book/title)
    let $p := document("prices.xml")//book[title = $t]/price
    return
        <minprice title= $t/text()>
            <price> min($p/text()) </price>
        </minprice>
} </results>

68

Parsed Tree (1)

69

Parsed Tree (2)

S “prices.xml” R1

70

Parsed Tree (3)

Distinctcol1$t(R1, //book/title

col1(S“prices.xml” R1))

71

Parsed Tree (4)

S”prices.xml”R2

Distinctcol1$t(R1, //book/title

col1(S“prices.xml” R1))

72

Parsed Tree (5)

R2,//bookcol2(s”prices.xml”

R2)

Distinctcol1$t(R1, //book/title

col1(S“prices.xml” R1))

73

Parsed Tree (6)

Distinctcol1$t(R1, //book/title

col1(S“prices.xml”R1))

titlecol3(R2,//book

col2(s”prices.xml”R2))

74

Parsed Tree (7)

Distinctcol1$t(R1, //book/title

col1(S“prices.xml” R1))

$t, text()col4(col2,title

col3(R2,//bookcol2(s”prices.xml”

R2)))

75

Parsed Tree (8)

Distinctcol1$t(R1, //book/title

col1(S“prices.xml” R1))

col3=col4($t, text()col4(col2,title

col3(R2,//bookcol2(s”prices.xml”

R2))))

76

Parsed Tree (9)

Distinctcol1$t(R1, //book/title

col1(S“prices.xml”R1))

col2,price$p(

col3=col4(

$t, text()col4(col2,title

col3(R2,//bookcol2(s”prices.xml”

R2)))))

77

Parsed Tree (10)

Distinctcol1$t(R1, //book/title

col1(S“prices.xml” R1))

$t, text()col5(

col2,price$p(

col3=col4(

$t, text()col4(

col2,titlecol3(R2,//book

col2(s”prices.xml”R2))))))

78

Parsed Tree (11)

Distinctcol1$t(R1, //book/title

col1(S“prices.xml” R1))

Mincol6col7(

$p, text()col6 (

$t, text()col5(

col2,price$p(

col3=col4(

$t, text()col4(

col2,titlecol3(

R2,//bookcol2(s”prices.xml”

R2)))))))

79

Parsed Tree (12)

Distinctcol1$t(R1, //book/title

col1(S“prices.xml” R1))

T<minprice title=[col5]><price>[col7]</price></minprice>col8(

Mincol6col7(

$p, text()col6 (

$t, text()col5(

col2,price$p(

col3=col4(

$t, text()col4(

col2,titlecol3(

R2,//bookcol2(s”prices.xml”

R2))))))))

80

Parsed Tree (13)

Agg(FOR$t(

Distinctcol1$t(R1, //book/title

col1(S“prices.xml” R1))),

T<minprice title=[col5]><price>[col7]</price></minprice>col8(

Mincol6col7(

$p, text()col6 (

$t, text()col5(

col2,price$p(

col3=col4(

$t, text()col4(

col2,titlecol3(

R2,//bookcol2(s”prices.xml”

R2)))))))))

81

Parsed Tree (14)

QuiltQuery(

ElementConstruct(<Results>,FLWRExpression(

Binding(ForBinding($t, distinct, Nav(

FunDocument(“prices.xml”),Steps(

LocationStep(//), LocationStep(book), LocationStep(title))),

LetBinding($p, Nav(FunDocument(“prices.xml”),Steps(

LocationStep(//), LocationStep(book,

BinOpComp(=,Nav(CurrentNode,

Steps(LocationStep(title))),

Nav(Var($t), Steps(Text())))),LocationStep(price))))),

ElementConstruct(<minprice>,AttributeExpression(@title,

Nav(Var($t), Steps(Text()))),ElementConstruct(<price>,

FunMin(Nav(Var($p, Steps(Text())))))))))

T<results>col8</results>col9(

Agg(FOR$t(

Distinctcol1$t(R1, //book/title

col1(S“prices.xml” R1))),

T<minprice title=[col5]><price>[col7]</price></minprice>col8(

Mincol6col7(

$p, text()col6 (

$t, text()col5(

col2,price$p(

col3=col4(

$t, text()col4(

col2,titlecol3(

R2,//bookcol2(s”prices.xml”

R2))))))))))))

82

XAT Example

T<results>col8</results>col9(

Agg(

FOR$t(

Distinctcol1$t(R1, //book/title

col1(S“prices.xml” R1))),

T<minprice title=[col5]><price>[col7]</price></minprice>col8(

Mincol6col7(

$p, text()col6 (

$t, text()col5(

col2,price$p(

col3=col4(

$t, text()col4(

col2,titlecol3(

R2,//bookcol2(s”prices.xml”

R2))))))))))))
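The XAT above can be read as a pipeline: collect the distinct titles, then for each title navigate to the matching prices, take the minimum, and tag the result. The following is a minimal Python sketch of that reading, not the Rainbow implementation. The function name `min_price_per_title` and the inlined sample document are assumptions made for illustration, and the comments map steps to XAT columns only loosely.

```python
import xml.etree.ElementTree as ET

# Sample data modelled on the prices.xml use case (inlined here as an assumption).
PRICES_XML = """
<prices>
  <book><title>TCP/IP Illustrated</title><source>www.amazon.com</source><price>65.95</price></book>
  <book><title>TCP/IP Illustrated</title><source>www.bn.com</source><price>69.95</price></book>
  <book><title>Data on the Web</title><source>www.amazon.com</source><price>34.95</price></book>
  <book><title>Data on the Web</title><source>www.bn.com</source><price>39.95</price></book>
</prices>
"""

def min_price_per_title(xml_text):
    """Hypothetical helper: evaluate the example query step by step."""
    root = ET.fromstring(xml_text)
    books = root.findall(".//book")            # roughly: navigate //book -> col2

    # Distinct titles in document order (roughly: Distinct over col1 -> $t).
    titles = []
    for book in books:
        title = book.findtext("title").strip()
        if title not in titles:
            titles.append(title)

    results = ET.Element("results")            # outer tagger
    for t in titles:                           # FOR $t
        # Select books whose title equals $t, then navigate to price (-> $p).
        prices = [b.findtext("price").strip()
                  for b in books
                  if b.findtext("title").strip() == t]
        cheapest = min(prices, key=float)      # Min over the price texts
        minprice = ET.SubElement(results, "minprice", {"title": t})  # inner tagger
        ET.SubElement(minprice, "price").text = cheapest
    return results

result = min_price_per_title(PRICES_XML)
print(ET.tostring(result, encoding="unicode"))
```

The inner list comprehension is re-evaluated once per title, which is the correlated form of the query.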

83

XAT Example (Graph)

[Figure: the XAT of the previous slide drawn as an operator graph. Nodes: T<results>col8</results>col9; Agg; FOR$t; Distinctcol1$t; R1, //book/titlecol1; S“prices.xml” R1; T<minprice title=[col5]><price>[col7]</price></minprice>col8; Mincol6col7; $p, text()col6; $t, text()col5; col2,price$p; col3=col4; $t, text()col4; col2,titlecol3; R2,//bookcol2; S“prices.xml” R2.]

84

Discussion and Issues

85

Different Set of Operators

After parsing but before decorrelation: with FOR, no /, no .

After decorrelation: with /, , , Distinct(), no FOR.

...

86

Equivalent Rewriting Rules

Navigation Pushdown: swap the navigation operator down.

Computation Pushdown: swap the SQL operator down.

Groupby Operator Simplification: pull functions (subqueries) out of the Groupby function.
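To make the idea of "swapping an operator down" concrete, here is a small Python sketch of a pushdown over a linear operator chain. It is an illustration only: the `Op` record, the `push_down` function, and the safety condition (an operator may move below its child only if it does not read the column that child produces) are assumptions, not the rule set defined in these slides.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class Op:
    """Hypothetical operator node in a linear plan."""
    kind: str                      # e.g. "select", "navigate", "source"
    reads: FrozenSet[str]          # columns this operator consumes
    writes: Optional[str]          # column this operator produces, if any
    child: Optional["Op"] = None

def push_down(op):
    """Move op below its children for as long as the swap is safe."""
    child = op.child
    if child is None or child.child is None:
        return op                  # nothing below, or child is the leaf source
    if child.writes is not None and child.writes in op.reads:
        return op                  # op depends on the child's output: stop here
    lowered = Op(op.kind, op.reads, op.writes, child.child)
    return Op(child.kind, child.reads, child.writes, push_down(lowered))

def chain(op):
    """List (kind, writes) pairs from the top of the plan to the source."""
    out = []
    while op is not None:
        out.append((op.kind, op.writes))
        op = op.child
    return out

# select(col3) sits above navigate(price -> $p) above navigate(title -> col3).
src = Op("source", frozenset(), "col2")
nav_title = Op("navigate", frozenset({"col2"}), "col3", src)
nav_price = Op("navigate", frozenset({"col2"}), "$p", nav_title)
sel = Op("select", frozenset({"col3"}), None, nav_price)

rewritten = push_down(sel)
print(chain(rewritten))
```

In this example the selection moves below the price navigation, because it only reads col3, and stops directly above the title navigation that produces col3.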

87

Issues (1)

Use subquery or subquery result? Both.

Do we really need cutting? Yes.

Do we need Binding and Expose? Binding yes. Expose no: we use navigate instead.

Why do we need to distinguish Bindings from Column Names? Because a binding is used in multiple places and is immutable, while a column name is used in one place.

Which data model is better? OR is better than R.

Bag semantics or Set semantics? Bag.

Identify different sets of operators at different stages? TBD.

Do we need the collection in the ORDBMS? Yes.

What’s the type tree? Regular Expression Types.

Better notation for the Algebra Syntax? It’s too complex. Do we really need to define the new column name? Yes.

Also, an XC (XML Calculus) is required. It can be derived by extending Datalog.

88

Issues (2)

How to handle Union in XQuery? Union will be translated into XML Union.

How do we decorrelate an XQuery with Union? As usual, because the union will not generate branches but only a linear tree.

How to translate XML Union (Intersect and Difference) into SQL Union (Intersect and Difference)? TBD.

Can we allow collections of collections? It looks like we don’t need that.

89

Entry Point Notation

Format: <relative forward part> : <entry point>

Examples: author.lastname:book, lastname:book.author, lastname:author:book (multi-level entry point)

Rules:

author.lastname = /:author.lastname

lastname:author.lastname = author.lastname

text():lastname.text() = lastname.text()

90

Discussion of Entry Point/Column Name

Entry Point is used to show the dependencies between different navigations.

XPERANTO uses different column names to distinguish between different navigations, because its sources are relations.

Niagara uses Entry Point to get rid of tedious column names and make the algebra look better; it is also XML oriented.

We use column names with a typing system, because our sources include both relations and XML fragments, and because some operators in the middle of the XAT might generate new columns.

91

Column Name and Nested Operators

In most cases, we can get rid of the column names by using Nested Operators.

However, the data model is used to separate the operators instead of directly nesting them, so that optimization can be done easily.

Hence, we still need column names instead of nested operators to represent our algebra.

92

XML Calculus (XC)

The idea of XC comes from extending Datalog.

It can be used to prove the correctness of the rewriting rules.

It can also be used to help with semantic analysis.

93

Type Tree

It explains the type of each column name, in other words, the semantics of each column name.

It will be used by navigation pushdown to decide cancellation, by order pushdown, and by other rewriting rules that require semantic checking.

A type could be: an XML type, a relational table, a column, or a function’s return type.

Each type comes with a list of the column names of that type.

94

Naming a new Table/Column

The name of each new table and column should be unique.
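One simple way to guarantee unique names is a per-prefix counter, which would yield names in the style used in the examples (col1, col2, R1, R2). The sketch below is only an illustration; the `NameGenerator` class and its `fresh` method are hypothetical and are not taken from the Rainbow code.

```python
class NameGenerator:
    """Hypothetical helper that hands out unique table and column names."""

    def __init__(self):
        self._counters = {}        # last number issued for each prefix

    def fresh(self, prefix):
        """Return the next unused name for the given prefix, e.g. col1, col2."""
        number = self._counters.get(prefix, 0) + 1
        self._counters[prefix] = number
        return f"{prefix}{number}"

names = NameGenerator()
first_col = names.fresh("col")
second_col = names.fresh("col")
first_table = names.fresh("R")
print(first_col, second_col, first_table)
```

Sharing one generator across the whole XAT generation pass keeps names unique even when operators in the middle of the tree introduce new columns.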

95

How to translate multiple LET bindings?

If the two LET bindings come from different sources, a collection is generated for each LET binding.

Until there is a FOR binding to iterate through the collections, we just keep the two collections.

96

How to Handle Multiple FOR?

That’s handled in the Decorrelation.

Keep this in mind:

FOR: means for each, it used . Hence, if there are multiple FORs, the result is a Cartesian product.

Otherwise, navigate means creating a collection!
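The difference between FOR and LET can be sketched in a few lines of Python: independent FOR bindings produce one tuple per combination (a Cartesian product), while LET bindings keep each collection whole inside a single tuple. The functions `eval_for` and `eval_let` are hypothetical names, and the sketch assumes the bindings are not correlated with each other.

```python
from itertools import product

def eval_for(bindings):
    """One output tuple per combination of items: a Cartesian product."""
    names = list(bindings)
    collections = [bindings[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*collections)]

def eval_let(bindings):
    """A single output tuple; each variable holds its whole collection."""
    return [{name: list(items) for name, items in bindings.items()}]

sources = {"$a": [1, 2], "$b": ["x", "y"]}
for_tuples = eval_for(sources)
let_tuples = eval_let(sources)
print(len(for_tuples), len(let_tuples))
```

With two FOR bindings of two items each, `eval_for` yields four tuples; with the same two collections bound by LET, `eval_let` yields one tuple carrying both collections.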

97

Trick in the XAT

In XML Algebra, we use an evaluation context that is a sequence of XML nodes in an XML data model, which is a forest.

In Relational, we use an evaluation context that is a list of tuples.

Hence, in XAT generation, we try to convert the data model used by XML into the data model used by OR.

That’s the tricky part!

98

Query Decorrelation for COp

Top-down approach over the XAT tree.

Approach, with Correlated Binding (CB):

Op1[COp(CB, Op2)[Op3[Correlated Operator[A],B]]]

is rewritten to

Op1[ROJ(CB)[Op2[Groupby(CB, Op3[]) [Operator[Cartesian[A,B]]]], B]]

For example: Correlated Join becomes Outer Join with Groupby with Cartesian.
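The shape of this rewrite can be illustrated on the minimum-price query. The sketch below is an assumption-laden illustration in plain Python over (title, price) pairs, not the XAT decorrelation algorithm itself: `correlated` re-evaluates the inner query once per binding, while `decorrelated` builds a Cartesian product, filters it, groups by the correlated binding, and joins back to the bindings in outer-join fashion so that bindings with no match survive. All names and the sample data are hypothetical.

```python
from itertools import product

# Sample (title, price) pairs modelled on the prices.xml use case.
BOOKS = [
    ("TCP/IP Illustrated", 65.95),
    ("TCP/IP Illustrated", 69.95),
    ("Data on the Web", 34.95),
    ("Data on the Web", 39.95),
]

def correlated(titles, books):
    """Inner query is re-evaluated once per binding of the title."""
    out = []
    for t in titles:
        prices = [price for title, price in books if title == t]
        out.append((t, min(prices) if prices else None))
    return out

def decorrelated(titles, books):
    """Cartesian product + selection, then group by binding, then outer join."""
    matched = [(t, price)
               for t, (title, price) in product(titles, books)
               if title == t]
    groups = {}
    for t, price in matched:                 # Groupby on the correlated binding
        groups.setdefault(t, []).append(price)
    # Outer join back to the bindings: unmatched bindings get None.
    return [(t, min(groups[t]) if t in groups else None) for t in titles]

titles = ["TCP/IP Illustrated", "Data on the Web", "Unlisted Title"]
print(correlated(titles, BOOKS))
print(decorrelated(titles, BOOKS))
```

Both functions return the same list; the outer join matters for "Unlisted Title", which has no matching book and would be lost by a plain join.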