answering tree pattern queries using views

39
Answering Tree Pattern Queries Using Views Laks V.S. Lakshmanan, Hui (Wendy) Wang, and Zheng (Jessica) Zhao University of British Columbia Vancouver, BC Amazon.com

Upload: jaimin

Post on 17-Jan-2016

44 views

Category:

Documents


1 download

DESCRIPTION

Answering Tree Pattern Queries Using Views. Laks V.S. Lakshmanan , Hui (Wendy) Wang , and Zheng (Jessica) Zhao University of British Columbia Vancouver, BC Amazon.com. Outline. Motivation Problems Studied Without schema With schema Recursive schemas Related Work - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Answering Tree Pattern Queries Using Views

Answering Tree Pattern Queries Using Views

Laks V.S. Lakshmanan, Hui (Wendy) Wang, and Zheng (Jessica) Zhao

University of British ColumbiaVancouver, BC

Amazon.com

Page 2: Answering Tree Pattern Queries Using Views

Outline

Motivation Problems Studied Without schema With schema Recursive schemas Related Work Summary & Future Work

Page 3: Answering Tree Pattern Queries Using Views

Motivation 1/3

Integration of existing data sources. Local as view (LAV) – one of the well-

known approaches. Each source = a materialized view over

some global database. Answer to query over global DB =

answer to query using (materialized) views.

Page 4: Answering Tree Pattern Queries Using Views

Motivation 2/3 <Trial> (3) <Patient> (4) John Doe </Patient> … <Status> (10) Complete </Status> </Trial> <Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial> <Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial>

Source = View “//Trials//Trial” over some DB containing clinical data – trials, their status, patient data, etc.

Consider query Q: //Trials[//Status]//Trial over [unknown] original DB.

How can and should we answer it using above source?

Page 5: Answering Tree Pattern Queries Using Views

Motivation 3/3<PharmaLab> (1) <Trials @type=“T1”> (2) <Trial> (3) <Patient> (4) John Doe </Patient> … <Status> (10) Complete </Status> </Trial> <Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial> </Trials> <Trials @type=“T2”> (13) <Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial> </Trials> </PharmaLab>

<Trial> (3) <Patient> (4) John Doe </Patient> … <Status> (10) Complete </Status> </Trial> <Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial> <Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial>

//Trials//Trial//Trials//Trial?? ??

On

e p

ossib

le o

rig

inal D

B

Page 6: Answering Tree Pattern Queries Using Views

Motivation 3/3<PharmaLab> (1) <Trials @type=“T1”> (2) <Trial> (3) <Patient> (4) John Doe </Patient> … <Status> (10) Complete </Status> </Trial> <Trial> (11) <Patient> (12) Jen Bloe </Patient> …

</Trial> </Trials> <Trials @type=“T2”> (13) <Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial> </Trials> </PharmaLab>

<Trial> (3) <Patient> (4) John Doe </Patient> … <Status> (10) Complete </Status> </Trial> <Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial> <Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial>

//Trials//Trial//Trials//Trial?? ??

On

e p

ossib

le o

rig

inal D

BQ: //Trials[//Status]//Trial

Page 7: Answering Tree Pattern Queries Using Views

Motivation 3/3<PharmaLab> (1) <Trials @type=“T1”> (2) <Trial> (3) <Patient> (4) John Doe </Patient> … <Status> (10) Complete </Status> </Trial> <Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial> </Trials> <Trials @type=“T2”> (13) <Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial> </Trials> </PharmaLab>

<Trial> (3) <Patient> (4) John Doe </Patient> … <Status> (10) Complete </Status> </Trial> <Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial> <Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial>

//Trials//Trial//Trials//Trial

On

e p

ossib

le o

rig

inal D

B

◦ “●◦ “●[//Status]”[//Status]” { (3) }

Contained rewriting

Page 8: Answering Tree Pattern Queries Using Views

Problems Studied 1/3 Equivalent Rewriting: Given Q and

views V, find an equivalent rewriting of Q using V, i.e., an expression E s.t. V◦E Ξ Q, over all possible input DBs. Appropriate for query optimization.

Contained Rewriting: Given Q and V, find an expression E s.t. V◦E Q overall all possible input DBs, and V◦E is maximal among all such rewritings. Most appropriate for information integration

[Halevy, Lenzerini, Pottinger & Halevy].

Page 9: Answering Tree Pattern Queries Using Views

Problems Studied 2/3No Schema: Given Q and V, find a

maximally contained rewriting (MCR) of Q using V.

With Schema: Given Q and V, and a schema prescribing possible input DBs, find a maximally contained rewriting of Q using V.

Focus: Tree Pattern Queries (XP/,//, [ ]). Schema without cycles, union, and

recursion.

Page 10: Answering Tree Pattern Queries Using Views

Problems Studied 3/3

Given Q & V: R Ξ V ◦ E Q.

Compensation queryRewriting query

Want MCR in the absence and in the presence of a schema.

//a[//b]/c //a

b c

Page 11: Answering Tree Pattern Queries Using Views

Without Schema 1/6

Question 1: Does an MCR always exist?

/a/a

bb

cc

VV

/b/b

d

Q1Q1

/a/a

b

dd

Q2Q2

No MCR for Q1 and for Q2.

What went wrong?

distinguished (answer) node

Page 12: Answering Tree Pattern Queries Using Views

Without Schema 2/6

Trial

//Trials//Trials //Trials//Trials

StatusStatus Patient

(1)(1)

(2)(2) (3)(3)Trial

//Trials//Trials

StatusStatus Patient

(1)(1)

(2)(2) (3)(3)

Unfulfilled obligationsClip Away Tree (CAT)

f

f – useful embedding

VVQQ

V

E

Page 13: Answering Tree Pattern Queries Using Views

Without Schema 3/6Theorem: Q, V – tree pattern queries.

Then Q is answerable using V iff there is a useful embedding from Q to V.

aa

dd

baa

//a//a

c c

bba a aa

b

cc

2

1

77

33

44

55V ee

bb

ccQ

1,21,2

1:{2}, 2:{}1:{2}, 2:{}

2:{6}2:{6}

6:{7}6:{7}

1:{2,3}, 2:{3}1:{2,3}, 2:{3}

2:{6}, 3:{4} 2:{6}, 3:{4}

4:{5}, 6:{7} 4:{5}, 6:{7}

Testing Existence of MCR:

//a//a

6

Page 14: Answering Tree Pattern Queries Using Views

Without Schema 4/6

Two embeddings – corresponding irredundant CRs.

aa

baa

cb

cc

//a//a

d e

aa

baa

cb

cc

//a//a

a eb

c d

need for expressing

MCR!

Page 15: Answering Tree Pattern Queries Using Views

Without Schema 5/6Can test existence of MCR in poly time. However, MCRs can be exponentially

large (closure issue).

eedd

aa aa

c

bb

//a//a

aa

//a//a

c

bb

c

bbVVQQ

How many irredundant CRs

are possible?

Page 16: Answering Tree Pattern Queries Using Views

Without Schema 5/6

eedd

aa aa

c

bb

//a//a

aa

//a//a

c

bb

c

bbVV

QQc

bb

aa

//a//a

d e

Page 17: Answering Tree Pattern Queries Using Views

Without Schema 5/6

eedd

aa aa

c

bb

//a//a

aa

//a//a

c

bb

c

bbVV

QQc

bb

aa

//a//a

d

a/b/c/e

Page 18: Answering Tree Pattern Queries Using Views

Without Schema 5/6

eedd

aa aa

c

bb

//a//a

aa

//a//a

c

bb

c

bbVV

QQc

bb

aa

//a//a

ea/b

e

c

Page 19: Answering Tree Pattern Queries Using Views

Without Schema 5/6

eedd

aa aa

c

bb

//a//a

aa

//a//a

c

bb

c

bbVV

QQc

bb

aa

//a//a

a/b/c/ea/b

e

cMCR = union of exponential

# CRs in the worst case!

Page 20: Answering Tree Pattern Queries Using Views

Without Schema 6/6

Summary: Can test existence of MCR in poly

time. Exact characterization.

MCR may be union of exponentially many CRs in the worst case.

Algorithm for generating MCR.

Page 21: Answering Tree Pattern Queries Using Views

With Schema 1/6

Given Query Q, view V, schema S. Infer all constraints C implied by S. Chase V w.r.t. C. Look for MCR of Q w.r.t. chased view.

Page 22: Answering Tree Pattern Queries Using Views

With Schema 2/6AuctionsAuctions

AuctionAuction

open_auctionopen_auctionclosed_auctionclosed_auction

bidsbids

personperson itemitem

namename

**

** ??

++ ??

++ ++

E.g. constraints:•c_a has ≤ 1 bids

child•Every Auction

having a person desc also

has an item desc.

•every path from Auction to name

goes via bids.

Page 23: Answering Tree Pattern Queries Using Views

With Schema 3/6

//Auction//Auction

o_ao_a c_ac_a

bidsbids bidsbids

VV

//Auction//Auction

bidsbids bidsbids

personperson itemitem

namenameQQ

Page 24: Answering Tree Pattern Queries Using Views

With Schema 3/6AuctionsAuctions

AuctionAuction

open_auctionopen_auction closed_auctionclosed_auction

bidsbids

personperson itemitem

namename

**

** ??

++ ??

++ ++

o_ao_a c_ac_a

bidsbids bidsbids

//Auction//Auction

person item

name

p i

n

Page 25: Answering Tree Pattern Queries Using Views

With Schema 4/6

o_ao_a c_ac_a

bidsbids bidsbids

//Auction//Auction

person item

name

p i

n

//Auction//Auction

bidsbids bidsbids

personperson itemitem

namenameQQ

MCR = identity query.

Page 26: Answering Tree Pattern Queries Using Views

With Schema 5/6Another Example: AuctionsAuctions

AuctionAuction

closed_auctionclosed_auction

bidsbids

personperson

itemitem

namename

**

** ??

++

++

open_auctionopen_auction

buyer?

//Auction

nameitem

person

Q

//Auction

VHow to answer Q using V?

Page 27: Answering Tree Pattern Queries Using Views

With Schema 5/6 Another Example: AuctionsAuctions

AuctionAuction

closed_auctionclosed_auction

bidsbids

personperson

itemitem

name

**

** ??

++

++

open_auctionopen_auction

buyer?

//Auction

nameitem

person

Q

//Auction

namnamee

item

So what’s the compensation query?

Page 28: Answering Tree Pattern Queries Using Views

With Schema 5/6 Another Example: AuctionsAuctions

AuctionAuction

closed_auctionclosed_auction

bidsbids

personperson

itemitem

name

**

** ??

++

++

open_auctionopen_auction

buyer?

//Auction

nameitem

person

Q

//Auction

namename

item

MCR = V ◦ “●//name”

Page 29: Answering Tree Pattern Queries Using Views

With Schema 6/6

Challenges and Highlights: Naïve chase can explode.

Make chase context aware.

Exact characterization of schema w/o recursion and union in terms of constraints.

Efficient algo. for inferring the constraints. Efficient algo. for chase. And for finding MCR. MCR is unique, if it exists.

Page 30: Answering Tree Pattern Queries Using Views

Recursive Schemas 1/2a

b

c d**

?

//a

b

V

//a

b b

c dQ

What is the MCR?

Page 31: Answering Tree Pattern Queries Using Views

Recursive Schemas 2/2a

b

c d**

?

//a

b

V

//a

b b

c dQ//a

b

c d

Page 32: Answering Tree Pattern Queries Using Views

Recursive Schemas 2/2a

b

c d**

?

//a

b

V

//a

b b

c dQ//a

b

c

d

b

Page 33: Answering Tree Pattern Queries Using Views

Recursive Schemas 2/2a

b

c d**

?

//a

b

V

//a

b b

c dQ//a

b

c

db

Page 34: Answering Tree Pattern Queries Using Views

Recursive Schemas 2/2a

b

c d**

?

//a

b

V

//a

b b

c dQ//a

b

c

b

d

b

MCR = union of four CRs.

Behavior similar to no schema.

Page 35: Answering Tree Pattern Queries Using Views

Related Work 1/2

QAV for relational – huge body of work [Halevy 01].

Regular path queries and semi-structured DBs [Grahne&Thomo 03, Calvenese 00,Papakonstantinou&Vassalos 99].

Equivalent rewrites for fragments of XQuery and XPath [Deutsch&Tannen 03, Tang&Zhou 05, Xu&Ozsoyoglu 05].

Page 36: Answering Tree Pattern Queries Using Views

Related Work 2/2 Key differences b/w equivalent &

contained rewriting: Unique rewriting (even w/o schema). MCR may involve union of (possibly

exponentially many) CRs. Study of contained rewriting in

presence of schema. Lot of work on semantic caching

[Chen+ 02], heuristics for using materialized views for optimizing XPath [Balmin+ 04], mine views worth materializing, XPath containment, … .

Page 37: Answering Tree Pattern Queries Using Views

Summary & Future Work 1/2

QAV using (maximally) contained rewriting ( information integration).

Without schema: existence, characterization, closure, generation of MCR.

With Schema: extract essence using constraints, chase, similar problems as above.

Impact of recursion. Experiments.

Page 38: Answering Tree Pattern Queries Using Views

Summary & Future Work 2/2

Impact of wildcard, disjunction, order …

Impact of union, recursion, … Other integration models (e.g., GLAV) QAV for XQuery.

Page 39: Answering Tree Pattern Queries Using Views