raindrop: an algebra-automata combined xquery engine over xml streams

Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams. Hong Su, Elke Rundensteiner, Murali Mani, Ming Li. Worcester Polytechnic Institute, Worcester, MA. VLDB 2004


Page 1: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams

Raindrop:

An Algebra-Automata Combined XQuery Engine over XML Streams

Hong Su, Elke Rundensteiner, Murali Mani, Ming Li

Worcester Polytechnic Institute

Worcester, MA

VLDB 2004

Page 2: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams

Stream Processing

data sources

data requesters

Networks

Page 3: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams

What’s Special for XML Stream Processing

<auctions>

Token-by-Token access manner

timeline

Pattern retrieval + Filtering + Restructuring

FOR $a in stream(bids)//auction,
    $b in $a/seller[homepage],
    $c in $a/bid/bidder[sameAddr]
WHERE $b/*/phone = "508"
RETURN <auction> $b, $c </auction>

Token: not a counterpart of a self-contained tuple

Pattern Retrieval on Token Streams

<auction>

<seller>

<primary>

<phone>
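The token-by-token access manner can be illustrated with a minimal sketch, assuming Python's standard `xml.parsers.expat` module stands in for the stream tokenizer. The function name `tokenize` and the `(kind, value)` token layout are illustrative choices, not part of Raindrop. The sketch collects tokens in a list for brevity; a stream engine would consume them one at a time.

```python
import xml.parsers.expat

def tokenize(xml_text):
    """Return the document as a flat list of (kind, value) tokens in document order."""
    tokens = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # merge adjacent character data into one callback
    parser.StartElementHandler = lambda name, attrs: tokens.append(("start", name))
    parser.EndElementHandler = lambda name: tokens.append(("end", name))

    def on_text(data):
        if data.strip():  # ignore whitespace-only text between tags
            tokens.append(("text", data.strip()))

    parser.CharacterDataHandler = on_text
    parser.Parse(xml_text, True)
    return tokens

tokens = tokenize(
    "<auction><seller><primary><phone>508</phone></primary></seller></auction>"
)
```

No single token here carries a complete seller: the element only becomes a self-contained unit once its end tag has arrived, which is why a token is not the counterpart of a tuple.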

Page 4: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams

Two Computation Paradigms

Automata-based [yfilter, xscan, xsm, xsq, xpush…]

Algebraic [niagara00, …]

FOR $a in stream(bids)//auction,
    $b in $a/seller[homepage],
    $c in $a/bid/bidder[sameAddr]
WHERE $b/*/phone = "508"
RETURN <auction> $b, $c </auction>

Automata

[Figure: automaton with states 1 to 9; transitions labeled auction, *, seller, homepage, phone, bid, bidder, sameAddr]

Algebra

Tagger

Navigate $a, /seller->$b

Navigate $a, /bid/bidder->$c

Navigate stream(bids), //auction->$a
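Pattern retrieval with an automaton can be sketched as follows. This is a minimal illustration, not the automaton encoding used by Raindrop or by any of the cited systems: it assumes a linear path of child and descendant steps, a hypothetical `(kind, value)` token format, and a stack holding one set of active states per open element.

```python
def match_path(tokens, steps):
    """Count elements matched by a linear path.

    steps is a list of (axis, name) pairs, axis being "child" or "desc";
    state i means the first i steps have been matched.
    """
    final = len(steps)
    stack = [{0}]  # one set of active states per open element
    matches = 0
    for kind, value in tokens:
        if kind == "start":
            active = set()
            for state in stack[-1]:
                if state == final:
                    continue
                axis, name = steps[state]
                if name in (value, "*"):
                    active.add(state + 1)
                if axis == "desc":
                    active.add(state)  # keep looking deeper
            if final in active:
                matches += 1
            stack.append(active)
        elif kind == "end":
            stack.pop()  # restore the states of the parent element
    return matches

tokens = [
    ("start", "auctions"), ("start", "auction"),
    ("start", "seller"), ("end", "seller"),
    ("start", "bid"),
    ("start", "bidder"), ("end", "bidder"),
    ("start", "bidder"), ("end", "bidder"),
    ("end", "bid"), ("end", "auction"), ("end", "auctions"),
]
sellers = match_path(tokens, [("desc", "auction"), ("child", "seller")])
bidders = match_path(tokens, [("desc", "auction"), ("child", "bid"), ("child", "bidder")])
```

The sketch only counts matches. Filtering on values and restructuring the output are not covered by the automaton, and the comparison on the next slide picks up that gap.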

Page 5: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams

Comparison of Two Paradigms

Each paradigm has deficiencies

Both paradigms complement each other

Automata Paradigm                             | Algebra Paradigm
Good for pattern retrieval on tokens          | Does not support token inputs
Need patches for filtering and restructuring  | Good for filtering and restructuring
Present all details on same low level         | Support multiple descriptive levels (e.g., logical plan, physical plan)
Little studied as query processing paradigm   | Well studied as query processing paradigm

Page 6: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams

Four-Level Algebraic Framework

Abstraction level, from high (declarative) to low (procedural):

Semantics-Focused Plan: Express the semantics of the query regardless of input sources

Stream Logical Plan: Accommodate tokenized streams / automata computation

Stream Physical Plan: Describe implementation details of operators

Stream Execution Plan: Decide how an operator is invoked (scheduling)

The Raindrop framework intends to integrate both paradigms into one

Page 7: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams

Level I: Semantics-Focused Plan

Express query semantics regardless of stored or stream input sources [Rainbow-ZPR02]

Reuse existing general optimization techniques
  Decorrelation
  Cancel duplicate navigation operators
  …

Page 8: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams

Example Semantics-Focused Plan

Query:

FOR $a in stream(bids)//auction,
    $b in $a/seller[homepage],
    $c in $a/bid/bidder[sameAddr]
WHERE $b/*/phone = "508"
RETURN <auction> $b, $c </auction>

Stream Data:

<auctions>
  <auction>
    <seller>
      <primary><phone>508</phone></primary>
      <secondary><phone>613</phone></secondary>
    </seller>
    <bid><bidder>…</bidder><bidder>…</bidder></bid>
  </auction>
  …

Plan and Input/Output Data:

Input columns: source (<auctions>…</auctions>)

NavUnnest stream(bids), //auction->$a
  output columns: source, $a (<auction>…</auction>)

NavUnnest $a, /seller->$b
  output columns: source, $a, $b (<seller>…</seller>)

NavUnnest $a, /bid/bidder->$c
  output columns: source, $a, $b, $c (<bidder>…</bidder>)
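The three NavUnnest steps of the example plan can be sketched over stored (non-stream) XML, which is the setting the semantics-focused level abstracts over. This is a minimal illustration that uses Python's `xml.etree.ElementTree` as a stand-in. The `nav_unnest` function, the dictionary tuple layout, and the bidder contents `b1` and `b2` are assumptions made for the example; the slide elides the bidder contents.

```python
import xml.etree.ElementTree as ET

DATA = """<auctions><auction>
  <seller><primary><phone>508</phone></primary>
          <secondary><phone>613</phone></secondary></seller>
  <bid><bidder>b1</bidder><bidder>b2</bidder></bid>
</auction></auctions>"""

def nav_unnest(tuples, src, path, dst):
    """For each input tuple, bind dst to every node reached from tuple[src] via path."""
    out = []
    for t in tuples:
        for node in t[src].findall(path):
            out.append({**t, dst: node})  # one output tuple per reached node
    return out

plan_input = [{"source": ET.fromstring(DATA)}]
with_a = nav_unnest(plan_input, "source", ".//auction", "$a")
with_b = nav_unnest(with_a, "$a", "seller", "$b")
with_c = nav_unnest(with_b, "$a", "bid/bidder", "$c")
```

Each call adds one column, mirroring the growing source, $a, $b, $c tables of the slide. The predicates and the WHERE clause are left out of this sketch.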

Page 9: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams

Level II: Stream Logical Plan

Extend semantics-focused plan to accommodate tokenized stream inputs

New input data format: Tokens

New operators: StreamSource, TokenNavigate, ExtractUnnest, ExtractNest, StructuralJoin

New rewrite rules: Push-into/Pull-out-of Automata

Page 10: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams

One Uniform Algebraic View

Algebraic Stream Logical Plan:

XML data stream -> Token-based plan (automata plan) -> Tuple stream -> Tuple-based plan -> Query answer

Page 11: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams

Modeling Automata in Algebraic Plan: Black Box [XScan01] vs. White Box

Query:

FOR $a in stream(bids)//auction,
    $b in $a/seller[homepage],
    $c in $a/bid/bidder[sameAddr]
WHERE $b/*/phone = "508"
RETURN <auction> $b, $c </auction>

Path bindings:

$a := stream(bids)//auction
$b := $a/seller
$c := $a/bid/bidder

Black Box:

XScan

White Box:

StructuralJoin $a
ExtractUnnest $a, $b
ExtractUnnest $a, $c
TokenNavigate $a, /seller->$b
TokenNavigate $a, /bid/bidder->$c
TokenNavigate stream(bids), //auction->$a

Page 12: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams

Data Model in Algebraic Plan Modeling Automata

StructuralJoin $a
ExtractUnnest $a, $b
ExtractUnnest $a, $c
TokenNavigate $a, /seller->$b
TokenNavigate $a, /bid/bidder->$c
TokenNavigate stream(bids), //auction->$a
StreamSource

[Figure: the token stream (<auctions>, <auction>, <seller>, <primary>, <phone>, 508, </phone>, </primary>, ..., <bidder>, <bidderid>, 0314, ...) enters at StreamSource; composed <seller>…</seller> and <bidder>...</bidder> elements appear as operator outputs higher in the plan]
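The data model change between tokens and composed elements can be sketched as follows. This is a simplified illustration, not Raindrop's implementation: the function `extract_and_join`, the `(kind, value)` token format, and the hard-coded paths are assumptions. Extraction buffers the tokens of each seller and each bid/bidder, and the structural join pairs the buffered elements when their enclosing auction closes.

```python
from itertools import product

def extract_and_join(tokens):
    """Return ($b, $c) pairs per auction: sellers joined with bid/bidder elements."""
    path = []              # names of currently open elements
    b_buf, c_buf = [], []  # extracted $b and $c elements of the open auction
    capture = None         # (target buffer, collected tokens, depth where capture began)
    results = []
    for kind, value in tokens:
        if kind == "start":
            path.append(value)
            if capture is None:
                if path[-2:] == ["auction", "seller"]:
                    capture = (b_buf, [], len(path))
                elif path[-3:] == ["auction", "bid", "bidder"]:
                    capture = (c_buf, [], len(path))
        if capture is not None:
            capture[1].append((kind, value))  # token belongs to the element being extracted
        if kind == "end":
            if capture is not None and len(path) == capture[2]:
                capture[0].append(capture[1])  # element complete: store it
                capture = None
            closed = path.pop()
            if closed == "auction" and capture is None:
                results.extend(product(b_buf, c_buf))  # structural join on $a
                b_buf, c_buf = [], []
    return results

tokens = [
    ("start", "auctions"), ("start", "auction"),
    ("start", "seller"), ("start", "phone"), ("text", "508"),
    ("end", "phone"), ("end", "seller"),
    ("start", "bid"),
    ("start", "bidder"), ("text", "b1"), ("end", "bidder"),
    ("start", "bidder"), ("text", "b2"), ("end", "bidder"),
    ("end", "bid"), ("end", "auction"), ("end", "auctions"),
]
pairs = extract_and_join(tokens)
```

Because every buffered seller and bidder belongs to the same open auction, the join needs no value comparison; the end tag of the auction is the only trigger.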

Page 13: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams

For details of Levels III and IV, please refer to:

“Automaton Meets Query Algebra: Towards a Unified Model for XQuery Evaluation over XML Data Streams”, ER 2003

“Raindrop: A Uniform and Layered Algebraic Framework for XQueries on XML Streams”, CIKM 2003

“Raindrop: A Uniform and Layered Algebraic Framework for XQueries on XML Streams”, Journal Submission 2004

Page 14: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams

Optimization I: Computation Into or Out of Automata?

Automata Plan (all navigation inside the automata):

TokenNavigate stream(bids), //auction->$a
TokenNavigate $a, /seller->$b
TokenNavigate $a, /bid/bidder->$c
ExtractUnnest $a, $b
ExtractUnnest $a, $c
StructuralJoin $a

Automata Plan (only //auction inside the automata):

TokenNavigate stream(bids), //auction->$a
ExtractUnnest stream(bids), $a
NavigateUnnest $a, /seller->$b
NavigateUnnest $a, /bid/bidder->$c

Plan without automata:

NavUnnest stream(bids), //auction->$a
NavigateUnnest $a, /seller->$b
NavigateUnnest $a, /bid/bidder->$c

Rewrite rules move computation between these plans: Into Automata (toward the first plan) or Out of Automata (toward the last plan).

… …

Page 15: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams

Experimentation Results

Execution Time on 85M XML Stream Under Various Selectivity

[Figure: execution time (ms), axis range 25000 to 50000, versus selectivity of selection (0.1 to 0.9); one curve each for 1 Nav, 2 Navs, 3 Navs, 4 Navs and 5 Navs]

Page 16: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams

Optimization II: Semantic Query Optimization

General schema-based optimizations
  Eliminate predicate/join, …
  Focus on operators manipulating flat values

XML-specific schema-based optimizations
  Focus on pattern retrieval
  Fall into two categories:
    General XML SQO: minimize query tree [YCL+-AT&T 01]
    Stream XML SQO (our focus)

Page 17: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams

Stream-Specific XML SQO

Observations
  Pattern retrieval over tokens relies solely on document-order traversal
  Schema constraints help expedite document-order traversal

State of the art
  [XPush03] covers limited queries (boolean XPath match) and one type of constraint

Our goals
  Support more powerful queries (XQuery)
  Support more types of constraints (XSchema)

Page 18: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams

Step I: Construct Query Graph

(a) Example Query

FOR $a in stream(bids)//auction,
    $b in $a/seller[homepage],
    $c in $a/bid/bidder[sameAddr]
WHERE $b/*/phone = "508"
RETURN <auction> $b, $c </auction>

(b) Query Tree

Page 19: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams

Example XML Schema

Page 20: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams

Step II: Apply Optimization Rules

Offer optimization rules utilizing
  occurrence constraints
  exclusive constraints
  order constraints

Apply rules in an order ensuring
  no beneficial rule is missed
  no redundant rule is introduced

Page 21: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams

Step III: Translate Rewritten Query Graph Back to Plan (I)

When </phone> has been encountered twice, check /*/phone: if the predicate fails, suspend states s2 and s3

Utilize Occurrence Constraints
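The effect of an occurrence constraint can be sketched as an early exit from predicate evaluation. This is a minimal illustration that assumes a schema allowing at most two phone elements per seller (the example schema itself is not reproduced in this transcript), a hypothetical `(kind, value)` token format, and the invented function name `seller_has_phone`. It ignores the exact depth of phone for brevity and does not model the automaton states s2 and s3.

```python
def seller_has_phone(tokens, wanted="508", max_phones=2):
    """Scan one seller's tokens; return (predicate result, tokens examined)."""
    in_phone = False
    phones_closed = 0
    for examined, (kind, value) in enumerate(tokens, start=1):
        if kind == "start" and value == "phone":
            in_phone = True
        elif kind == "text" and in_phone and value == wanted:
            return True, examined
        elif kind == "end" and value == "phone":
            in_phone = False
            phones_closed += 1
            if phones_closed == max_phones:
                return False, examined  # assumed schema: no further phone can follow
    return False, len(tokens)

seller = [
    ("start", "seller"),
    ("start", "primary"), ("start", "phone"), ("text", "613"),
    ("end", "phone"), ("end", "primary"),
    ("start", "secondary"), ("start", "phone"), ("text", "614"),
    ("end", "phone"), ("end", "secondary"),
    ("start", "address"), ("text", "Worcester"), ("end", "address"),
    ("end", "seller"),
]
result, examined = seller_has_phone(seller)
```

Without the constraint the scan would have to run to the end of the seller before the predicate could be declared false. With it, the remaining tokens of this seller need no pattern matching.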

Page 22: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams

Step III: Translate Rewritten Query Graph Back to Plan (II)

when <billTo> or <shipTo> is encountered once: suspend states s2 and s9

Utilize Exclusive Constraints

Page 23: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams

Step III: Translate Rewritten Query Graph Back to Plan (III)

When <primary> is encountered once, check /homepage: if no homepage has been seen, suspend states s10, s3 and s2

Utilize Order Constraints
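An order constraint permits a similar early decision. The sketch below assumes a schema in which homepage, when present, must precede primary inside seller. That ordering is inferred from the rule above, not taken from the schema slide. The function `seller_has_homepage` and the token format are hypothetical.

```python
def seller_has_homepage(tokens):
    """Return (result, tokens examined) for the predicate seller[homepage]."""
    depth = 0
    for examined, (kind, value) in enumerate(tokens, start=1):
        if kind == "start":
            depth += 1
            if depth == 2 and value == "homepage":
                return True, examined
            if depth == 2 and value == "primary":
                # assumed order constraint: homepage would have appeared already
                return False, examined
        elif kind == "end":
            depth -= 1
    return False, len(tokens)

no_homepage = [
    ("start", "seller"),
    ("start", "primary"), ("start", "phone"), ("text", "508"),
    ("end", "phone"), ("end", "primary"),
    ("end", "seller"),
]
with_homepage = [
    ("start", "seller"),
    ("start", "homepage"), ("text", "example"), ("end", "homepage"),
    ("start", "primary"), ("end", "primary"),
    ("end", "seller"),
]
early_fail = seller_has_homepage(no_homepage)
early_hit = seller_has_homepage(with_homepage)
```

In both cases the decision is made at the second token, so the rest of the seller can be skipped for this predicate.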

Page 24: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams

http://davis.wpi.edu/dsrg/raindrop/

[email protected]

Thanks to the WPI DSRG Rainbow Team for XAT algebra support
