
Schema-based Scheduling of Event Processors and Buffer Minimization for Queries on Structured Data Streams

Bernhard Stegmaier (TU München)

Joint work with Christoph Koch (TU Wien) Stefanie Scherzinger (TU Wien) Nicole Schweikardt (HU Berlin)

FluX – Intl. Conf. on Very Large Databases 2004 2

Outline
- Motivation
- FluX Query Language
- Translating XQuery into FluX
- Further Aspects
- Experiments
- Conclusion


Traditional Approach
Bibliography DTD:
<!ELEMENT bib (book)*>
<!ELEMENT book ((title|author)*,price)>

List title(s) and authors of books:
<results>
{for $b in /bib/book return
   <result> {$b/title} {$b/author} </result>}
</results>

Evaluation of a book node:
1. Print <result>
2. Buffer titles and authors
3. Output titles
4. Output authors
5. Print </result>

… <book> <author>Kemper</author> <title>Datenbanksysteme</title> <author>Eickler</author> <price>40€</price> </book> …

Example: Buffer: <author>Kemper</author><title>Datenbanksysteme</title><author>Eickler</author>

Output: <result><title>Datenbanksysteme</title><author>Kemper</author><author>Eickler</author></result>
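The traditional evaluation of one book node can be sketched in Python as below. This is a minimal illustration, not the engine from the talk: it assumes the children of <book> arrive as (tag, text) pairs in document order, and the function name evaluate_buffered is hypothetical. It reproduces the output above and reports how many elements had to be buffered.

```python
from typing import List, Tuple

# Assumption: a book's children arrive as (tag, text) pairs in document order.
Child = Tuple[str, str]


def evaluate_buffered(children: List[Child]) -> Tuple[str, int]:
    """Traditional evaluation of <result> {$b/title} {$b/author} </result>
    for one book node: buffer all titles and authors, then output them."""
    out = ["<result>"]                                               # 1. print <result>
    buffer = [c for c in children if c[0] in ("title", "author")]    # 2. buffer
    out += [f"<title>{x}</title>" for t, x in buffer if t == "title"]     # 3. titles
    out += [f"<author>{x}</author>" for t, x in buffer if t == "author"]  # 4. authors
    out.append("</result>")                                          # 5. print </result>
    return "".join(out), len(buffer)


book = [("author", "Kemper"), ("title", "Datenbanksysteme"),
        ("author", "Eickler"), ("price", "40€")]
result, buffered = evaluate_buffered(book)
print(result)
print("elements buffered:", buffered)
```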


The FluX Approach
Bibliography DTD:
<!ELEMENT bib (book)*>
<!ELEMENT book ((title|author)*,price)>

List title(s) and authors of books:
<results>
{for $b in /bib/book return
   <result> {$b/title} {$b/author} </result>}
</results>

FluX query (for the book node):
… <result>
{process-stream $b:
   on title as $t return $t;
   on-first past(title,author) return {for $a in $b/author return $a}}
</result> …

… <book> <author>Kemper</author> <title>Datenbanksysteme</title> <author>Eickler</author> <price>40€</price> </book> …

Example: Buffer: <author>Kemper</author><author>Eickler</author>

Output: <result><title>Datenbanksysteme</title><author>Kemper</author><author>Eickler</author></result>

Less buffering using order constraints
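A matching sketch of the FluX query for one book node, under the same hypothetical (tag, text) encoding: titles are written as soon as they arrive, and the on-first past(title,author) handler fires at the first child that is neither a title nor an author (here price), which under this DTD proves that no further titles or authors can follow. The function name evaluate_flux is illustrative only.

```python
from typing import List, Tuple

Child = Tuple[str, str]  # assumption: (tag, text) pairs in document order


def evaluate_flux(children: List[Child]) -> Tuple[str, int]:
    """Event-based evaluation of one book node under
    <!ELEMENT book ((title|author)*,price)> for the FluX query
      on title as $t return $t;
      on-first past(title,author) return {for $a in $b/author return $a}
    Titles are written immediately; only authors are buffered."""
    out = ["<result>"]
    authors: List[str] = []
    max_buffered = 0
    fired = False

    def flush_authors() -> None:
        out.extend(f"<author>{a}</author>" for a in authors)

    for tag, text in children:
        # Under this DTD, a child other than title/author (i.e. price) means
        # no further title or author can follow: past(title,author) holds.
        if not fired and tag not in ("title", "author"):
            flush_authors()
            fired = True
        if tag == "title":
            out.append(f"<title>{text}</title>")   # on title: no buffering
        elif tag == "author":
            authors.append(text)                    # kept for the later handler
            max_buffered = max(max_buffered, len(authors))
    if not fired:  # at the latest, the handler fires at the end of the book
        flush_authors()
    out.append("</result>")
    return "".join(out), max_buffered


book = [("author", "Kemper"), ("title", "Datenbanksysteme"),
        ("author", "Eickler"), ("price", "40€")]
result, max_buffered = evaluate_flux(book)
print(result, max_buffered)
```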


The FluX Approach II
Bibliography DTD:
<!ELEMENT bib (book)*>
<!ELEMENT book ((title*,author*),price)>

List title(s) and authors of books:
<results>
{for $b in /bib/book return
   <result> {$b/title} {$b/author} </result>}
</results>

FluX query:
… <result>
{process-stream $b:
   on title as $t return $t;
   on author as $a return $a;}
</result> …

… <book> <title>Datenbanksysteme</title> <author>Kemper</author> <author>Eickler</author> <price>40€</price> </book> …

Example: Buffer: (empty)

Output: <result><title>Datenbanksysteme</title><author>Kemper</author><author>Eickler</author></result>

No buffering using order constraints!
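Under the second DTD, the two on handlers can simply copy elements through. The sketch below (hypothetical name evaluate_ordered, same assumed (tag, text) encoding) buffers nothing; the explicit check only makes the DTD's order assumption visible, so the stream from the first DTD's example is rejected.

```python
from typing import List, Tuple

Child = Tuple[str, str]  # assumption: (tag, text) pairs in document order


def evaluate_ordered(children: List[Child]) -> str:
    """Evaluation of  on title as $t return $t; on author as $a return $a;
    for one book node. Nothing is buffered: <!ELEMENT book ((title*,author*),price)>
    guarantees that all titles precede all authors, so copying each element
    through yields the required order. The check only documents that assumption."""
    out = ["<result>"]
    seen_author = False
    for tag, text in children:
        if tag == "title":
            if seen_author:
                raise ValueError("stream violates (title*,author*): title after author")
            out.append(f"<title>{text}</title>")
        elif tag == "author":
            seen_author = True
            out.append(f"<author>{text}</author>")
    out.append("</result>")
    return "".join(out)


book = [("title", "Datenbanksysteme"), ("author", "Kemper"),
        ("author", "Eickler"), ("price", "40€")]
print(evaluate_ordered(book))
```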


FluX Query Language

Based on the XQuery fragment XQuery-:
   ε                                      (empty)
   s                                      (output fixed string)
   α β                                    (sequence)
   {for $x in $y/π [where χ] return α}    (for loop)
   {$x/π}                                 (output path)
   {$x}                                   (output)
   {if χ then α}                          (conditional)
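One way to hold the XQuery- grammar as a data structure is a small Python AST with a printer for the slides' concrete syntax. The class and function names are hypothetical, and conditions χ and paths π are kept as plain strings rather than parsed.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Empty:            # ε
    pass


@dataclass
class Str:              # s: output a fixed string
    s: str


@dataclass
class Seq:              # α β: sequence
    first: Expr
    second: Expr


@dataclass
class For:              # {for $x in $y/π [where χ] return α}
    var: str
    source: str
    path: str
    body: Expr
    where: Optional[str] = None


@dataclass
class OutPath:          # {$x/π}: output path
    var: str
    path: str


@dataclass
class OutVar:           # {$x}: output the node bound to $x
    var: str


@dataclass
class If:               # {if χ then α}: conditional
    cond: str
    then: Expr


Expr = Union[Empty, Str, Seq, For, OutPath, OutVar, If]


def show(e: Expr) -> str:
    """Render an expression in the concrete syntax used on the slides."""
    if isinstance(e, Empty):
        return ""
    if isinstance(e, Str):
        return e.s
    if isinstance(e, Seq):
        return f"{show(e.first)} {show(e.second)}"
    if isinstance(e, For):
        where = f" where {e.where}" if e.where else ""
        return f"{{for ${e.var} in ${e.source}/{e.path}{where} return {show(e.body)}}}"
    if isinstance(e, OutPath):
        return f"{{${e.var}/{e.path}}}"
    if isinstance(e, OutVar):
        return f"{{${e.var}}}"
    if isinstance(e, If):
        return f"{{if {e.cond} then {show(e.then)}}}"
    raise TypeError(f"not an XQuery- expression: {e!r}")


# The motivating query, with $ROOT as the document root variable.
query = For("b", "ROOT", "bib/book",
            Seq(Str("<result>"),
                Seq(OutPath("b", "title"),
                    Seq(OutPath("b", "author"), Str("</result>")))))
print(show(query))
```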


FluX Query Language

A simple XQuery- expression can be executed without buffering the stream.

Example 1: <a> {$x} </a>{if $x/b = 5 then <b>5</b>} is simple.

Example 2: {$x}{$x} is not simple.
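The difference between the two examples can be made concrete with two hypothetical single-pass functions over the children of $x, assuming $x is bound to an element <x> with flat (tag, text) children. Example 1 needs only a boolean flag for the condition, because <b>5</b> is written after $x has been streamed; Example 2 must keep a copy of the whole subtree because the stream cannot be read twice. The string test for $x/b = 5 is a simplification of XQuery's general comparison.

```python
from typing import List, Tuple

Child = Tuple[str, str]  # assumption: $x is an element <x> with flat (tag, text) children


def example1(x_children: List[Child]) -> str:
    """<a> {$x} </a>{if $x/b = 5 then <b>5</b>} in a single pass: every
    event is copied out at once; the only state is one boolean flag."""
    out = ["<a>", "<x>"]
    some_b_is_5 = False
    for tag, text in x_children:
        out.append(f"<{tag}>{text}</{tag}>")
        if tag == "b" and text.strip() == "5":  # simplified comparison
            some_b_is_5 = True
    out += ["</x>", "</a>"]
    if some_b_is_5:
        out.append("<b>5</b>")
    return "".join(out)


def example2(x_children: List[Child]) -> Tuple[str, int]:
    """{$x}{$x}: the first copy can be streamed, but the whole subtree must
    be buffered to produce the second copy."""
    buffer: List[Child] = []
    first = ["<x>"]
    for tag, text in x_children:
        first.append(f"<{tag}>{text}</{tag}>")
        buffer.append((tag, text))
    first.append("</x>")
    second = ["<x>"] + [f"<{t}>{v}</{t}>" for t, v in buffer] + ["</x>"]
    return "".join(first + second), len(buffer)


x = [("b", "5"), ("c", "z")]
print(example1(x))
print(example2(x))
```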


FluX Query Language (ctd.)

FluX expressions:
- a simple XQuery- expression, or
- s {process-stream $y: H} s´, where s and s´ are fixed strings and H is a list of event handlers

Event handlers H:
- on-first past(S) return α, where α is an XQuery- expression and S is a set of symbols; α is executed on buffers
- on a as $x return Q, where a is a symbol name, $x is a variable, and Q is a FluX expression; Q is executed in an event-based fashion
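The two kinds of handlers can be sketched as a small Python interpreter for one {process-stream $y: H} over the children of a single element. This is a simplified model under stated assumptions, not the FluX runtime: the DTD is summarized by a hypothetical may_follow table (which child symbols may still occur after a given child), the set of buffered symbols is passed in explicitly, α and Q are Python callables, and handlers are tried in list order at each child. An on-first past(S) handler fires at the first child c with c not in S and no symbol of S allowed after c, or at the end of the element.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Set, Tuple, Union

Child = Tuple[str, str]          # assumption: (tag, text) children, document order
Buffers = Dict[str, List[str]]


@dataclass
class OnFirstPast:               # on-first past(S) return α
    symbols: FrozenSet[str]
    alpha: Callable[[Buffers], List[str]]   # α runs on the buffers


@dataclass
class On:                        # on a as $x return Q
    symbol: str
    q: Callable[[str], List[str]]           # Q runs on the arriving element


Handler = Union[OnFirstPast, On]


def process_stream(children: List[Child], handlers: List[Handler],
                   may_follow: Dict[str, Set[str]], buffered: Set[str]) -> str:
    """Simplified {process-stream $y: H} over the children of one element."""
    out: List[str] = []
    buffers: Buffers = {sym: [] for sym in buffered}
    fired: Set[int] = set()

    def past(symbols: FrozenSet[str], current: str) -> bool:
        # No symbol of S is the current child or may still follow it.
        return current not in symbols and not (may_follow.get(current, set()) & symbols)

    for tag, text in children:
        for i, h in enumerate(handlers):            # handlers tried in list order
            if isinstance(h, OnFirstPast):
                if i not in fired and past(h.symbols, tag):
                    fired.add(i)
                    out.extend(h.alpha(buffers))
            elif h.symbol == tag:
                out.extend(h.q(text))
        if tag in buffers:                          # kept for later α's
            buffers[tag].append(text)
    for i, h in enumerate(handlers):                # end of element: everything is past
        if isinstance(h, OnFirstPast) and i not in fired:
            out.extend(h.alpha(buffers))
    return "".join(out)


ANY_ORDER = {  # summary of <!ELEMENT book ((title|author)*,price)>
    "title": {"title", "author", "price"},
    "author": {"title", "author", "price"},
    "price": set(),
}
book_handlers: List[Handler] = [
    OnFirstPast(frozenset(), lambda b: ["<result>"]),
    On("title", lambda t: [f"<title>{t}</title>"]),
    OnFirstPast(frozenset({"title", "author"}),
                lambda b: [f"<author>{a}</author>" for a in b["author"]]),
    OnFirstPast(frozenset({"title", "author"}), lambda b: ["</result>"]),
]
book = [("author", "Kemper"), ("title", "Datenbanksysteme"),
        ("author", "Eickler"), ("price", "40€")]
print(process_stream(book, book_handlers, ANY_ORDER, {"author"}))
```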


Safe FluX Queries

A FluX query is safe if no XQuery- expression in it refers to elements that may still be encountered in the stream.

Bibliography DTD:
<!ELEMENT bib (book)*>
<!ELEMENT book ((title|author)*, price)>

FluX query:
… <result>
{process-stream $b:
   on title as $t return $t;
   on-first past(title,author) return {for $p in $b/price return $p}}
</result> …

Data stream… <book> <author>Kemper</author> <title>Datenbanksysteme</title> <author>Eickler</author> <price>39€</price> </book> …

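The handler on-first past(title,author) executes when <price> begins, before any price element has been read.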

Not safe!
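Why this query is unsafe can be checked by computing where on-first past(S) fires. The sketch assumes the DTD is summarized as a hypothetical MAY_FOLLOW table; past(title,author) fires at the start of <price>, before price has been read, whereas past(title,author,price) fires only at the end of the book.

```python
from typing import Dict, List, Set, Tuple

Child = Tuple[str, str]

# Assumption: <!ELEMENT book ((title|author)*, price)> summarized as
# "which child symbols may still follow a child with this label".
MAY_FOLLOW: Dict[str, Set[str]] = {
    "title": {"title", "author", "price"},
    "author": {"title", "author", "price"},
    "price": set(),
}


def firing_position(children: List[Child], symbols: Set[str]) -> int:
    """Index of the child at whose start on-first past(symbols) fires,
    or len(children) if it fires only at the end of the parent element."""
    for i, (tag, _) in enumerate(children):
        if tag not in symbols and not (MAY_FOLLOW[tag] & symbols):
            return i
    return len(children)


book = [("author", "Kemper"), ("title", "Datenbanksysteme"),
        ("author", "Eickler"), ("price", "39€")]
for s in ({"title", "author"}, {"title", "author", "price"}):
    pos = firing_position(book, s)
    seen = [tag for tag, _ in book[:pos]]
    print(sorted(s), "fires at", pos, "- price already read:", "price" in seen)
```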


Safe FluX Queries

A FluX query is safe if no XQuery- expression in it refers to elements that may still be encountered in the stream.

Bibliography DTD:
<!ELEMENT bib (book)*>
<!ELEMENT book ((title|author)*, price)>

FluX query:
… <result>
{process-stream $b:
   on title as $t return $t;
   on-first past(title,author,price) return {for $p in $b/price return $p}}
</result> …

Data stream… <book> <author>Kemper</author> <title>Datenbanksysteme</title> <author>Eickler</author> <price>39€</price> </book> …

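The handler on-first past(title,author,price) executes only at </book>, after the price element has been read.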

Safe!
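A simple conservative test of this safety condition for a single handler can be sketched with a regular expression: every child of $b that α refers to must be listed in S. This is a sufficient condition for the slides' examples, not the full definition (a symbol outside S can be harmless if the DTD rules out its later occurrence), and matching $b/symbol at the string level is an assumption about how α is written. The function names are hypothetical.

```python
import re
from typing import Iterable, Set


def referenced_children(alpha: str, var: str) -> Set[str]:
    """Child symbols that α reaches through $var/<symbol> (string-level approximation)."""
    pattern = r"\$" + re.escape(var) + r"/([A-Za-z_][\w.-]*)"
    return set(re.findall(pattern, alpha))


def handler_is_safe(past_symbols: Iterable[str], alpha: str, var: str) -> bool:
    """Conservative check for  on-first past(S) return α  inside
    {process-stream $var: ...}: every child of $var that α reads must be in S,
    so none of them can still be encountered when the handler fires."""
    return referenced_children(alpha, var) <= set(past_symbols)


alpha = "{for $p in $b/price return $p}"
print(handler_is_safe({"title", "author"}, alpha, "b"))           # not safe
print(handler_is_safe({"title", "author", "price"}, alpha, "b"))  # safe
```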


XQuery to FluX
Rewrite an XQuery- query Q into a FluX query F using a (non-recursive) DTD, such that:
- F is safe w.r.t. the DTD
- F is equivalent to Q
- F has low memory consumption, through appropriate scheduling of event processors

Steps:
1. Normalization of Q
2. Rewriting into FluX


Normalization: rule-based rewriting of XQuery
- Split paths into single-step for loops
- Eliminate where using if
- Push down if expressions
- Rewrite paths $x/a/… into for loops
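The first and last normalization rules can be sketched as string rewrites in Python. The function names are hypothetical, where clauses and the if push-down are not covered, and intermediate loop variables are named after their step (following $bib in the example below) without checking for name clashes.

```python
def split_for(var: str, source: str, path: str, body: str) -> str:
    """Normalize {for $var in $source/s1/.../sn return body} into nested
    single-step for loops; each intermediate variable is named after its step."""
    steps = path.split("/")
    binding = steps[:-1] + [var]          # variable bound by each loop
    context = [source] + binding[:-1]     # variable each step starts from
    expr = body
    for ctx, bnd, step in reversed(list(zip(context, binding, steps))):
        expr = f"{{for ${bnd} in ${ctx}/{step} return {expr}}}"
    return expr


def output_path_to_for(var: str, path: str) -> str:
    """Rewrite an output path {$var/s1/.../sn} into single-step for loops."""
    last = path.split("/")[-1]
    return split_for(last, var, path, f"{{${last}}}")


print(split_for("b", "ROOT", "bib/book", "<book> {$b/title} </book>"))
print(output_path_to_for("b", "title"))
```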

Example (XMP, Q1):
<bib>
{for $b in $ROOT/bib/book where χ return
   <book> {$b/year} {$b/title} </book>}
</bib>

is normalized to:
<bib>
{for $bib in $ROOT/bib return
   {for $b in $bib/book return
      {if χ then <book>}
      {for $year in $b/year return {if χ then {$year}}}
      {for $title in $b/title return {if χ then {$title}}}
      {if χ then </book>}}}
</bib>


Example

<results>
{for $bib in $ROOT/bib return
   {for $b in $bib/book return
      <result>
         {for $t in $b/title return {$t}}
         {for $a in $b/author return {$a}}
      </result>}}
</results>

function rewrite(Variable parentVar, Set<Σ> H, XQuery- β): FluX

Initial call: rewrite($ROOT, {}, Q)
H: symbols that must be past before β may run, i.e. H delays the execution of β

Bibliography DTD:
<!ELEMENT bib (book)*>
<!ELEMENT book ((title|author)*,price)>


Example

<results>
{for $bib in $ROOT/bib return
   {for $b in $bib/book return
      <result>
         {for $t in $b/title return {$t}}
         {for $a in $b/author return {$a}}
      </result>}}
</results>

rewrite($ROOT, {}, β1): β1 is simple, no delay; generate on-first past() return …

β1

β2


Example

{ps $ROOT:
   on-first past() return <results>;
   {for $bib in $ROOT/bib return
      {for $b in $bib/book return
         <result>
            {for $t in $b/title return {$t}}
            {for $a in $b/author return {$a}}
         </result>}}
   </results>

rewrite($ROOT, {}, β2)

β2


Example

{ps $ROOT:
   on-first past() return <results>;
   {for $bib in $ROOT/bib return
      {for $b in $bib/book return
         <result>
            {for $t in $b/title return {$t}}
            {for $a in $b/author return {$a}}
         </result>}}
   </results>

rewrite($ROOT, {}, β2): β2 is the sequence β21 β22

rewrite($ROOT, {}, β21): no delay; generate on bib as $bib return …

β21

β22


Example

{ps $ROOT:
   on-first past() return <results>;
   on bib as $bib return
      {for $b in $bib/book return
         <result>
            {for $t in $b/title return {$t}}
            {for $a in $b/author return {$a}}
         </result>}}
   </results>

rewrite($bib, {}, α1): no delay; generate on book as $b return …

α1


Example

{ps $ROOT:
   on-first past() return <results>;
   on bib as $bib return
      {ps $bib:
         on book as $b return
            <result>
               {for $t in $b/title return {$t}}
               {for $a in $b/author return {$a}}
            </result> }
   </results>

rewrite($b, {}, α2): as before, no delays; generate on-first past() return … and on title as $t return …

α2


Example

{ps $ROOT:
   on-first past() return <results>;
   on bib as $bib return
      {ps $bib:
         on book as $b return
            {ps $b:
               on-first past() return <result>;
               on title as $t return {$t};
               {for $a in $b/author return {$a}}
               </result> }
   </results>

Assure all titles are past before α32: rewrite($b, {title}, α32)

rewrite($b, {title}, α41): delay execution until after title, i.e. buffered execution; generate on-first past(title,author) return …

α32

α41

α42


Example

{ps $ROOT:
   on-first past() return <results>;
   on bib as $bib return
      {ps $bib:
         on book as $b return
            {ps $b:
               on-first past() return <result>;
               on title as $t return {$t};
               on-first past(title,author) return {for $a in $b/author return {$a}};
               </result> }
   </results>

Assure all titles and authors are past before α42: rewrite($b, {title,author}, α42)
α42 is simple; delay execution until after title and author; generate on-first past(title,author) return …

α42


Example

{ps $ROOT:
   on-first past() return <results>;
   on bib as $bib return
      {ps $bib:
         on book as $b return
            {ps $b:
               on-first past() return <result>;
               on title as $t return {$t};
               on-first past(title,author) return {for $a in $b/author return {$a}};
               on-first past(title,author) return </result>;}};
   </results>


Example

{ps $ROOT:
   on-first past() return <results>;
   on bib as $bib return
      {ps $bib:
         on book as $b return
            {ps $b:
               on-first past() return <result>;
               on title as $t return {$t};
               on-first past(title,author) return {for $a in $b/author return {$a}};
               on-first past(title,author) return </result>;}};
   on-first past(bib) return </results>;}
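The scheduling decisions of this walkthrough can be condensed into a one-level Python sketch. It is a simplification of rewrite(parentVar, H, β), not the algorithm from the paper: bodies are opaque strings and are not rewritten recursively, the item encoding and the function name schedule_level are hypothetical, and the DTD's order constraints are passed in as a set precedes of pairs (h, a) meaning that no h occurs after an a. Simple expressions wait for past(H); a for loop over $parent/a is streamed with on a when every symbol in H precedes a, and is otherwise delayed to on-first past(H ∪ {a}) and run on buffers.

```python
from typing import List, Set, Tuple

# Hypothetical encoding of the items at one nesting level:
#   ("simple", text)            a simple expression such as "<result>"
#   ("for", var, symbol, body)  {for $var in $parent/symbol return body}
Item = Tuple[str, ...]


def schedule_level(parent: str, items: List[Item],
                   precedes: Set[Tuple[str, str]]) -> List[str]:
    """One-level simplification of rewrite(parentVar, H, β).
    (h, a) in precedes means the DTD guarantees that no h occurs after an a.
    H lists the symbols that must be past before the next item may run."""
    H: List[str] = []
    handlers: List[str] = []
    for item in items:
        if item[0] == "simple":
            # A simple expression only waits until every symbol in H is past.
            handlers.append(f"on-first past({','.join(H)}) return {item[1]}")
            continue
        _, var, sym, body = item
        if all((h, sym) in precedes for h in H):
            # Order constraints make H past whenever a `sym` arrives: stream it.
            handlers.append(f"on {sym} as ${var} return {body}")
        else:
            # Otherwise delay until H and sym are past; the loop runs on buffers.
            past = ",".join(H + [sym])
            handlers.append(f"on-first past({past}) return "
                            f"{{for ${var} in ${parent}/{sym} return {body}}}")
        if sym not in H:
            H.append(sym)
    return handlers


book_items: List[Item] = [
    ("simple", "<result>"),
    ("for", "t", "title", "{$t}"),
    ("for", "a", "author", "{$a}"),
    ("simple", "</result>"),
]
for line in schedule_level("b", book_items, precedes=set()):  # ((title|author)*,price)
    print(line)
```

With precedes={("title", "author")}, as for a content model (title*, author*), the same call streams the authors with on author as $a return {$a} instead of buffering them.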


Example – Order Constraints

{ps $ROOT:
   on-first past() return <results>;
   on bib as $bib return
      {ps $bib:
         on book as $b return
            {ps $b:
               on-first past() return <result>;
               on title as $t return {$t};
               {for $a in $b/author return {$a}}
               </result> }
   </results>

Assure all titles are past before α41: rewrite($b, {title}, α41)

The DTD ensures that titles come before authors; generate on author as $a return …

α41

α42

<!ELEMENT bib (book)*>
<!ELEMENT book (title*, author*),…>


Example

{ps $ROOT:
   on-first past() return <results>;
   on bib as $bib return
      {ps $bib:
         on book as $b return
            {ps $b:
               on-first past() return <result>;
               on title as $t return {$t};
               on author as $a return {$a};
               on-first past(title,author) return </result>;}};
   on-first past(bib) return </results>;}

Assure all titles are past before α41: rewrite($b, {title}, α41), H = {title}

The DTD ensures that titles come before authors; generate on author as $a return …

<!ELEMENT bib (book)*>
<!ELEMENT book (title*, author*)>
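Whether a DTD ensures that titles come before authors amounts to asking whether a title may occur after an author in any allowed child sequence. A brute-force Python sketch, assuming single-letter abbreviations (t = title, a = author, p = price) and the content model written as a Python regular expression; the bounded enumeration is an approximation adequate for small content models like these, not a general DTD analysis, and the function name is hypothetical.

```python
import itertools
import re


def may_occur_after(content_model: str, alphabet: str, first: str,
                    second: str, max_len: int = 6) -> bool:
    """True if some child sequence of length <= max_len allowed by the
    content model contains `second` somewhere after `first`."""
    pattern = re.compile(content_model)
    for n in range(max_len + 1):
        for letters in itertools.product(alphabet, repeat=n):
            word = "".join(letters)
            if (pattern.fullmatch(word) and first in word
                    and second in word[word.index(first) + 1:]):
                return True
    return False


# t = title, a = author, p = price
print(may_occur_after("t*a*", "tap", "a", "t"))        # (title*, author*)
print(may_occur_after("(?:t|a)*p", "tap", "a", "t"))   # ((title|author)*, price)
```

The first call prints False (no title can follow an author, so titles come first); the second prints True, for example because of the sequence "atp".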


Further Aspects

Visit our demonstration (Group 3: XML)

Figure: system architecture
- Inputs: XQuery, DTD
- Query Compiler / Query Optimizer: To Normal Form, Algebraic Optimizations, To FluX
- Runtime Engine: Streamed Query Evaluator, XSAX, Memory Buffers
- XML Input Stream → XML Output Stream


Experiments
- Based on XMark; queries adapted to the XQuery- fragment
- Environment: AMD Athlon XP 2000, 512MB RAM, Linux, Sun JDK 1.4.2_03
- Measurements: execution time, memory consumption


Experiments

Query  Input size  FluX time [s]  FluX memory  Galax time [s]  Galax memory  AnonX time [s]
Q1     5M          2.1            0            13.4            37M           3.4
Q1     10M         2.8            0            29.8            83M           6.7
Q1     50M         7.8            0            -               >500M         38.3
Q1     100M        14.0           0            -               >500M         -
Q8     5M          6.8            1.54M        296.9           50M           143.8
Q8     10M         17.2           3.16M        1498.3          100M          534.8
Q8     50M         357.8          16.00M       -               >500M         -
Q8     100M        11566.9        32.25M       -               >500M         -
Q11    5M          5.6            374k         277.0           50M           n/a
Q11    10M         11.4           741k         1663.7          100M          n/a
Q11    50M         170.8          3.64M        -               >500M         n/a
Q11    100M        626.8          7.27M        -               >500M         n/a
Q13    5M          2.2            0            12.8            38M           3.0
Q13    10M         3.1            0            27.2            73M           5.2
Q13    50M         7.9            0            230.1           344M          88.0
Q13    100M        13.9           0            -               >500M         -
Q20    5M          2.8            4.66k        13.2            36M           2.5
Q20    10M         3.4            5.18k        29.7            80M           6.2
Q20    50M         8.7            7.01k        -               >500M         151.9
Q20    100M        15.4           7.02k        -               >500M         -
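The relative speed of FluX and Galax follows from the table by dividing execution times; these ratios are derived here rather than reported on the slides. The sketch only uses runs where both systems reported a time, and the names TIMES and speedup are illustrative.

```python
# Execution times [s] copied from the table above as (FluX, Galax),
# only for runs where both systems reported a time.
TIMES = {
    ("Q1", "5M"): (2.1, 13.4), ("Q1", "10M"): (2.8, 29.8),
    ("Q8", "5M"): (6.8, 296.9), ("Q8", "10M"): (17.2, 1498.3),
    ("Q11", "5M"): (5.6, 277.0), ("Q11", "10M"): (11.4, 1663.7),
    ("Q13", "5M"): (2.2, 12.8), ("Q13", "10M"): (3.1, 27.2),
    ("Q13", "50M"): (7.9, 230.1),
    ("Q20", "5M"): (2.8, 13.2), ("Q20", "10M"): (3.4, 29.7),
}


def speedup(query: str, size: str) -> float:
    """Galax time divided by FluX time for one table cell."""
    flux, galax = TIMES[(query, size)]
    return galax / flux


for query, size in TIMES:
    print(f"{query} {size}: {speedup(query, size):.1f}x")
```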


Conclusion

FluX:
- Event-based extension of XQuery
- Rewriting of XQuery into FluX
- Uses information from the DTD (order constraints)

FluX supports buffer-conscious query processing:
- Low main memory consumption
- Efficient and scalable query execution on data streams

Future work:
- Recursive DTDs
- Extension of the XQuery- subset (e.g., //, aggregate operators)
- Improved execution (joins)
