Schema-based Scheduling of Event Processors and Buffer Minimization for Queries on Structured Data Streams
Bernhard Stegmaier (TU München)
Joint work with Christoph Koch (TU Wien), Stefanie Scherzinger (TU Wien), and Nicole Schweikardt (HU Berlin)
FluX – Intl. Conf. on Very Large Databases 2004 2

Outline
  Motivation
  FluX Query Language
  Translating XQuery into FluX
  Further Aspects
  Experiments
  Conclusion

Traditional Approach

Bibliography DTD
<!ELEMENT bib (book)*>
<!ELEMENT book ((title|author)*,price)>

List title(s) and authors of books
<results>
{for $b in /bib/book return
  <result> {$b/title} {$b/author} </result>}
</results>

Evaluation of book-node
1. Print <result>
2. Buffer titles and authors
3. Output titles
4. Output authors
5. Print </result>

Example:
… <book> <author>Kemper</author> <title>Datenbanksysteme</title> <author>Eickler</author> <price>40€</price> </book> …
Buffer: <author>Kemper</author><title>Datenbanksysteme</title><author>Eickler</author>
Output: <result><title>Datenbanksysteme</title><author>Kemper</author><author>Eickler</author></result>
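
The five evaluation steps above can be sketched in a few lines of Python. This is only an illustration, not the evaluator of any particular engine: the function name `eval_book_traditional` and the representation of a book's children as `(tag, text)` pairs are assumptions made for the example.

```python
# Sketch of the traditional evaluation of a single book node.
# Assumption: the children arrive as (tag, text) pairs in stream order.

def eval_book_traditional(children):
    out = ["<result>"]                                    # 1. print <result>
    buffer = [(tag, text) for tag, text in children       # 2. buffer titles
              if tag in ("title", "author")]              #    and authors
    out += [f"<title>{x}</title>" for t, x in buffer if t == "title"]     # 3.
    out += [f"<author>{x}</author>" for t, x in buffer if t == "author"]  # 4.
    out.append("</result>")                               # 5. print </result>
    return "".join(out), len(buffer)


book = [("author", "Kemper"), ("title", "Datenbanksysteme"),
        ("author", "Eickler"), ("price", "40€")]
output, buffered_items = eval_book_traditional(book)
print(output)
print("buffered items:", buffered_items)
```

For the example book, all three title and author elements sit in the buffer before anything between <result> and </result> is written.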

The FluX Approach

Bibliography DTD
<!ELEMENT bib (book)*>
<!ELEMENT book ((title|author)*,price)>

List title(s) and authors of books
<results>
{for $b in /bib/book return
  <result> {$b/title} {$b/author} </result>}
</results>

FluX query (for book node)
… <result>
  {process-stream $b:
    on title as $t return $t;
    on-first past (title,author) return
      {for $a in $b/author return $a}}
</result> …

Example:
… <book> <author>Kemper</author> <title>Datenbanksysteme</title> <author>Eickler</author> <price>40€</price> </book> …
Buffer: <author>Kemper</author><author>Eickler</author>
Output: <result><title>Datenbanksysteme</title><author>Kemper</author><author>Eickler</author></result>

Less buffering using order constraints
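
A hand-written Python counterpart of the FluX query for the book node may make the difference visible. It is a sketch under stated assumptions: children are `(tag, text)` pairs, the function name `eval_book_flux` is made up for the example, and the condition past(title,author) is approximated as "the first child that is neither title nor author, or the end of the book", which holds for the DTD shown on this slide.

```python
# Sketch: titles are streamed, only authors are buffered.
# Assumption: the DTD is <!ELEMENT book ((title|author)*,price)>, so the
# first child that is neither title nor author means both are "past".

def eval_book_flux(children):
    out = ["<result>"]
    authors = []              # the only buffer this query needs
    past_fired = False

    def fire_past():
        # on-first past(title,author): emit the buffered authors once
        out.extend(f"<author>{a}</author>" for a in authors)

    for tag, text in children:
        if tag == "title":
            out.append(f"<title>{text}</title>")   # on title as $t return $t
        elif tag == "author":
            authors.append(text)                   # needed later, so buffer
        elif not past_fired:
            fire_past()
            past_fired = True
    if not past_fired:                             # end of the book node
        fire_past()
    out.append("</result>")
    return "".join(out), len(authors)


book = [("author", "Kemper"), ("title", "Datenbanksysteme"),
        ("author", "Eickler"), ("price", "40€")]
output, buffered_items = eval_book_flux(book)
print(output)
print("buffered items:", buffered_items)
```

The output is the same as with the traditional evaluation, but only the two author elements are held in memory.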

The FluX Approach II

Bibliography DTD
<!ELEMENT bib (book)*>
<!ELEMENT book ((title*,author*),price)>

List title(s) and authors of books
<results>
{for $b in /bib/book return
  <result> {$b/title} {$b/author} </result>}
</results>

FluX query
… <result>
  {process-stream $b:
    on title as $t return $t;
    on author as $a return $a;}
</result> …

Example:
… <book> <title>Datenbanksysteme</title> <author>Kemper</author> <author>Eickler</author> <price>40€</price> </book> …
Buffer: (empty)
Output: <result><title>Datenbanksysteme</title><author>Kemper</author><author>Eickler</author></result>

No buffering using order constraints!

FluX Query Language

Based on the XQuery fragment XQuery-
  ε                                      (empty)
  s                                      (output fixed string)
  α β                                    (sequence)
  {for $x in $y/π [where χ] return α}    (for loop)
  {$x/π}                                 (output path)
  {$x}                                   (output)
  {if χ then α}                          (conditional)

FluX Query Language

An XQuery- expression is simple when it can be executed without buffering the stream.

Example 1 (simple): <a> {$x} </a>{if $x/b = 5 then <b>5</b>}
Example 2 (not simple): {$x}{$x}

FluX Query Language (ctd.)

FluX expressions
  Simple XQuery- expression
  s {process-stream $y: H } s´

Event handlers H
  on-first past(S) return α
    α: XQuery- expression
    S: set of symbols
    α is executed on buffers
  on a as $x return Q
    a: symbol name
    $x: variable
    Q: FluX expression
    Q is executed in event-based fashion
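
The interplay of the two handler kinds can be sketched as a small dispatcher. Everything in it is an assumption made for illustration: the name `process_stream`, the `(tag, text)` child representation, and in particular the `may_follow` table, a hand-written summary of which child symbols the DTD still allows after a given child (a real engine would derive this from the content model). The example uses the DTD <!ELEMENT book ((title*,author*),price)> and the handlers of the corresponding FluX query.

```python
# Sketch of a process-stream dispatcher with "on" and "on-first past" handlers.

def process_stream(children, may_follow, on, on_first_past, buffered_tags=()):
    out, buffers = [], {}
    fired = [False] * len(on_first_past)

    def fire(possible):
        # Fire, in query order, each pending on-first handler whose symbol
        # set S can no longer occur (S and `possible` are disjoint).
        for i, (symbols, action) in enumerate(on_first_past):
            if not fired[i] and not (set(symbols) & possible):
                out.append(action(buffers))    # alpha: runs on the buffers
                fired[i] = True

    fire(set(may_follow[None]))                # past() fires immediately
    for tag, text in children:
        fire(set(may_follow[tag]) | {tag})     # current child is still open
        if tag in on:
            out.append(on[tag](text))          # Q: event-based, no buffering
        elif tag in buffered_tags:
            buffers.setdefault(tag, []).append(text)
        fire(set(may_follow[tag]))             # current child is complete
    fire(set())                                # end of stream
    return "".join(out), sum(len(v) for v in buffers.values())


# Assumed summary of <!ELEMENT book ((title*,author*),price)>.
may_follow = {
    None: {"title", "author", "price"},
    "title": {"title", "author", "price"},
    "author": {"author", "price"},
    "price": set(),
}
on = {"title": lambda text: f"<title>{text}</title>",
      "author": lambda text: f"<author>{text}</author>"}
on_first_past = [((), lambda buffers: "<result>"),
                 (("title", "author"), lambda buffers: "</result>")]
book = [("title", "Datenbanksysteme"), ("author", "Kemper"),
        ("author", "Eickler"), ("price", "40€")]
output, buffered = process_stream(book, may_follow, on, on_first_past)
print(output, buffered)
```

With this DTD both title and author are handled by `on` handlers, so nothing is buffered; handlers that need buffered data would list the relevant symbols in `buffered_tags`.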

Safe FluX Queries

A FluX query is safe if no XQuery- expression refers to elements that may still be encountered in the stream.

Bibliography DTD
<!ELEMENT bib (book)*>
<!ELEMENT book ((title|author)*, price)>

FluX query
… <result>
  {process-stream $b:
    on title as $t return $t;
    on-first past (title,author) return
      {for $p in $b/price return $p}}
</result> …

Data stream
… <book> <author>Kemper</author> <title>Datenbanksysteme</title> <author>Eickler</author> <price>39€</price> </book> …

Executing the query on this stream: not safe! The handler fires once title and author are past, but price may still be encountered.

Safe FluX Queries

A FluX query is safe if no XQuery- expression refers to elements that may still be encountered in the stream.

Bibliography DTD
<!ELEMENT bib (book)*>
<!ELEMENT book ((title|author)*, price)>

FluX query
… <result>
  {process-stream $b:
    on title as $t return $t;
    on-first past (title,author, price) return
      {for $p in $b/price return $p}}
</result> …

Data stream
… <book> <author>Kemper</author> <title>Datenbanksysteme</title> <author>Eickler</author> <price>39€</price> </book> …

Executing the query on this stream: safe! The handler now waits until price is past as well.
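
The safety condition for a single on-first handler can be approximated by a small check. This is a simplified sketch, not the definition used by the authors: it assumes the DTD has been summarized by hand in a hypothetical `may_follow` table (which child symbols may still occur after a given child) and that the last child read determines that set, which is true for the content model on this slide but not for every DTD.

```python
# Sketch: does an on-first past(S) handler only read symbols that are past?

def is_safe(past_symbols, referenced, may_follow):
    S, refs = set(past_symbols), set(referenced)
    # States in which the handler might fire: before any child, while a
    # child `tag` is being read, and after that child has been read.
    states = [set(may_follow[None])]
    for tag, following in may_follow.items():
        if tag is not None:
            states.append(set(following) | {tag})
            states.append(set(following))
    # Safe if, in every state where S is past, the referenced symbols
    # are past as well.
    return all(not (refs & state) for state in states if not (S & state))


# Assumed summary of <!ELEMENT book ((title|author)*, price)>.
may_follow = {
    None: {"title", "author", "price"},
    "title": {"title", "author", "price"},
    "author": {"title", "author", "price"},
    "price": set(),
}
first_query = is_safe({"title", "author"}, {"price"}, may_follow)
second_query = is_safe({"title", "author", "price"}, {"price"}, may_follow)
print(first_query, second_query)
```

For the two queries above the check reports the first as unsafe and the second as safe: with past(title,author) the handler can fire while price is still being read, with past(title,author,price) it cannot.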

XQuery to FluX

Rewrite XQuery- query Q to FluX query F using a (non-recursive) DTD
  F is safe w.r.t. the DTD
  F is equivalent to Q
  F has low memory consumption
  Appropriate scheduling of event processors

Steps
1. Normalization of Q
2. Rewriting into FluX

Normalization

Rule-based rewriting of XQuery
  Split paths into single-step for loops
  Eliminate where using if
  Push down if expressions
  Rewrite paths $x/a/… to for loops

Example: XMP, Q1

<bib>
{for $b in $ROOT/bib/book where χ return
  <book> {$b/year} {$b/title} </book>}
</bib>

is normalized to

<bib>
{for $bib in $ROOT/bib return
  {for $b in $bib/book return
    {if χ then <book>}
    {for $year in $b/year return {if χ then {$year}}}
    {for $title in $b/title return {if χ then {$title}}}
    {if χ then </book>}}}
</bib>
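
One of the normalization rules, splitting a multi-step path into nested single-step for loops, can be sketched as a string rewrite. The helper `path_to_loops` is hypothetical and covers only this rule; it names each fresh loop variable after its step and does not guard against name clashes, and the elimination of where and the push-down of if are not modelled.

```python
# Sketch: rewrite a path $v/s1/.../sn into nested single-step for loops.

def path_to_loops(source_var, steps, body, last_var=None):
    """body(var) builds the innermost expression for the final variable."""
    if not steps:
        return body(source_var)
    step, rest = steps[0], steps[1:]
    # Keep the user's variable for the last step if one was given.
    var = last_var if (not rest and last_var) else f"${step}"
    inner = path_to_loops(var, rest, body, last_var)
    return f"{{for {var} in {source_var}/{step} return {inner}}}"


# for $b in $ROOT/bib/book return <book> {$b/title} </book>
outer = path_to_loops("$ROOT", ["bib", "book"],
                      lambda v: f"<book> {{{v}/title}} </book>",
                      last_var="$b")
# {$b/title} as an explicit loop
inner = path_to_loops("$b", ["title"], lambda v: f"{{{v}}}")
print(outer)
print(inner)
```

Applying the helper twice, first to the loop path and then to each output path, yields loops of the shape used in the normalized query.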

Example

Bibliography DTD
<!ELEMENT bib (book)*>
<!ELEMENT book ((title|author)*,price)>

<results>
{for $bib in $ROOT/bib return
  {for $b in $bib/book return
    <result>
      {for $t in $b/title return {$t}}
      {for $a in $b/author return {$a}}
    </result>}}
</results>

function rewrite(Variable parentVar, Set<Σ> H, XQuery- β): FluX
rewrite($ROOT, {}, Q)
Delay execution of β

Example

<results>
{for $bib in $ROOT/bib return
  {for $b in $bib/book return
    <result>
      {for $t in $b/title return {$t}}
      {for $a in $b/author return {$a}}
    </result>}}
</results>

rewrite($ROOT, {}, β1)
  β1 simple, no delay
  generate on-first past () return …

(Parts of the query labelled on the slide: β1, β2)

Example

{ps $ROOT:
  on-first past() return <results>
  {for $bib in $ROOT/bib return
    {for $b in $bib/book return
      <result>
        {for $t in $b/title return {$t}}
        {for $a in $b/author return {$a}}
      </result>}}
  </results>

rewrite($ROOT, {}, β2)

(Part of the query labelled on the slide: β2)

Example

{ps $ROOT:
  on-first past() return <results>
  {for $bib in $ROOT/bib return
    {for $b in $bib/book return
      <result>
        {for $t in $b/title return {$t}}
        {for $a in $b/author return {$a}}
      </result>}}
  </results>

rewrite($ROOT, {}, β2): β21, β22
rewrite($ROOT, {}, β21)
  no delay
  generate on bib as $bib return …

(Parts of the query labelled on the slide: β21, β22)

Example

{ps $ROOT:
  on-first past() return <results>
  on bib as $bib return
    {for $b in $bib/book return
      <result>
        {for $t in $b/title return {$t}}
        {for $a in $b/author return {$a}}
      </result> }}
  </results>

rewrite($bib, {}, α1)
  no delay
  generate on book as $b return …

(Part of the query labelled on the slide: α1)

Example

{ps $ROOT:
  on-first past() return <results>
  on bib as $bib return
    {ps $bib:
      on book as $b return
        <result>
          {for $t in $b/title return {$t}}
          {for $a in $b/author return {$a}}
        </result> }
  </results>

rewrite($b, {}, α2)
  as before, no delays
  generate on-first past() return …
           on title as $t return …

(Part of the query labelled on the slide: α2)

Example

{ps $ROOT:
  on-first past() return <results>
  on bib as $bib return
    {ps $bib:
      on book as $b return
        {ps $b:
          on-first past() return <result>;
          on title as $t return {$t};
          {for $a in $b/author return {$a}}
          </result> }
  </results>

Assure all titles before α32
  rewrite($b, {title}, α32)
  rewrite($b, {title}, α41)
    delay execution after title, buffered execution
    generate on-first past(title,author) return …

(Parts of the query labelled on the slide: α32, α41, α42)

Example

{ps $ROOT:
  on-first past() return <results>
  on bib as $bib return
    {ps $bib:
      on book as $b return
        {ps $b:
          on-first past() return <result>;
          on title as $t return {$t};
          on-first past(title,author) return {for $a in $b/author return {$a}};
          </result> }
  </results>

Assure all titles and authors before α42
  rewrite($b, {title,author}, α42)
    α42 simple, delay execution after title,author
    generate on-first past(title,author) return …

(Part of the query labelled on the slide: α42)

Example

{ps $ROOT:
  on-first past() return <results>
  on bib as $bib return
    {ps $bib:
      on book as $b return
        {ps $b:
          on-first past() return <result>;
          on title as $t return {$t};
          on-first past(title,author) return {for $a in $b/author return {$a}};
          on-first past(title,author) return </result>;};
  </results>

Example

{ps $ROOT:
  on-first past() return <results>
  on bib as $bib return
    {ps $bib:
      on book as $b return
        {ps $b:
          on-first past() return <result>;
          on title as $t return {$t};
          on-first past(title,author) return {for $a in $b/author return {$a}};
          on-first past(title,author) return </result>;}
  on-first past(bib) return </results>;}

Example – Order Constraints

<!ELEMENT bib (book)*>
<!ELEMENT book (title*, author*),…>

{ps $ROOT:
  on-first past() return <results>
  on bib as $bib return
    {ps $bib:
      on book as $b return
        {ps $b:
          on-first past() return <result>;
          on title as $t return {$t};
          {for $a in $b/author return {$a}}
          </result> }
  </results>

Assure all titles before α41
  rewrite($b, {title}, α41)
    DTD ensures titles before authors
    generate on author as $a return …

(Parts of the query labelled on the slide: α41, α42)

Example

<!ELEMENT bib (book)*>
<!ELEMENT book (title*, author*)>

{ps $ROOT:
  on-first past() return <results>
  on bib as $bib return
    {ps $bib:
      on book as $b return
        {ps $b:
          on-first past() return <result>;
          on title as $t return {$t};
          on author as $a return {$a};
          on-first past(title,author) return </result>;};
  on-first past(bib) return </results>;}

Assure all titles before α41
  rewrite($b, {title}, α41), H={title}
    DTD ensures titles before authors
    generate on author as $a return …
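
The scheduling decision that the walkthrough makes for a single output loop can be condensed into a sketch. It covers only the case shown on the slides (a loop that outputs each child of one symbol) and is not the full rewrite function: the name `schedule_loop`, its parameters, and the encoding of order constraints as pairs `(a, b)`, meaning the DTD guarantees every a-child precedes every b-child, are assumptions for the example.

```python
# Sketch: choose an "on" handler or an "on-first past" handler for
# {for loop_var in parent_var/symbol return {loop_var}}.

def schedule_loop(parent_var, symbol, loop_var, H, ord_pairs):
    """H: symbols that must be past before this loop may produce output."""
    H = list(H)
    # If the DTD orders every symbol in H before `symbol`, the loop can be
    # streamed: each child is written as soon as it arrives.
    if all((h, symbol) in ord_pairs for h in H):
        return f"on {symbol} as {loop_var} return {{{loop_var}}}"
    # Otherwise delay: buffer the children and run the loop on the buffer
    # once the symbols in H and `symbol` itself are past.
    past = H if symbol in H else H + [symbol]
    loop = f"{{for {loop_var} in {parent_var}/{symbol} return {{{loop_var}}}}}"
    return f"on-first past({','.join(past)}) return {loop}"


titles = schedule_loop("$b", "title", "$t", [], set())
authors_unordered = schedule_loop("$b", "author", "$a", ["title"], set())
authors_ordered = schedule_loop("$b", "author", "$a", ["title"],
                                {("title", "author")})
print(titles)
print(authors_unordered)
print(authors_ordered)
```

With no order constraint between title and author the author loop becomes a buffered on-first past(title,author) handler; with the constraint from (title*, author*) it becomes a streaming on author handler, as in the two variants of the example.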
Further Aspects
Visit our demonstration (Group 3: XML)
[Figure: system architecture. Labels: XQuery, DTD, Query Compiler, Query Optimizer, To Normal Form, Algebraic Optimizations, To FluX, Runtime Engine, Streamed Query Evaluator, XSAX, Memory Buffers, XML Input Stream, XML Output Stream]

Experiments

Based on XMark
  Queries adapted to XQuery- fragment
Environment
  AMD Athlon XP 2000, 512MB RAM
  Linux, Sun JDK 1.4.2_03
Measurements
  Execution time
  Memory consumption

Experiments

Query  Size   FluX                Galax               AnonX
              time [s]  memory    time [s]  memory    time [s]
Q1     5M     2,1       0         13,4      37M       3,4
       10M    2,8       0         29,8      83M       6,7
       50M    7,8       0         -         >500M     38,3
       100M   14,0      0         -         >500M     -
Q8     5M     6,8       1,54M     296,9     50M       143,8
       10M    17,2      3,16M     1498,3    100M      534,8
       50M    357,8     16,00M    -         >500M     -
       100M   11566,9   32,25M    -         >500M     -
Q11    5M     5,6       374k      277,0     50M       n/a
       10M    11,4      741k      1663,7    100M      n/a
       50M    170,8     3,64M     -         >500M     n/a
       100M   626,8     7,27M     -         >500M     n/a
Q13    5M     2,2       0         12,8      38M       3,0
       10M    3,1       0         27,2      73M       5,2
       50M    7,9       0         230,1     344M      88,0
       100M   13,9      0         -         >500M     -
Q20    5M     2,8       4,66k     13,2      36M       2,5
       10M    3,4       5,18k     29,7      80M       6,2
       50M    8,7       7,01k     -         >500M     151,9
       100M   15,4      7,02k     -         >500M     -

Conclusion

FluX
  Event-based extension of XQuery
  Rewriting of XQuery into FluX
    Uses schema information from the DTD
FluX supports buffer-conscious query processing
  Low main memory consumption
  Efficient and scalable query execution on data streams
Future work
  Recursive DTDs
  Extension of the XQuery- subset (e.g., //, aggregate operators)
  Improve execution (joins)

Related Work

Altinel, Franklin. “Efficient Filtering of XML Documents for Selective Dissemination of Information”. VLDB 2000
Buneman, Grohe, Koch. “Path Queries on Compressed XML”. VLDB 2003
Chan, Felber, Garofalakis, Rastogi. “Efficient Filtering of XML Documents with XPath Expressions”. ICDE 2002
Deutsch, Tannen. “Reformulation of XML Queries and Constraints”. ICDT 2003
Fegaras, Levine, Bose, Chaluvadi. “Query Processing on Streamed XML Data”. CIKM 2002
Green, Miklau, Onizuka, Suciu. “Processing XML Streams with Deterministic Automata”. ICDT 2003
Gupta, Suciu. “Stream Processing of XPath Queries with Predicates”. SIGMOD 2003
Ludäscher, Mukhopadhyay, Papakonstantinou. “A Transducer-Based XML Query Processor”. VLDB 2002
Marian, Siméon. “Projecting XML Documents”. VLDB 2003
Olteanu, Kiesling, Bry. “An Evaluation of Regular Path Expressions with Qualifiers against XML Streams”. ICDE 2003