niagaracq : a scalable continuous query system for internet databases (modified slides available on...

33
Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences Dept. University of Wisconsin- Madison SIGMOD 2000 Talk by Naresh Kumar

Upload: olivia-stone

Post on 11-Jan-2016

214 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences

NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage)

Jianjun Chen et al Computer Sciences Dept. University of Wisconsin-Madison

SIGMOD 2000 Talk by Naresh Kumar

Page 2: NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences

Outline Motivation

What is NiagaraCQ ?

General strategy of incremental group optimization

Query split scheme with materialized intermediate files

Incremental grouping of selection and join operators

Experimental Details

Page 3: NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences

Data Stream Management System(DSMS)

The two fundamental difference btn DBMS and DSMS

In addition to managing traditional stored data, DSMS must handle multiple continuous, unbounded, possibly rapid data streams

A DSMS supports long running continuous queries.

Page 4: NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences

Continuous Queries(CQ)

Continuous queries are persistent queries that allow user to receive results when they become available.Example

Notify me when ever price of Dell stock drops by more than 5%A broad classification

Change based Timer based

Page 5: NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences

Motivation Continuous queries are growingly popular.

Notifies user with out querying repeatedly.

Much useful in internet where information changes frequently

Challenges: Need to be able to support million of

queries due to scale of internet.

Solution - NiagaraCQ

Page 6: NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences

Previous Attempts Previous group optimization efforts focused on finding an optimal plan for a small number of similar queries Not applicable to a continuous query system for the following reasons: Computationally too expensive to

handle a large number of queries. Not designed for an environment like

the web where CQ s are added or removed dynamically.

Page 7: NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences

What is NiagaraCQ ? NiagaraCQ–A CQ system for the internet

It is built on the assumption that Many queries tend to be similar to one

another. Similar queries can be grouped

together

It supports scalable continuous query processing over multiple, distributed XML files

Page 8: NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences

NiagaraCQ - Approaches Advantages of grouping : Grouped queries can share

computation. They can reside in memory saving

IO-cost Avoid unnecessary invocations by

testing many CQ together Handles both change based and timer based queries in a uniform way To ensure scalability: Incremental evaluation of CQ's Memory caching

Page 9: NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences

NiagaraCQ command language

Creating a CQCreate CQ_name XML-QL queryDo action { START start_time} { EVERY

time_interval} { EXPIRE expiration_time}

Delete CQ_name

Page 10: NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences

Expression SignatureQuery examples Where <Quotes><Quote><Symbol>INTC</></></> element_as $g in “http://www.stock.com/quotes.xml”

construct $g

Where <Quotes><Quote><Symbol>MSFT</></></> element_as $g in “http://www.stock.com/quotes.xml”

construct $g

Expression signatures

Quotes.Quote.Symbol in quotes.xml

constant

=

Page 11: NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences

Query plans

Trigger Action I Trigger Action J

File Scan

Select Symbol = “MSFT”

Select Symbol = “INTC”

File Scan

quotes.xml quotes.xml

Page 12: NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences

Group

Group – signature, constant table, planGroup Signature

Common signature of all queries in the group

Group constant table

Dest_buffer Constant_value

Dest. J MSFT

Dest. I INTC

Page 13: NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences

The group plan

Page 14: NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences

Incremental Grouping Algo

When a new query is submittedIf the expression signature of the new query matches that of existing groups

Break the query plan into two partsRemove the lower partAdd the upper part onto the group

planelse create a new group

Page 15: NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences

New Query**AOL added to Constant Table

**new destination buffer allocated**Matching process continues until top

Page 16: NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences

Query split scheme

Matching Process will continue on the remainder of query plan until top of plan is reached

Thus, each continuous query is split into several smaller queries such that inputs of each of these queries are monitored using the same techniques that are used for the inputs of user defined continuous queries.

Incremental group optimization is very efficient because it only requires one traversal of query

Page 17: NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences

Query split with materialized intermediate files

Destination buffer for the split operator can be implemented in Pipelined scheme Intermediate file scheme

Disadvantages of pipelined scheme It does not work for grouping timer based queries Gives a single complicated execution plan Combined plan may be very large and require

resources beyond limit of system A large portion of query plan may not need to be

executed at each invocation Split operator may block simple queries

Page 18: NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences

Query split with materialized intermediate files(cont...)

Using intermediate files Split operator writes each output stream to a file Cut query plan into 2 parts at split operator Add a file scan operator to upper part to read

intermediate file Intermediate files are monitored just like other

data sources Intermediate file names are stored in the constant

table Grouped CQ with same constant share same

intermediate file

Page 19: NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences

The query split scheme

Page 20: NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences

Trade-offsOther advantages of materialized intermediate files

Only the necessary queries are executed. Thus computation time reduced

Tree structured query format – can easily scheduled and executed by general query engine

Uniform handling of intermediate files and original data source files

Bottle neck problem is avoided

Disadvantages Split operator becomes a blocking operator Extra disk I/Os

Page 21: NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences

Range Predicates

E.g. R.a < val or val1 < R.a < val2Multiple such rangesProblem

Intermediate files may contain duplicate tuples

Idea: Virtual intermediate files Use an index to implement this

Page 22: NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences

Incremental grouping of selection predicates

Multiple selection predicates in a query CNF for predicates on same data source Incremental grouping

Choose the most selective conjunct and implement virtual file on this conjunct

Example queryWhere <Quotes><Quote><Symbol>”INTC”</> <Current_Price>$p</></> element_as $g </> in “quotes.xml”, $p < 100Construct $g

Page 23: NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences

Incremental grouping of join operators

Join operators are usually expensive, sharing common join operations can significantly reduce the amount of computation.A join queryQuotes.Quote.Change_Ratio constant in “quotes.xml”

Where <Quotes><Quote><Symbol>$s</></>element_as $g </> in “quotes.xml”,

<Companies><Company><Symbol>$s</></>element_as $t</> in “companies.xml”

construct $g, $t

Page 24: NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences

Queries that contain both join and selection

Example query :Where <Quotes><Quote><Symbol>$s</>

<Industry>”Computer Service”</></>element_as $g </> in “quotes.xml”,

<Companies><Company><Symbol>$s</></>element_as $t</> in “companies.xml”

construct $g, $t

Where to place the selection operator ? Below the join

Eliminates irrelevant tuples Above the join

Allows sharing Pick based on cost model

Page 25: NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences

Grouping timer-based queries

Challenge Hard to monitor the timer events of

queries Sharing common computation becomes

difficult

Event Detection Stores time events sorted in time order Each query has an ID

Page 26: NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences

Incremental evaluation

Invoke queries only on changed data

For each source file, NiagaraCQ keeps a delta file Also for the intermediate files Time stamp store the each tuple – for timer

based queries

Incremental evaluation of join operators requires complete data files

Page 27: NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences

Memory Caching

Thousands of continuous queries can’t fit in memoryWhat should we cache ?

Grouped query plans What about non-grouped queries ?

Favor small delta files Time window of the event list

Page 28: NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences

System Architecture

Page 29: NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences

CQ processing

Page 30: NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences

Experimental Results

Example query :Where <Quotes><Quote><Symbol>”INTC”</></>element_as $g </> in “quotes.xml”, construct $g

N = number of installed queriesF= number of fired queriesC = number of tuples modified

Page 31: NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences

Performance Results

Case 1: F=N, C=1000Case 2: F=100, C=1000

Page 32: NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences

Performance ResultsF=N=2000, vary data size

Page 33: NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences

Thank You