niagaracq a scalable continuous query system for internet databases jianjun chen, david j dewitt,...

25
NiagaraCQ NiagaraCQ A Scalable Continuous Query A Scalable Continuous Query System System for Internet Databases for Internet Databases Jianjun Chen, David J DeWitt, Feng Jianjun Chen, David J DeWitt, Feng Tian, Yuan Wang Tian, Yuan Wang University of Wisconsin – Madison 2000 University of Wisconsin – Madison 2000 Slides adapted from Rachel Pottlinger and Slides adapted from Rachel Pottlinger and Yehoshua Sagiv Yehoshua Sagiv Presented by Andrea Connell Presented by Andrea Connell

Upload: elfreda-harris

Post on 11-Jan-2016

218 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: NiagaraCQ A Scalable Continuous Query System for Internet Databases Jianjun Chen, David J DeWitt, Feng Tian, Yuan Wang University of Wisconsin – Madison

NiagaraCQNiagaraCQ

A Scalable Continuous Query A Scalable Continuous Query SystemSystem

for Internet Databasesfor Internet Databases

Jianjun Chen, David J DeWitt, Feng Tian, Jianjun Chen, David J DeWitt, Feng Tian, Yuan WangYuan Wang

University of Wisconsin – Madison 2000University of Wisconsin – Madison 2000

Slides adapted from Rachel Pottlinger and Yehoshua Slides adapted from Rachel Pottlinger and Yehoshua SagivSagiv

Presented by Andrea ConnellPresented by Andrea Connell

Page 2: NiagaraCQ A Scalable Continuous Query System for Internet Databases Jianjun Chen, David J DeWitt, Feng Tian, Yuan Wang University of Wisconsin – Madison

2NiagaraCQ

ProblemProblem

Lack of a Lack of a scalablescalable and and efficientefficient system which supports persistent system which supports persistent queries, that allow users to receive new queries, that allow users to receive new results when they become available:results when they become available:

Notify me whenever the price of Dell or Notify me whenever the price of Dell or Micron stock drops by more than 5% Micron stock drops by more than 5% and the price of Intel stock remains and the price of Intel stock remains unchanged over next three months.unchanged over next three months.

The internet has a large amount of The internet has a large amount of frequently updating data – how do we frequently updating data – how do we manage CQs efficientlymanage CQs efficiently

Page 3: NiagaraCQ A Scalable Continuous Query System for Internet Databases Jianjun Chen, David J DeWitt, Feng Tian, Yuan Wang University of Wisconsin – Madison

3NiagaraCQ

ApproachApproach

Incremental Grouping by similar query Incremental Grouping by similar query structure structure

Grouped CQs share computation and dataGrouped CQs share computation and data

Reduce I/OReduce I/O

Reduce unnecessary query invocationsReduce unnecessary query invocations

Change-based or timer-based queriesChange-based or timer-based queries

Incremental Evaluation Incremental Evaluation

User Interface - high level query User Interface - high level query languagelanguage

Page 4: NiagaraCQ A Scalable Continuous Query System for Internet Databases Jianjun Chen, David J DeWitt, Feng Tian, Yuan Wang University of Wisconsin – Madison

4NiagaraCQ

Command LanguageCommand Language

Create continuous query:Create continuous query:CREATECREATE CQ_nameCQ_nameXML-QLXML-QL queryqueryDODO actionaction{{STARTSTART start_timestart_time} {} {EVERYEVERY

time_intervaltime_interval}}{{EXPIREEXPIRE expiration_timeexpiration_time}}

Delete continuous query:Delete continuous query:DELETEDELETE CQ_nameCQ_name

Page 5: NiagaraCQ A Scalable Continuous Query System for Internet Databases Jianjun Chen, David J DeWitt, Feng Tian, Yuan Wang University of Wisconsin – Madison

5NiagaraCQ

Expression SignatureExpression Signature

Represent the same syntax structure, Represent the same syntax structure, but possibly different constant values, but possibly different constant values, in different queries.in different queries.

Where <Quotes> <Quote>Where <Quotes> <Quote><Symbol><Symbol>INTCINTC</></>

</> </> element_as $g</> </> element_as $gin “http://www.cs.wisc.edu/db/quotes.xml”in “http://www.cs.wisc.edu/db/quotes.xml”construct $gconstruct $g =

Quotes.Quote.Symbol constantin quotes.xml

Where <Quotes> <Quote>Where <Quotes> <Quote><Symbol><Symbol>MSFTMSFT</></>

</> </> element_as $g</> </> element_as $gin “http://www.cs.wisc.edu/db/quotes.xml”in “http://www.cs.wisc.edu/db/quotes.xml”construct $gconstruct $g

Page 6: NiagaraCQ A Scalable Continuous Query System for Internet Databases Jianjun Chen, David J DeWitt, Feng Tian, Yuan Wang University of Wisconsin – Madison

6NiagaraCQ

Query PlanQuery PlanTrigger Action I

Select Symbol=“INTC”

File Scan

quotes.xml

Trigger Action J

Select Symbol=“MSFT”

File Scan

quotes.xml

Page 7: NiagaraCQ A Scalable Continuous Query System for Internet Databases Jianjun Chen, David J DeWitt, Feng Tian, Yuan Wang University of Wisconsin – Madison

7NiagaraCQ

GroupsGroupsGroups are created for queries based Groups are created for queries based on their expression signatures. on their expression signatures. Consists of three parts:Consists of three parts:Group Signature Group Signature Group Constant tableGroup Constant tableGroup Query PlanGroup Query Plan

Page 8: NiagaraCQ A Scalable Continuous Query System for Internet Databases Jianjun Chen, David J DeWitt, Feng Tian, Yuan Wang University of Wisconsin – Madison

8NiagaraCQ

GroupsGroupsGroups are created for queries based Groups are created for queries based on their expression signatures. on their expression signatures. Consists of three parts:Consists of three parts:

=

Quotes.Quote.Symbol constantin quotes.xml

Group SignatureGroup SignatureGroup Constant tableGroup Constant tableGroup Query PlanGroup Query Plan

Page 9: NiagaraCQ A Scalable Continuous Query System for Internet Databases Jianjun Chen, David J DeWitt, Feng Tian, Yuan Wang University of Wisconsin – Madison

9NiagaraCQ

GroupsGroupsGroups are created for queries based Groups are created for queries based on their expression signatures. on their expression signatures. Consists of three parts:Consists of three parts:Group SignatureGroup SignatureGroup Constant Group Constant tabletableGroup Query PlanGroup Query Plan

Constant_valueConstant_value Destination_buffDestination_bufferer

…… ……

INTCINTC Dest . IDest . I

MSFTMSFT Dest . JDest . J

…… ……

Stored on disk

Page 10: NiagaraCQ A Scalable Continuous Query System for Internet Databases Jianjun Chen, David J DeWitt, Feng Tian, Yuan Wang University of Wisconsin – Madison

10NiagaraCQ

GroupsGroupsGroups are created for queries based Groups are created for queries based on their expression signatures. on their expression signatures. Consists of three parts:Consists of three parts:Group SignatureGroup SignatureGroup Constant tableGroup Constant tableGroup Query PlanGroup Query Plan

FileFile Scan

quotes.xml Constant Table

Symbol = Constant_value

Join

Split

Action I Action J.....

Stored in memory-resident hash table

Page 11: NiagaraCQ A Scalable Continuous Query System for Internet Databases Jianjun Chen, David J DeWitt, Feng Tian, Yuan Wang University of Wisconsin – Madison

11NiagaraCQ

Incremental Grouping Incremental Grouping AlgorithmAlgorithm

1.1. Group optimizer Group optimizer traverses the query plan traverses the query plan bottom up.bottom up.

2.2. Matches the query’s Matches the query’s expression signature with expression signature with the signatures of existing the signatures of existing groupsgroups

3.3. Group optimizer breaks Group optimizer breaks the query plan into two the query plan into two partspartsLower Lower –– removed removedUpper Upper –– added to the added to the group plan.group plan.

4.4. Adds the constant and Adds the constant and action to the constant action to the constant table.table.

Trigger Action

Select Symbol=“AOL”

File Scan

quotes.xml

Groups may not be optimal

Page 12: NiagaraCQ A Scalable Continuous Query System for Internet Databases Jianjun Chen, David J DeWitt, Feng Tian, Yuan Wang University of Wisconsin – Madison

ExampleExample

12NiagaraCQ

Using the constant table, the split function moves all values for MS to buffer A and SUN to buffer B

What are these buffers? How do they work?

Page 13: NiagaraCQ A Scalable Continuous Query System for Internet Databases Jianjun Chen, David J DeWitt, Feng Tian, Yuan Wang University of Wisconsin – Madison

13NiagaraCQ

Pipeline ApproachPipeline ApproachTuples are pipelined directly from the Tuples are pipelined directly from the output of one operator into the input of output of one operator into the input of the next operator. All parts of the the next operator. All parts of the group are combined (including trigger group are combined (including trigger actions), creating a single execution actions), creating a single execution plan.plan.

DisadvantagesDisadvantages

Doesn’t work for grouping timer-Doesn’t work for grouping timer-based queries.based queries.

Split operator may become a Split operator may become a bottleneck.bottleneck.

Not all trigger actions may need to be Not all trigger actions may need to be executed.executed.

Page 14: NiagaraCQ A Scalable Continuous Query System for Internet Databases Jianjun Chen, David J DeWitt, Feng Tian, Yuan Wang University of Wisconsin – Madison

14NiagaraCQ

Intermediate FilesIntermediate Files

Figure 3.8

Page 15: NiagaraCQ A Scalable Continuous Query System for Internet Databases Jianjun Chen, David J DeWitt, Feng Tian, Yuan Wang University of Wisconsin – Madison

15NiagaraCQ

Intermediate FilesIntermediate Files

AdvantagesAdvantages Each query is scheduled independentlyEach query is scheduled independently Intermediate files and original data sources Intermediate files and original data sources

are monitored in the same wayare monitored in the same way The potential bottleneck problem of the The potential bottleneck problem of the

pipelined approach is avoided.pipelined approach is avoided.

DisadvantagesDisadvantages

Extra disk I/Os.Extra disk I/Os.

Split operator becomes a blocking Split operator becomes a blocking operator.operator.

Page 16: NiagaraCQ A Scalable Continuous Query System for Internet Databases Jianjun Chen, David J DeWitt, Feng Tian, Yuan Wang University of Wisconsin – Madison

16NiagaraCQ

Range QueriesRange QueriesWhat if we want to return every stock with a price increase of What if we want to return every stock with a price increase of more than 5%? more than 5%?

A range query may have an upper bound and a lower bound, so A range query may have an upper bound and a lower bound, so the constant table is modified to include these two columns.the constant table is modified to include these two columns.

Where <Quotes> <Quote>Where <Quotes> <Quote><Change_ratio>$c</><Change_ratio>$c</>

</> </> element_as $g</> </> element_as $gin “quotes.xml”, $c>in “quotes.xml”, $c>0.050.05construct $gconstruct $g

Where <Quotes> <Quote>Where <Quotes> <Quote><Change_ratio>$c</><Change_ratio>$c</>

</> </> element_as $g</> </> element_as $gin “quotes.xml”, $c>in “quotes.xml”, $c>0.150.15construct $gconstruct $g

Overlap in intermediate files

Page 17: NiagaraCQ A Scalable Continuous Query System for Internet Databases Jianjun Chen, David J DeWitt, Feng Tian, Yuan Wang University of Wisconsin – Madison

17NiagaraCQ

VirtualVirtual Intermediate Intermediate FilesFiles

All outputs from split operator are stored All outputs from split operator are stored in one real intermediate file.in one real intermediate file.

This file has clustered index on the range.This file has clustered index on the range.

Virtual intermediate files store a value Virtual intermediate files store a value range.range.

The value range is used to retrieve data The value range is used to retrieve data from the real intermediate file.from the real intermediate file.

Modification of virtual intermediate files Modification of virtual intermediate files can trigger upper-level queries.can trigger upper-level queries.

Page 18: NiagaraCQ A Scalable Continuous Query System for Internet Databases Jianjun Chen, David J DeWitt, Feng Tian, Yuan Wang University of Wisconsin – Madison

Grouping of Join Grouping of Join OperatorsOperators

Since joins can be very expensive, joins Since joins can be very expensive, joins with the same expression are groupedwith the same expression are grouped..

Which order: Join first, or Selection firstWhich order: Join first, or Selection first??

18NiagaraCQ

This paper says Selection; Future work says join This paper says Selection; Future work says join

Page 19: NiagaraCQ A Scalable Continuous Query System for Internet Databases Jianjun Chen, David J DeWitt, Feng Tian, Yuan Wang University of Wisconsin – Madison

19NiagaraCQ

Event DetectionEvent Detection

Types of EventsTypes of Events

Data-source changeData-source changePush-based (inform NiagaraCQ of changes)Push-based (inform NiagaraCQ of changes)

Pull-based (checked periodically by Pull-based (checked periodically by NiagaraCQ)NiagaraCQ)

TimerTimerSet to a specific time intervalSet to a specific time interval

Grouped with other timer-based queriesGrouped with other timer-based queries

Only fired if data has changedOnly fired if data has changed

Page 20: NiagaraCQ A Scalable Continuous Query System for Internet Databases Jianjun Chen, David J DeWitt, Feng Tian, Yuan Wang University of Wisconsin – Madison

20NiagaraCQ

Incremental EvaluationIncremental Evaluation

Queries are invoked only on changed dataQueries are invoked only on changed data

For each file, NiagaraCQ keeps a For each file, NiagaraCQ keeps a ““delta filedelta file””

Queries are run over delta files when possibleQueries are run over delta files when possible

Incremental evaluation of join operators Incremental evaluation of join operators requires complete data filesrequires complete data files

Time stamp is added to each tuple in the delta Time stamp is added to each tuple in the delta file in order to support timer-based queriesfile in order to support timer-based queries

Tuples remain in delta file for the longest time Tuples remain in delta file for the longest time interval within the groupinterval within the group

Page 21: NiagaraCQ A Scalable Continuous Query System for Internet Databases Jianjun Chen, David J DeWitt, Feng Tian, Yuan Wang University of Wisconsin – Madison

21NiagaraCQ

System ArchitectureSystem Architecture

Figure 4.1

Page 22: NiagaraCQ A Scalable Continuous Query System for Internet Databases Jianjun Chen, David J DeWitt, Feng Tian, Yuan Wang University of Wisconsin – Madison

22NiagaraCQ

Continues Queries Continues Queries ProcessingProcessing

Continuous Query Manager

(CQM)

Event Detector (ED)

Data Manager (DM)

Query Engine (QE)

1

2 , 3

4

5

6

7

8

1. CQM adds continuous queries with file and timer information to enable ED to monitor the events

2. ED asks DM to monitor changes to files3. When a timer event happens, ED asks DM the last modified time of files

4. DM informs ED of changes to pushed-based data sources

5. If file changes and timer events are satisfied, ED provides CQM with a list of firing CQs

6. CQM invokes QE to execute firing CQs7. File scan operator calls DM to retrieve selected documents

8. DM only returns changes between last fire time and current fire time

Figure 4.2

NiagaraCQNiagara

Page 23: NiagaraCQ A Scalable Continuous Query System for Internet Databases Jianjun Chen, David J DeWitt, Feng Tian, Yuan Wang University of Wisconsin – Madison

Experimental Results

23NiagaraCQ

Simple Selection

Range Selection

Selection & Join

Equal & Range

Mixed Queries

Page 24: NiagaraCQ A Scalable Continuous Query System for Internet Databases Jianjun Chen, David J DeWitt, Feng Tian, Yuan Wang University of Wisconsin – Madison

24NiagaraCQ

ReferencesReferencesNiagaraCQ: A Scalable Continuous Query NiagaraCQ: A Scalable Continuous Query System for Internet DatabasesSystem for Internet Databaseshttp://www.cs.wisc.edu/niagara/papers/NiagaraCQ.pdfhttp://www.cs.wisc.edu/niagara/papers/NiagaraCQ.pdf

Design and Evaluation of Alternative Selection Design and Evaluation of Alternative Selection Placement Strategies in Optimizing Placement Strategies in Optimizing Continuous QueriesContinuous Querieshttp://www.cs.wisc.edu/niagara/papers/Icde02.pdfhttp://www.cs.wisc.edu/niagara/papers/Icde02.pdf

  Dynamic Re-grouping of Continuous QueriesDynamic Re-grouping of Continuous Querieshttp://www.cs.wisc.edu/niagara/papers/507.pdfhttp://www.cs.wisc.edu/niagara/papers/507.pdf

Follow Up PapersFollow Up Papers

Page 25: NiagaraCQ A Scalable Continuous Query System for Internet Databases Jianjun Chen, David J DeWitt, Feng Tian, Yuan Wang University of Wisconsin – Madison

What kinds of applications other than stock quotes What kinds of applications other than stock quotes would this be appropriate for? What would it not would this be appropriate for? What would it not work forwork for??

NiagaraCQ is somewhat similar to RSS. What types of NiagaraCQ is somewhat similar to RSS. What types of applications are better with RSS and which are applications are better with RSS and which are

better with NiagaraCQbetter with NiagaraCQ ? ?

Are expression signatures too simple? Do they group Are expression signatures too simple? Do they group together enough of the kinds of queries that this together enough of the kinds of queries that this system is meant to handle? Do you think they system is meant to handle? Do you think they would work better or worse for SQL queries instead would work better or worse for SQL queries instead of XMLof XML??

25NiagaraCQ

DiscussionDiscussion