NiagaraCQ
A Scalable Continuous Query System
for Internet Databases
Jianjun Chen, David J DeWitt, Feng Tian, Yuan Wang
University of Wisconsin – Madison 2000
Slides adapted from Rachel Pottlinger and Yehoshua Sagiv
Presented by Andrea Connell
Problem
There is no scalable and efficient system that supports persistent queries, which let users receive new results as they become available. For example:
"Notify me whenever the price of Dell or Micron stock drops by more than 5% and the price of Intel stock remains unchanged over the next three months."
The Internet has a large amount of frequently updated data. How do we manage continuous queries (CQs) efficiently?
Approach
Incremental grouping by similar query structure
Grouped CQs share computation and data
Reduce I/O
Reduce unnecessary query invocations
Change-based or timer-based queries
Incremental evaluation
User interface: high-level query language
Command Language
Create continuous query:
CREATE CQ_name
XML-QL query
DO action
{START start_time} {EVERY time_interval} {EXPIRE expiration_time}
Delete continuous query:
DELETE CQ_name
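As an illustration only, the fields of the CREATE command can be modeled as a small record. The class name `ContinuousQuery`, its field names, and the rule that a query with an EVERY interval counts as timer-based are assumptions made for this sketch, not NiagaraCQ's actual interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContinuousQuery:
    """Hypothetical in-memory form of a CREATE command."""
    name: str                         # CQ_name
    query: str                        # XML-QL query text
    action: str                       # DO action
    start_time: Optional[str] = None  # optional START clause
    every: Optional[str] = None       # optional EVERY clause
    expire: Optional[str] = None      # optional EXPIRE clause

    @property
    def is_timer_based(self) -> bool:
        # Assumption: a query with an EVERY interval is timer-based,
        # otherwise it is treated as change-based.
        return self.every is not None

    def to_command(self) -> str:
        """Render the command in the syntax shown above."""
        parts = [f"CREATE {self.name}", self.query, f"DO {self.action}"]
        if self.start_time:
            parts.append(f"START {self.start_time}")
        if self.every:
            parts.append(f"EVERY {self.every}")
        if self.expire:
            parts.append(f"EXPIRE {self.expire}")
        return "\n".join(parts)
```

The optional clauses are simply omitted from the rendered command when they are not set.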
Expression Signature
An expression signature represents the same syntax structure, but possibly different constant values, in different queries.
Query 1:
Where <Quotes> <Quote>
<Symbol>INTC</>
</> </> element_as $g
in "http://www.cs.wisc.edu/db/quotes.xml"
construct $g
Query 2:
Where <Quotes> <Quote>
<Symbol>MSFT</>
</> </> element_as $g
in "http://www.cs.wisc.edu/db/quotes.xml"
construct $g
Expression signature of both queries: Quotes.Quote.Symbol = constant, in quotes.xml
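A minimal sketch of the idea, assuming a selection predicate can be reduced to a (source, path, operator, constant) record: the signature is what remains after the constant is dropped. The `Selection` type and `signature` function are hypothetical names for illustration.

```python
from typing import NamedTuple


class Selection(NamedTuple):
    """A simplified selection predicate: path <op> constant over a source."""
    source: str    # e.g. "quotes.xml"
    path: str      # e.g. "Quotes.Quote.Symbol"
    op: str        # e.g. "="
    constant: str  # e.g. "INTC"


def signature(sel: Selection) -> tuple:
    """Drop the constant so structurally identical predicates compare equal."""
    return (sel.source, sel.path, sel.op)


intc = Selection("quotes.xml", "Quotes.Quote.Symbol", "=", "INTC")
msft = Selection("quotes.xml", "Quotes.Quote.Symbol", "=", "MSFT")

# The two queries differ only in the constant, so their signatures match.
same_group = signature(intc) == signature(msft)
```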
Query Plan
Query 1: File Scan (quotes.xml) -> Select Symbol="INTC" -> Trigger Action I
Query 2: File Scan (quotes.xml) -> Select Symbol="MSFT" -> Trigger Action J
Groups
Groups are created for queries based on their expression signatures. A group consists of three parts:
Group Signature
Group Constant Table
Group Query Plan
Group Signature
Quotes.Quote.Symbol = constant, in quotes.xml
Group Constant Table
Constant_value    Destination_buffer
…                 …
INTC              Dest. I
MSFT              Dest. J
…                 …
Stored on disk
Group Query Plan
File Scan (quotes.xml) + Constant Table -> Join (Symbol = Constant_value) -> Split -> Action I, Action J, ...
Stored in memory-resident hash table
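The group plan above can be sketched as a join of the scanned tuples with the constant table, followed by a split into destination buffers. This is a simplified in-memory illustration: the function name `run_group_plan`, the dictionary layout, and the sample prices are assumptions, not the system's implementation.

```python
from collections import defaultdict

# Hypothetical constant table: constant value -> destination buffers.
constant_table = {"INTC": ["Dest. I"], "MSFT": ["Dest. J"]}


def run_group_plan(tuples, constant_table, key="Symbol"):
    """Join scanned tuples with the constant table, then split by destination."""
    buffers = defaultdict(list)
    for row in tuples:  # output of the file scan
        # Join: Symbol = Constant_value
        for dest in constant_table.get(row[key], []):
            # Split: route the tuple to the buffer of the matching query
            buffers[dest].append(row)
    return dict(buffers)


# Made-up sample data for illustration.
quotes = [
    {"Symbol": "INTC", "Price": 30.0},
    {"Symbol": "AOL", "Price": 55.0},
    {"Symbol": "MSFT", "Price": 70.0},
]
result = run_group_plan(quotes, constant_table)
```

The AOL tuple matches no entry in the constant table, so it reaches no buffer; the scan and join run once for the whole group instead of once per query.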
Incremental Grouping Algorithm
1. Group optimizer traverses the query plan bottom up.
2. Matches the query's expression signature with the signatures of existing groups.
3. Group optimizer breaks the query plan into two parts. Lower: removed. Upper: added to the group plan.
4. Adds the constant and action to the constant table.
Example
New query: File Scan (quotes.xml) -> Select Symbol="AOL" -> Trigger Action
Groups formed incrementally may not be optimal.
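The steps above can be sketched for the single-selection case. This simplification skips the bottom-up traversal of multi-operator plans; the `Group` class, `add_query` function, and the label "Action K" are hypothetical.

```python
class Group:
    """Hypothetical group: a signature plus a constant table."""

    def __init__(self, signature):
        self.signature = signature
        self.constant_table = {}  # constant -> list of actions


def add_query(groups, source, path, op, constant, action):
    """Add one selection query to a matching group, creating it if needed."""
    sig = (source, path, op)              # expression signature
    group = groups.get(sig)               # match against existing groups
    if group is None:
        group = groups[sig] = Group(sig)  # no match: start a new group
    # The lower part of the plan (scan + select) is already computed by the
    # group, so only the constant and the action are recorded for this query.
    group.constant_table.setdefault(constant, []).append(action)
    return group


groups = {}
add_query(groups, "quotes.xml", "Quotes.Quote.Symbol", "=", "INTC", "Action I")
add_query(groups, "quotes.xml", "Quotes.Quote.Symbol", "=", "MSFT", "Action J")
add_query(groups, "quotes.xml", "Quotes.Quote.Symbol", "=", "AOL", "Action K")
```

All three queries land in one group, and adding the AOL query touches only the constant table, not the existing members.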
Using the constant table, the split function moves all values for MS to buffer A and SUN to buffer B
What are these buffers? How do they work?
Pipeline Approach
Tuples are pipelined directly from the output of one operator into the input of the next operator. All parts of the group are combined (including trigger actions), creating a single execution plan.
Disadvantages
Doesn't work for grouping timer-based queries.
Split operator may become a bottleneck.
Not all trigger actions may need to be executed.
Intermediate Files
Figure 3.8
Advantages
Each query is scheduled independently.
Intermediate files and original data sources are monitored in the same way.
The potential bottleneck problem of the pipelined approach is avoided.
Disadvantages
Extra disk I/Os.
Split operator becomes a blocking operator.
Range Queries
What if we want to return every stock with a price increase of more than 5%?
A range query may have an upper bound and a lower bound, so the constant table is modified to include these two columns.
Query 1:
Where <Quotes> <Quote>
<Change_ratio>$c</>
</> </> element_as $g
in "quotes.xml", $c > 0.05
construct $g
Query 2:
Where <Quotes> <Quote>
<Change_ratio>$c</>
</> </> element_as $g
in "quotes.xml", $c > 0.15
construct $g
The results of these two queries overlap, so their intermediate files would store the same tuples twice.
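A small sketch of a constant table extended with lower and upper bound columns, assuming exclusive bounds and using infinity for a missing bound. The table layout and the `destinations` function are illustrative assumptions.

```python
import math

# Hypothetical range constant table: (lower_bound, upper_bound, destination).
# Bounds are exclusive; a missing bound is represented by +/- infinity.
range_table = [
    (0.05, math.inf, "Dest. I"),  # $c > 0.05
    (0.15, math.inf, "Dest. J"),  # $c > 0.15
]


def destinations(change_ratio, table):
    """Return every destination whose range contains change_ratio."""
    return [dest for low, high, dest in table if low < change_ratio < high]
```

A tuple with a change ratio of 0.20 satisfies both predicates, which is exactly the overlap noted above: it would be written to both intermediate files.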
Virtual Intermediate Files
All outputs from the split operator are stored in one real intermediate file.
This file has a clustered index on the range.
Virtual intermediate files store a value range.
The value range is used to retrieve data from the real intermediate file.
Modification of virtual intermediate files can trigger upper-level queries.
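One way to picture this, as a sketch only: a sorted in-memory list stands in for the real intermediate file with its clustered index, and each virtual file holds nothing but a range. The class names and the sample data are hypothetical.

```python
import bisect


class RealIntermediateFile:
    """All split output in one list kept sorted by key
    (a stand-in for a file with a clustered index)."""

    def __init__(self):
        self._keys = []
        self._rows = []

    def insert(self, key, row):
        i = bisect.bisect_right(self._keys, key)
        self._keys.insert(i, key)
        self._rows.insert(i, row)

    def scan(self, low, high):
        """Rows with low < key < high (exclusive bounds)."""
        start = bisect.bisect_right(self._keys, low)
        end = bisect.bisect_left(self._keys, high)
        return self._rows[start:end]


class VirtualIntermediateFile:
    """Stores only a value range; data is read from the real file on demand."""

    def __init__(self, real_file, low, high=float("inf")):
        self.real_file, self.low, self.high = real_file, low, high

    def read(self):
        return self.real_file.scan(self.low, self.high)


# Made-up sample data: (symbol, change ratio).
real = RealIntermediateFile()
for symbol, ratio in [("A", 0.20), ("B", 0.07), ("C", 0.02)]:
    real.insert(ratio, symbol)

over_5 = VirtualIntermediateFile(real, 0.05)    # $c > 0.05
over_15 = VirtualIntermediateFile(real, 0.15)   # $c > 0.15
```

Each tuple is stored once in the real file, and overlapping queries read overlapping slices of it instead of keeping duplicate copies.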
Grouping of Join Operators
Since joins can be very expensive, joins with the same expression are grouped.
Which order: join first, or selection first?
This paper says selection; future work says join.
Event Detection
Types of Events
Data-source change
  Push-based (inform NiagaraCQ of changes)
  Pull-based (checked periodically by NiagaraCQ)
Timer
  Set to a specific time interval
  Grouped with other timer-based queries
  Only fired if data has changed
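The last point, that a timer-based query fires only if its data has changed, can be sketched as a check combining both conditions. The function name, the dictionary fields, and the use of plain numbers as timestamps are assumptions for illustration.

```python
def firing_queries(now, timer_queries, last_modified):
    """Return names of timer-based queries that should fire at time `now`.

    timer_queries: name -> dict with 'source', 'interval', 'last_fired'
    last_modified: source -> time of the most recent change
    """
    firing = []
    for name, q in timer_queries.items():
        # The timer condition: the query's interval has elapsed.
        timer_due = now - q["last_fired"] >= q["interval"]
        # The data condition: the source changed since the last firing.
        changed_at = last_modified.get(q["source"], float("-inf"))
        data_changed = changed_at > q["last_fired"]
        if timer_due and data_changed:
            firing.append(name)
    return firing
```

A query whose timer has expired but whose source is unchanged is skipped, which avoids an unnecessary invocation.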
Incremental Evaluation
Queries are invoked only on changed data.
For each file, NiagaraCQ keeps a "delta file".
Queries are run over delta files when possible.
Incremental evaluation of join operators requires complete data files.
A timestamp is added to each tuple in the delta file in order to support timer-based queries.
Tuples remain in the delta file for the longest time interval among the queries in the group.
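A minimal sketch of a timestamped delta file, assuming numeric timestamps and a retention period equal to the longest timer interval in the group. The `DeltaFile` class and its method names are hypothetical.

```python
class DeltaFile:
    """Hypothetical delta file: timestamped changes for one data source."""

    def __init__(self, intervals):
        # intervals: timer intervals of the queries in the group
        self.retention = max(intervals)
        self.entries = []  # (timestamp, tuple), appended in time order

    def append(self, timestamp, row):
        self.entries.append((timestamp, row))

    def changes_since(self, last_fire_time):
        """Tuples added after a query's last firing."""
        return [row for ts, row in self.entries if ts > last_fire_time]

    def purge(self, now):
        """Drop tuples older than the longest interval in the group."""
        self.entries = [(ts, row) for ts, row in self.entries
                        if now - ts <= self.retention]
```

Because tuples carry timestamps, queries with different intervals can share one delta file: each reads only the tuples newer than its own last firing.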
System ArchitectureSystem Architecture
Figure 4.1
Continuous Query Processing
Figure 4.2
Components: Continuous Query Manager (CQM), Event Detector (ED), Data Manager (DM), Query Engine (QE)
1. CQM adds continuous queries with file and timer information to enable ED to monitor the events.
2. ED asks DM to monitor changes to files.
3. When a timer event happens, ED asks DM for the last modified time of files.
4. DM informs ED of changes to push-based data sources.
5. If file changes and timer events are satisfied, ED provides CQM with a list of firing CQs.
6. CQM invokes QE to execute firing CQs.
7. File scan operator calls DM to retrieve selected documents.
8. DM only returns changes between last fire time and current fire time.
NiagaraCQNiagara
Experimental Results
Simple Selection
Range Selection
Selection & Join
Equal & Range
Mixed Queries
References
NiagaraCQ: A Scalable Continuous Query System for Internet Databases
http://www.cs.wisc.edu/niagara/papers/NiagaraCQ.pdf
Follow Up Papers
Design and Evaluation of Alternative Selection Placement Strategies in Optimizing Continuous Queries
http://www.cs.wisc.edu/niagara/papers/Icde02.pdf
Dynamic Re-grouping of Continuous Queries
http://www.cs.wisc.edu/niagara/papers/507.pdf
Discussion
What kinds of applications other than stock quotes would this be appropriate for? What would it not work for?
NiagaraCQ is somewhat similar to RSS. What types of applications are better with RSS and which are better with NiagaraCQ?
Are expression signatures too simple? Do they group together enough of the kinds of queries that this system is meant to handle? Do you think they would work better or worse for SQL queries instead of XML?