monitoring streams -- a new class of data management applications based on paper and talk by authors...
DESCRIPTION
Example Stream Applications Market Analysis –Streams of Stock Exchange Data Critical Care –Streams of Vital Sign Measurements Physical Plant Monitoring –Streams of Environmental Readings Biological Population Tracking –Streams of Positions from Individuals of a SpeciesTRANSCRIPT
![Page 1: Monitoring Streams -- A New Class of Data Management Applications based on paper and talk by authors below, slightly adapted for CS561: Don Carney Brown](https://reader035.vdocuments.mx/reader035/viewer/2022062401/5a4d1b6e7f8b9ab0599b4667/html5/thumbnails/1.jpg)
Monitoring Streams -- A New Class of Data Management Applications
based on paper and talk by authors below, slightly adapted for CS561:
Don Carney Brown UniversityUğur Çetintemel Brown UniversityMitch Cherniack Brandeis UniversityChristian Convey Brown UniversitySangdon Lee Brown UniversityGreg Seidman Brown UniversityMichael Stonebraker MITNesime Tatbul Brown UniversityStan Zdonik Brown University
![Page 2: Monitoring Streams -- A New Class of Data Management Applications based on paper and talk by authors below, slightly adapted for CS561: Don Carney Brown](https://reader035.vdocuments.mx/reader035/viewer/2022062401/5a4d1b6e7f8b9ab0599b4667/html5/thumbnails/2.jpg)
Background• Authors: MIT/Brown/Brandeis• First Aurora (idea), • Then Borealis (distributed,etc)• Then startup (StreamBase)
– Practical system– Designed for Scalablility: 106 stream inputs, queries– QoS-Driven Resource Management – Stream Storage Management – Reliability/ Fault Tolerance– Distribution and Adaptivity
![Page 3: Monitoring Streams -- A New Class of Data Management Applications based on paper and talk by authors below, slightly adapted for CS561: Don Carney Brown](https://reader035.vdocuments.mx/reader035/viewer/2022062401/5a4d1b6e7f8b9ab0599b4667/html5/thumbnails/3.jpg)
Example Stream Applications• Market Analysis
– Streams of Stock Exchange Data
• Critical Care– Streams of Vital Sign Measurements
• Physical Plant Monitoring– Streams of Environmental Readings
• Biological Population Tracking– Streams of Positions from Individuals of a Species
![Page 4: Monitoring Streams -- A New Class of Data Management Applications based on paper and talk by authors below, slightly adapted for CS561: Don Carney Brown](https://reader035.vdocuments.mx/reader035/viewer/2022062401/5a4d1b6e7f8b9ab0599b4667/html5/thumbnails/4.jpg)
Not Your Average DBMS
1. External, Autonomous Data Sources
2. Querying Time-Series
3. Triggers-in-the-large
4. Real-time response requirements
5. Noisy Data, Approximate Query Results
![Page 5: Monitoring Streams -- A New Class of Data Management Applications based on paper and talk by authors below, slightly adapted for CS561: Don Carney Brown](https://reader035.vdocuments.mx/reader035/viewer/2022062401/5a4d1b6e7f8b9ab0599b4667/html5/thumbnails/5.jpg)
Outline
2. Aurora Overview/ Query Model
3. Runtime Operation
4. Adaptivity
![Page 6: Monitoring Streams -- A New Class of Data Management Applications based on paper and talk by authors below, slightly adapted for CS561: Don Carney Brown](https://reader035.vdocuments.mx/reader035/viewer/2022062401/5a4d1b6e7f8b9ab0599b4667/html5/thumbnails/6.jpg)
Aurora from 100,000 Feet
Query App QoS...
...
Query App QoS...
Query App QoS
...
...
...
...
Each Provides:
• A over input data streams
• A Quality-Of-Service Specification ( )(specifies utility of partial or late results)
Application
Query
QoS
![Page 7: Monitoring Streams -- A New Class of Data Management Applications based on paper and talk by authors below, slightly adapted for CS561: Don Carney Brown](https://reader035.vdocuments.mx/reader035/viewer/2022062401/5a4d1b6e7f8b9ab0599b4667/html5/thumbnails/7.jpg)
Aurora from 100 Feet
App QoS...
...
App QoS...
App QoS
...
...
Queries = Workflow (Boxes and Arcs)• Workflow Diagram = “Aurora Network”
• Boxes = Query Operators
• Arcs = Streams
Slide
Tumble
Streams (Arcs)• stream: tuple sequence from common source
(e.g., sensor)
• tuples timestamped on arrival (Internal use: QoS)
Query Operators (Boxes)• Simple: FILTER, MAP, RESTREAM• Binary: UNION, JOIN, RESAMPLE• Windowed: TUMBLE, SLIDE, XSECTION, WSORT
![Page 8: Monitoring Streams -- A New Class of Data Management Applications based on paper and talk by authors below, slightly adapted for CS561: Don Carney Brown](https://reader035.vdocuments.mx/reader035/viewer/2022062401/5a4d1b6e7f8b9ab0599b4667/html5/thumbnails/8.jpg)
Aurora in Action
App QoS...
...
App QoS...
App QoS
...
...
Slide
Tumble
App
TumbleTumble App
“Box-at-a-time” Scheduling Arcs Tuple Queues
Outputs Monitored for QoS
![Page 9: Monitoring Streams -- A New Class of Data Management Applications based on paper and talk by authors below, slightly adapted for CS561: Don Carney Brown](https://reader035.vdocuments.mx/reader035/viewer/2022062401/5a4d1b6e7f8b9ab0599b4667/html5/thumbnails/9.jpg)
…
Continuous and Historical Queries
ad-hoc query
O4
O5
QoS
App…
O1 O3O2
continuous query
QoS
App… …Queues
O7O8 O9
view3 Days
QoS… …
ConnectionPoint
1 Hour
![Page 10: Monitoring Streams -- A New Class of Data Management Applications based on paper and talk by authors below, slightly adapted for CS561: Don Carney Brown](https://reader035.vdocuments.mx/reader035/viewer/2022062401/5a4d1b6e7f8b9ab0599b4667/html5/thumbnails/10.jpg)
Quality-of-Service (QoS)
Output Value
Specifies “Utility” Of Imperfect Query ResultsDelay-Based (specify utility of late results)Delivery-Based, Value-Based (specify utility of partial results)
QoS Influences… Scheduling, Storage Management, Load Shedding
% Tuples Delivered
B
Delay
A C
![Page 11: Monitoring Streams -- A New Class of Data Management Applications based on paper and talk by authors below, slightly adapted for CS561: Don Carney Brown](https://reader035.vdocuments.mx/reader035/viewer/2022062401/5a4d1b6e7f8b9ab0599b4667/html5/thumbnails/11.jpg)
Talk Outline
1. Introduction
2. Aurora Overview
3. Runtime Operation
4. Adaptivity
5. Related Work and Conclusions
![Page 12: Monitoring Streams -- A New Class of Data Management Applications based on paper and talk by authors below, slightly adapted for CS561: Don Carney Brown](https://reader035.vdocuments.mx/reader035/viewer/2022062401/5a4d1b6e7f8b9ab0599b4667/html5/thumbnails/12.jpg)
Runtime OperationBasic Architecture
Scheduler
QOSMonitor
Box Processors
...
Buffer
Storage Manager
Persistent Store
…q1…q2
…qi
…q1
…qn
...
…q2
...
.
.
.
Catalog
Router
inputs outputs
![Page 13: Monitoring Streams -- A New Class of Data Management Applications based on paper and talk by authors below, slightly adapted for CS561: Don Carney Brown](https://reader035.vdocuments.mx/reader035/viewer/2022062401/5a4d1b6e7f8b9ab0599b4667/html5/thumbnails/13.jpg)
Runtime OperationScheduling: Maximize Overall QoS
Choice 1: A: Cost: 1 sec(…, age: 1 sec)
B: Cost: 2 sec(…, age: 3 sec)
Delay = 2 secUtility = 0.5
Delay = 5 secUtility = 0.8
Schedule Box A now rather than later
Ideal: Maximize Overall Utility (feedback driven)
Choice 2:
![Page 14: Monitoring Streams -- A New Class of Data Management Applications based on paper and talk by authors below, slightly adapted for CS561: Don Carney Brown](https://reader035.vdocuments.mx/reader035/viewer/2022062401/5a4d1b6e7f8b9ab0599b4667/html5/thumbnails/14.jpg)
Runtime OperationScheduling: Minimizing Per Tuple Processing Overhead
Train Scheduling:
A B… xyz A (x)A (y)A (z) B (A (x))B (A (y))B (A (z))
Default Operation: = Context Switch
AB… xyz B (A (x))B (A (y))B (A (z))Box Trains:
A B… xyz A (z, y, x) B (A (z), A (y), A (x))Tuple Trains:
![Page 15: Monitoring Streams -- A New Class of Data Management Applications based on paper and talk by authors below, slightly adapted for CS561: Don Carney Brown](https://reader035.vdocuments.mx/reader035/viewer/2022062401/5a4d1b6e7f8b9ab0599b4667/html5/thumbnails/15.jpg)
1. Run-time Queue Management
Prefetch Queues Prior to Being Scheduled
Drop Tuples from Queues to Improve QoS
2. Connection Point Management
Support Efficient (Pull-Based) Access to Historical Data E.g., indexing, sorting, clustering, …
Runtime OperationStorage Management
![Page 16: Monitoring Streams -- A New Class of Data Management Applications based on paper and talk by authors below, slightly adapted for CS561: Don Carney Brown](https://reader035.vdocuments.mx/reader035/viewer/2022062401/5a4d1b6e7f8b9ab0599b4667/html5/thumbnails/16.jpg)
Talk Outline
1. Introduction
2. Aurora Overview
3. Runtime Operation
4. Query Optimization and Adaptivity
5. Conclusions
![Page 17: Monitoring Streams -- A New Class of Data Management Applications based on paper and talk by authors below, slightly adapted for CS561: Don Carney Brown](https://reader035.vdocuments.mx/reader035/viewer/2022062401/5a4d1b6e7f8b9ab0599b4667/html5/thumbnails/17.jpg)
Stream Query Optimization
• Differences with Traditional Query Optimization?
![Page 18: Monitoring Streams -- A New Class of Data Management Applications based on paper and talk by authors below, slightly adapted for CS561: Don Carney Brown](https://reader035.vdocuments.mx/reader035/viewer/2022062401/5a4d1b6e7f8b9ab0599b4667/html5/thumbnails/18.jpg)
Stream Query Optimization• New classes of operators (windows) may mean new
rewrites• New execution modes (continuous/pipelining)• More dynamic fluctuations in statistics compile
time optimization not possible• Global optimization not practical; as huge query
networks Adaptive optimization.• Other cost models taking memory into account, not
throughput but output rate, etc.• Query optimization and load shedding
![Page 19: Monitoring Streams -- A New Class of Data Management Applications based on paper and talk by authors below, slightly adapted for CS561: Don Carney Brown](https://reader035.vdocuments.mx/reader035/viewer/2022062401/5a4d1b6e7f8b9ab0599b4667/html5/thumbnails/19.jpg)
Query Optimization
Compile-time, Global Optimization Infeasible Too Many Boxes Too Much Volatility in Network, Data
Dynamic, Local OptimizationThreshold re when to optimize
![Page 20: Monitoring Streams -- A New Class of Data Management Applications based on paper and talk by authors below, slightly adapted for CS561: Don Carney Brown](https://reader035.vdocuments.mx/reader035/viewer/2022062401/5a4d1b6e7f8b9ab0599b4667/html5/thumbnails/20.jpg)
Motivation of ‘Query Migration’
• Continuous query over streams– Statistics unknown before start– Statistics changing during execution
• Stream rates, arrival pattern, distribution, etc
• Need for dynamic adaptation– Plan re-optimization
• Change the shape of query plan tree
![Page 21: Monitoring Streams -- A New Class of Data Management Applications based on paper and talk by authors below, slightly adapted for CS561: Don Carney Brown](https://reader035.vdocuments.mx/reader035/viewer/2022062401/5a4d1b6e7f8b9ab0599b4667/html5/thumbnails/21.jpg)
Run-time Plan Re-Optimization
• Step 1 - Decide when to optimize– Statistics Monitoring
• Step 2 – Generate new query plan– Query Optimization
• Step 3 – Replace current plan by new plan– Plan Migration
![Page 22: Monitoring Streams -- A New Class of Data Management Applications based on paper and talk by authors below, slightly adapted for CS561: Don Carney Brown](https://reader035.vdocuments.mx/reader035/viewer/2022062401/5a4d1b6e7f8b9ab0599b4667/html5/thumbnails/22.jpg)
Adaptivity in Query Optimization
Dynamic Optimization : Migration
3. Drain Subnetwork4. Optimize Subnetwork5. Turn on Taps
1. Identify Subnetwork2. Buffer Inputs
![Page 23: Monitoring Streams -- A New Class of Data Management Applications based on paper and talk by authors below, slightly adapted for CS561: Don Carney Brown](https://reader035.vdocuments.mx/reader035/viewer/2022062401/5a4d1b6e7f8b9ab0599b4667/html5/thumbnails/23.jpg)
Stateful Operator in CQ
• Why stateful– Need non-blocking operators in CQ– Operator needs to output partial results– State data structure keep received tuples
AB
A B
b1b2b3b4b5
ax
State A State B
ax
ax b2ax b3
Key Observation: The purge of tuples in states relies on processing of new tuples.
Example: Symmetric NL join w/ window constraints
![Page 24: Monitoring Streams -- A New Class of Data Management Applications based on paper and talk by authors below, slightly adapted for CS561: Don Carney Brown](https://reader035.vdocuments.mx/reader035/viewer/2022062401/5a4d1b6e7f8b9ab0599b4667/html5/thumbnails/24.jpg)
Naïve Migration Strategy Revisited
• Steps(1) Pause execution of old plan(2) Drain out all tuples inside old plan(3) Replace old plan by new plan(4) Resume execution of new plan
AB
BC
A B C(2)
All tuples drained
(4)Processing
Resumed
(3) Old Replaced
By new
Deadlock Waiting Problem:
![Page 25: Monitoring Streams -- A New Class of Data Management Applications based on paper and talk by authors below, slightly adapted for CS561: Don Carney Brown](https://reader035.vdocuments.mx/reader035/viewer/2022062401/5a4d1b6e7f8b9ab0599b4667/html5/thumbnails/25.jpg)
AdaptivityQuery Optimization
State Movement ProtocolParallel Track Protocol
![Page 26: Monitoring Streams -- A New Class of Data Management Applications based on paper and talk by authors below, slightly adapted for CS561: Don Carney Brown](https://reader035.vdocuments.mx/reader035/viewer/2022062401/5a4d1b6e7f8b9ab0599b4667/html5/thumbnails/26.jpg)
Moving State Strategy
• Basic idea– Share common states
between two migration boxes
• Key steps– State Matching
• Match states based on IDs.– State Moving
• Create new pointers for matched states in new box
– What’s left?• Unmatched states in new
box
CDSABC SD
BCSAB SC
ABSA SB
ABSA SBCD
CDSBC
SD
BCSB SC
QA QB QC QD QA QB QC QD
QABCD QABCD
Old Box New Box
![Page 27: Monitoring Streams -- A New Class of Data Management Applications based on paper and talk by authors below, slightly adapted for CS561: Don Carney Brown](https://reader035.vdocuments.mx/reader035/viewer/2022062401/5a4d1b6e7f8b9ab0599b4667/html5/thumbnails/27.jpg)
Parallel Track Strategy
• Basic idea– Execute both plans
in parallel and gradually “push” old tuples out of old box by purging
• Key steps– Connect boxes– Execute in parallel
• Until old box “expired” (no old tuple or sub-tuple)
– Disconnect old box– Start execute new
box only
CD
SABC SD
BC
SAB SC
AB
SA SB
ABSA
SBCD
CD
SBC SD
BCSB SC
QA QB QC QD
QA QB QC QD
QABCD QABCD
![Page 28: Monitoring Streams -- A New Class of Data Management Applications based on paper and talk by authors below, slightly adapted for CS561: Don Carney Brown](https://reader035.vdocuments.mx/reader035/viewer/2022062401/5a4d1b6e7f8b9ab0599b4667/html5/thumbnails/28.jpg)
1. Two Load Shedding Techniques:• Random Tuple Drops
Add DROP box to network (DROP a special case of FILTER)Position to affect queries w/ tolerant delivery-based QoS reqts
• Semantic Load SheddingFILTER values with low utility (acc to value-based QoS)
2. Triggered by QoS Monitore.g., after Latency Analysis reveals certain applications
are continuously receiving poor QoS
AdaptivityLoad Shedding
![Page 29: Monitoring Streams -- A New Class of Data Management Applications based on paper and talk by authors below, slightly adapted for CS561: Don Carney Brown](https://reader035.vdocuments.mx/reader035/viewer/2022062401/5a4d1b6e7f8b9ab0599b4667/html5/thumbnails/29.jpg)
AdaptivityDetecting Overload
Throughput Analysis
Cost = cSelectivity = s
Input rate = r Output rate = min (1/c, r) * s
1/c > r Problem
C,SI O
P
C,SI O
PC,S
I O
PC,S
I O
P
C,SI O
PC,S
I O
P
C,SI O
PC,S
I O
PC,S
I O
P
Monitor each application’s Delay-based QoS
Problem: Too many apps in “bad zone”
Latency Analysis
![Page 30: Monitoring Streams -- A New Class of Data Management Applications based on paper and talk by authors below, slightly adapted for CS561: Don Carney Brown](https://reader035.vdocuments.mx/reader035/viewer/2022062401/5a4d1b6e7f8b9ab0599b4667/html5/thumbnails/30.jpg)
ImplementationGUI
![Page 31: Monitoring Streams -- A New Class of Data Management Applications based on paper and talk by authors below, slightly adapted for CS561: Don Carney Brown](https://reader035.vdocuments.mx/reader035/viewer/2022062401/5a4d1b6e7f8b9ab0599b4667/html5/thumbnails/31.jpg)
ImplementationRuntime
0 1 2 3 4 56
![Page 32: Monitoring Streams -- A New Class of Data Management Applications based on paper and talk by authors below, slightly adapted for CS561: Don Carney Brown](https://reader035.vdocuments.mx/reader035/viewer/2022062401/5a4d1b6e7f8b9ab0599b4667/html5/thumbnails/32.jpg)
ConclusionsAurora Stream Query Processing System
1. Designed for Scalability
2. QoS-Driven Resource Management
3. Continuous and Historical Queries
4. Stream Storage Management
5. Implemented Prototype
See Stream Web site at Brown Univ.