Post on 22-Dec-2015


Monitoring Streams -- A New Class of Data Management Applications

Don Carney Brown University

Uğur Çetintemel Brown University

Mitch Cherniack Brandeis University

Christian Convey Brown University

Sangdon Lee Brown University

Greg Seidman Brown University

Michael Stonebraker MIT

Nesime Tatbul Brown University

Stan Zdonik Brown University

Background

• MIT/Brown/Brandeis team

• First Aurora, then Borealis
  – Practical system
  – Designed for Scalability: 10^6 stream inputs, queries
  – QoS-Driven Resource Management
  – Stream Storage Management
  – Reliability/Fault Tolerance
  – Distribution and Adaptivity

• First stream startup: StreamBase
  – Financial applications

Example Stream Applications

• Market Analysis
  – Streams of Stock Exchange Data

• Critical Care
  – Streams of Vital Sign Measurements

• Physical Plant Monitoring
  – Streams of Environmental Readings

• Biological Population Tracking
  – Streams of Positions from Individuals of a Species

Not Your Average DBMS

1. External, Autonomous Data Sources

2. Querying Time-Series

3. Triggers-in-the-large

4. Real-time response requirements

5. Noisy Data, Approximate Query Results

Outline

1. Introduction

2. Aurora Overview/Query Model

3. Runtime Operation

4. Adaptivity

Aurora from 100,000 Feet

[Figure: several applications (App), each attaching a Query and a QoS specification to Aurora]

Each Application Provides:

• A Query over input data streams

• A Quality-of-Service Specification (QoS) (specifies utility of partial or late results)

Aurora from 100 Feet

[Figure: applications (App) and their QoS specifications attached to the outputs of an Aurora network]

Queries = Workflow (Boxes and Arcs)

• Workflow Diagram = “Aurora Network”

• Boxes = Query Operators

• Arcs = Streams

[Figure: example network containing SLIDE and TUMBLE boxes]

Streams (Arcs)

• stream: tuple sequence from common source (e.g., sensor)

• tuples timestamped on arrival (Internal use: QoS)

Query Operators (Boxes)

• Simple: FILTER, MAP, RESTREAM

• Binary: UNION, JOIN, RESAMPLE

• Windowed: TUMBLE, SLIDE, XSECTION, WSORT
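To make the operator list concrete, here is a minimal Python sketch of four of the boxes as generator functions over a stream of values. It is a simplification under stated assumptions: TUMBLE is modeled as non-overlapping windows of n tuples and SLIDE as windows of n tuples advancing one tuple at a time. The function names and the count-based window size n are hypothetical; Aurora's own operators may be parameterized differently.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def filter_box(pred: Callable[[T], bool], stream: Iterable[T]) -> Iterator[T]:
    """FILTER: pass through only the tuples that satisfy the predicate."""
    return (t for t in stream if pred(t))


def map_box(fn: Callable[[T], U], stream: Iterable[T]) -> Iterator[U]:
    """MAP: apply a function to every tuple."""
    return (fn(t) for t in stream)


def tumble_box(agg: Callable[[List[T]], U], n: int, stream: Iterable[T]) -> Iterator[U]:
    """Simplified TUMBLE (assumption): aggregate non-overlapping windows of n tuples.
    A trailing partial window is not emitted."""
    window: List[T] = []
    for t in stream:
        window.append(t)
        if len(window) == n:
            yield agg(window)
            window = []


def slide_box(agg: Callable[[List[T]], U], n: int, stream: Iterable[T]) -> Iterator[U]:
    """Simplified SLIDE (assumption): aggregate overlapping windows of n tuples,
    advancing by one tuple at a time."""
    window: Deque[T] = deque(maxlen=n)  # oldest tuple falls out automatically
    for t in stream:
        window.append(t)
        if len(window) == n:
            yield agg(list(window))


readings = [3, 8, 5, 9, 1, 7]
print(list(filter_box(lambda v: v > 4, readings)))  # [8, 5, 9, 7]
print(list(tumble_box(max, 2, readings)))           # [8, 9, 7]
print(list(slide_box(max, 2, readings)))            # [8, 8, 9, 9, 7]
```

Generators keep each box lazy, so boxes can be chained the way arcs connect boxes in a network.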

Aurora in Action

[Figure: applications with QoS specifications fed by a network containing SLIDE and TUMBLE boxes]

• “Box-at-a-time” Scheduling

• Arcs = Tuple Queues

• Outputs Monitored for QoS
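A minimal sketch of box-at-a-time scheduling with arcs implemented as tuple queues. Assumptions: each box has one input arc and one output arc, tuples are plain integers, and the scheduler simply picks the first box whose input queue is non-empty. In Aurora that choice is driven by QoS, which this sketch does not model. `Box` and `run_box_at_a_time` are hypothetical names.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


class Box:
    """A query operator that reads tuples from one arc (queue) and writes to another."""

    def __init__(self, name: str, src: str, dst: str, fn: Callable[[int], List[int]]) -> None:
        self.name = name
        self.src = src
        self.dst = dst
        self.fn = fn  # returns zero or more output tuples per input tuple


def run_box_at_a_time(boxes: List[Box], arcs: Dict[str, Deque[int]]) -> List[str]:
    """Repeatedly pick one runnable box and drain its whole input queue.
    Picking the first runnable box is a placeholder for a QoS-driven choice."""
    trace: List[str] = []
    while True:
        runnable = [b for b in boxes if arcs[b.src]]
        if not runnable:
            return trace
        box = runnable[0]
        trace.append(box.name)
        queue = arcs[box.src]
        while queue:
            arcs[box.dst].extend(box.fn(queue.popleft()))


arcs: Dict[str, Deque[int]] = {"in": deque([1, 2, 3, 4]), "mid": deque(), "out": deque()}
boxes = [
    Box("keep_even", "in", "mid", lambda x: [x] if x % 2 == 0 else []),
    Box("double", "mid", "out", lambda x: [2 * x]),
]
print(run_box_at_a_time(boxes, arcs))  # ['keep_even', 'double']
print(list(arcs["out"]))               # [4, 8]
```

The "out" queue is where a QoS monitor would observe delivered tuples and their delays.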

Continuous and Historical Queries

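[Figure: a continuous query (boxes O1, O2, O3) feeding an App with its QoS; a Connection Point from which an ad-hoc query (O4, O5) and a view (O7, O8, O9) branch off, each with its own QoS; arcs are queues; retention labels "1 Hour" and "3 Days"]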

Quality-of-Service (QoS)

• Specifies “Utility” of Imperfect Query Results
  – Delay-Based (specify utility of late results)
  – Delivery-Based, Value-Based (specify utility of partial results)

• QoS Influences: Scheduling, Storage Management, Load Shedding

[Figure: QoS graphs plotting utility against Delay, % Tuples Delivered, and Output Value]
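Each of the three QoS types can be viewed as a function from a measured quantity (delay, percentage of tuples delivered, or output value) to a utility. The sketch below assumes a piecewise-linear encoding over (x, utility) points, clamped outside the given range; the two example graphs are hypothetical, not values from the talk.

```python
from bisect import bisect_right
from typing import List, Tuple

Graph = List[Tuple[float, float]]  # (x, utility) points, sorted by x


def qos_utility(graph: Graph, x: float) -> float:
    """Evaluate a piecewise-linear QoS graph at x, clamping outside the given points."""
    xs = [p[0] for p in graph]
    if x <= xs[0]:
        return graph[0][1]
    if x >= xs[-1]:
        return graph[-1][1]
    i = bisect_right(xs, x)  # now graph[i - 1].x <= x < graph[i].x
    (x0, u0), (x1, u1) = graph[i - 1], graph[i]
    return u0 + (u1 - u0) * (x - x0) / (x1 - x0)


# Hypothetical delay-based graph: full utility up to 2 s, falling linearly to 0 at 6 s.
delay_graph: Graph = [(0.0, 1.0), (2.0, 1.0), (6.0, 0.0)]
# Hypothetical delivery-based graph: utility grows with the % of tuples delivered.
delivery_graph: Graph = [(0.0, 0.0), (50.0, 0.5), (100.0, 1.0)]

print(qos_utility(delay_graph, 4.0))      # 0.5
print(qos_utility(delivery_graph, 75.0))  # 0.75
```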

Talk Outline

1. Introduction

2. Aurora Overview

3. Runtime Operation

4. Adaptivity

5. Related Work and Conclusions

Runtime Operation: Basic Architecture

[Figure: components Router (inputs, outputs), Scheduler, QoS Monitor, Box Processors, Buffer holding tuple queues q1 … qn, Storage Manager, Persistent Store, and Catalog]

Runtime Operation: Scheduling: Maximize Overall QoS

Choice 1:

A: Cost: 1 sec (…, age: 1 sec)

B: Cost: 2 sec (…, age: 3 sec)

Delay = 2 sec, Utility = 0.5

Delay = 5 sec, Utility = 0.8

Schedule Box A now rather than later

Ideal: Maximize Overall Utility

Presently exploring scalable heuristics (e.g., feedback-based)

Choice 2:
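A worked sketch of the scheduling choice, assuming a box's output delay is the age of its waiting tuple plus the time until that box finishes running. With the costs and ages listed on the slide (A: cost 1 s, age 1 s; B: cost 2 s, age 3 s), running A first gives A a delay of 2 s and running B first gives B a delay of 5 s, which appear to match the two delays shown. The QoS functions here are hypothetical, and the exhaustive search stands in for the "ideal" of maximizing overall utility, not for the scalable heuristics mentioned.

```python
from itertools import permutations
from typing import Callable, Dict, Optional, Sequence, Tuple

QoS = Callable[[float], float]


def linear_drop(full_until: float, zero_at: float) -> QoS:
    """Hypothetical delay-based QoS: utility 1.0 up to full_until, then linear to 0.0 at zero_at."""
    def qos(delay: float) -> float:
        if delay <= full_until:
            return 1.0
        if delay >= zero_at:
            return 0.0
        return 1.0 - (delay - full_until) / (zero_at - full_until)
    return qos


def output_delays(order: Sequence[str], cost: Dict[str, float],
                  age: Dict[str, float]) -> Dict[str, float]:
    """Delay of each box's output = age of its waiting tuple + time until that box finishes."""
    clock = 0.0
    delays: Dict[str, float] = {}
    for box in order:
        clock += cost[box]
        delays[box] = age[box] + clock
    return delays


def best_order(cost: Dict[str, float], age: Dict[str, float],
               qos: Dict[str, QoS]) -> Tuple[Tuple[str, ...], float]:
    """Try every order and keep the one with the highest total utility (exhaustive, not scalable)."""
    best: Optional[Tuple[Tuple[str, ...], float]] = None
    for order in permutations(cost):
        delays = output_delays(order, cost, age)
        total = sum(qos[b](d) for b, d in delays.items())
        if best is None or total > best[1]:
            best = (order, total)
    assert best is not None
    return best


cost = {"A": 1.0, "B": 2.0}
age = {"A": 1.0, "B": 3.0}
qos = {"A": linear_drop(2.0, 4.0), "B": linear_drop(4.0, 10.0)}  # hypothetical QoS graphs
print(output_delays(("A", "B"), cost, age))  # {'A': 2.0, 'B': 6.0}
print(output_delays(("B", "A"), cost, age))  # {'B': 5.0, 'A': 4.0}
print(best_order(cost, age, qos)[0])         # ('A', 'B')
```

With these assumed graphs, A's utility collapses if it waits, while B tolerates extra delay, so running A first wins.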

Runtime Operation: Scheduling: Minimizing Per-Tuple Processing Overhead

Train Scheduling (boxes A, B, …; queued tuples x, y, z):

• Default Operation (context switch between steps): A(x) A(y) A(z) B(A(x)) B(A(y)) B(A(z))

• Box Trains: B(A(x)) B(A(y)) B(A(z))

• Tuple Trains: A(z, y, x) B(A(z), A(y), A(x))
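A minimal sketch that counts scheduler dispatches for two chained boxes and three queued tuples, treating one dispatch as a stand-in for one context switch. Assumptions: a box train carries each tuple through all boxes in a single dispatch, and a tuple train hands a box its whole queue in a single dispatch. The function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Box = Callable[[int], int]


def default_schedule(boxes: Sequence[Box], tuples: List[int]) -> Tuple[List[int], int]:
    """One dispatch per (box, tuple): box A on x, y, z, then box B on each result."""
    dispatches = 0
    current = list(tuples)
    for box in boxes:
        out = []
        for t in current:
            dispatches += 1
            out.append(box(t))
        current = out
    return current, dispatches


def box_train(boxes: Sequence[Box], tuples: List[int]) -> Tuple[List[int], int]:
    """One dispatch per tuple: push each tuple through every box before returning."""
    dispatches = 0
    out = []
    for t in tuples:
        dispatches += 1
        for box in boxes:
            t = box(t)
        out.append(t)
    return out, dispatches


def tuple_train(boxes: Sequence[Box], tuples: List[int]) -> Tuple[List[int], int]:
    """One dispatch per box: each box processes its whole queue as a batch."""
    dispatches = 0
    current = list(tuples)
    for box in boxes:
        dispatches += 1
        current = [box(t) for t in current]
    return current, dispatches


def box_a(v: int) -> int:
    return v + 1


def box_b(v: int) -> int:
    return v * 10


for strategy in (default_schedule, box_train, tuple_train):
    print(strategy.__name__, strategy([box_a, box_b], [1, 2, 3]))
# default_schedule ([20, 30, 40], 6)
# box_train ([20, 30, 40], 3)
# tuple_train ([20, 30, 40], 2)
```

All three strategies produce the same results; only the number of dispatches, and hence the per-tuple overhead, changes.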

Runtime Operation: Storage Management

1. Run-time Queue Management
  – Prefetch Queues Prior to Being Scheduled
  – Drop Tuples from Queues to Improve QoS

2. Connection Point Management
  – Support Efficient (Pull-Based) Access to Historical Data, e.g., indexing, sorting, clustering, …
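One way to picture connection point management: a connection point keeps a time-bounded history (say, 1 hour) in timestamp order, so a historical query can pull a time range by binary search instead of scanning. This is a sketch under assumptions: `ConnectionPoint`, `push`, and `range_query` are hypothetical names, tuples are assumed to arrive in timestamp order, and a sorted list stands in for whatever index Aurora actually uses.

```python
from bisect import bisect_left, bisect_right
from typing import Any, List


class ConnectionPoint:
    """Hypothetical sketch: retains recent tuples, indexed by arrival timestamp."""

    def __init__(self, retention: float) -> None:
        self.retention = retention    # seconds of history to keep
        self.times: List[float] = []  # arrival timestamps, ascending
        self.tuples: List[Any] = []

    def push(self, ts: float, tup: Any) -> None:
        # Assumes tuples arrive in timestamp order (they are timestamped on arrival).
        self.times.append(ts)
        self.tuples.append(tup)
        cutoff = ts - self.retention
        expired = bisect_left(self.times, cutoff)  # number of entries with time < cutoff
        if expired:
            del self.times[:expired]
            del self.tuples[:expired]

    def range_query(self, start: float, end: float) -> List[Any]:
        """Pull-based access: all retained tuples with start <= ts <= end."""
        lo = bisect_left(self.times, start)
        hi = bisect_right(self.times, end)
        return self.tuples[lo:hi]


cp = ConnectionPoint(retention=3600.0)
for ts, value in [(0.0, "a"), (1800.0, "b"), (3600.0, "c"), (5400.0, "d")]:
    cp.push(ts, value)
print(cp.range_query(0.0, 4000.0))  # ['b', 'c']  ("a" aged out of the 1-hour history)
```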

Talk Outline

1. Introduction

2. Aurora Overview

3. Runtime Operation

4. Adaptivity

5. Related Work and Conclusions

Stream Query Optimization

• Differences with Traditional Query Optimization?

Motivation of ‘Query Migration’

• Continuous query over streams
  – Statistics unknown before start
  – Statistics changing during execution
    • Stream rates, arrival pattern, distribution, etc.

• Need for dynamic adaptation
  – Plan re-optimization
    • Change the shape of query plan tree

Stream Query Optimization

• New classes of operators (windows) may mean new rewrites

• New execution modes (continuous/pipelining)

• More dynamic fluctuations in statistics, so compile-time optimization is not possible

• Global optimization not practical, as query networks are huge; hence adaptive optimization

• Other cost models: taking memory into account, optimizing output rate rather than throughput, etc.

• Query optimization and load shedding

Query Optimization

• Compile-time, Global Optimization Infeasible
  – Too Many Boxes
  – Too Much Volatility in Network, Data

• Dynamic, Local Optimization
  – Scope: what to optimize
  – Threshold: when to optimize

Run-time Plan Re-Optimization

• Step 1: Decide when to optimize
  – Statistics Monitoring

• Step 2: Generate new query plan
  – Query Optimization

• Step 3: Replace current plan by new plan
  – Plan Migration

Adaptivity in Query Optimization

Dynamic Optimization: Migration

1. Identify Subnetwork
2. Buffer Inputs
3. Drain Subnetwork
4. Optimize Subnetwork
5. Turn on Taps

Stateful Operator in CQ

• But what about stateful operators?
  – Need non-blocking operators in CQ
  – Operator needs to output partial results
  – State data structure keeps received tuples

[Figure: join box AB with State A and State B; B tuples b1 … b5 arrive while ax is held in State A; outputs shown: ax b2, ax b3]

Key Observation: The purge of tuples in states relies on processing of new tuples.

Example: Symmetric NL join w/ window constraints
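A minimal sketch of a symmetric nested-loop join with a time window, illustrating the key observation: tuples in State A are purged only when a B tuple arrives, and vice versa. Assumptions: tuples are (timestamp, payload) pairs, a tuple stays joinable while `now - ts <= window`, and the class name is hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Tup = Tuple[float, str]  # (timestamp, payload)


class WindowedSymmetricNLJoin:
    """Symmetric nested-loop join of streams A and B under a time window.
    Each arriving tuple purges expired tuples from the opposite state,
    probes what is left, and is then stored in its own state."""

    def __init__(self, window: float, pred: Optional[Callable[[Tup, Tup], bool]] = None) -> None:
        self.window = window
        self.pred = pred or (lambda a, b: True)  # join predicate on (a_tuple, b_tuple)
        self.state: Dict[str, List[Tup]] = {"A": [], "B": []}

    def insert(self, side: str, tup: Tup) -> List[Tuple[Tup, Tup]]:
        other = "B" if side == "A" else "A"
        now = tup[0]
        # Purging happens only here, i.e., only when a new tuple arrives.
        self.state[other] = [t for t in self.state[other] if now - t[0] <= self.window]
        results = []
        for t in self.state[other]:
            a, b = (tup, t) if side == "A" else (t, tup)
            if self.pred(a, b):
                results.append((a, b))
        self.state[side].append(tup)
        return results


j = WindowedSymmetricNLJoin(window=5.0)
print(j.insert("A", (0.0, "ax")))  # []
print(j.insert("B", (3.0, "b1")))  # [((0.0, 'ax'), (3.0, 'b1'))]
print(j.insert("B", (7.0, "b2")))  # []  (ax purged: 7.0 - 0.0 > 5.0)

# Without B arrivals, State A is never purged, even for long-expired tuples.
k = WindowedSymmetricNLJoin(window=5.0)
k.insert("A", (0.0, "a1"))
k.insert("A", (100.0, "a2"))
print(len(k.state["A"]))  # 2
```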

Naïve Migration Strategy Revisited

• Steps
  (1) Pause execution of old plan
  (2) Drain out all tuples inside old plan
  (3) Replace old plan by new plan
  (4) Resume execution of new plan

[Figure: join plans (boxes AB, BC) over streams A, B, C; (2) all tuples drained, (3) old replaced by new, (4) processing resumed]

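Deadlock Waiting Problem: draining may never finish, because purging tuples from operator states relies on processing new tuples, and the inputs are paused.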

Adaptivity: Query Optimization

State Movement Protocol

Parallel Track Protocol

Moving State Strategy

• Basic idea
  – Share common states between two migration boxes

• Key steps
  – State Matching
    • Match states based on IDs
  – State Moving
    • Create new pointers for matched states in new box
  – What’s left?
    • Unmatched states in new box

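[Figure: Old Box joins with states (SA, SB), then (SAB, SC), then (SABC, SD); New Box joins with states (SB, SC), then (SBC, SD), then (SA, SBCD); both read input queues QA, QB, QC, QD and write output queue QABCD]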
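A sketch of state matching and moving for the two join trees in the figure, assuming states are identified by the IDs shown (SA, SB, SAB, and so on) and that moving a state means handing the new box a reference to the same container rather than a copy. The function name is hypothetical. How the unmatched states (here SBC and SBCD) get filled is left open on the slide and is not shown.

```python
from typing import Dict, List, Set, Tuple

States = Dict[str, List[tuple]]  # state ID -> stored tuples


def move_states(old: States, new_ids: List[str]) -> Tuple[States, Set[str]]:
    """Match states by ID; matched ones are shared, unmatched ones start empty."""
    new: States = {}
    unmatched: Set[str] = set()
    for sid in new_ids:
        if sid in old:
            new[sid] = old[sid]  # state moving: new pointer to the same list, no copy
        else:
            new[sid] = []        # unmatched: still has to be filled somehow
            unmatched.add(sid)
    return new, unmatched


old_box: States = {sid: [] for sid in ["SA", "SB", "SAB", "SC", "SABC", "SD"]}
old_box["SA"].append(("a1",))
new_box, unmatched = move_states(old_box, ["SB", "SC", "SBC", "SD", "SA", "SBCD"])
print(sorted(unmatched))               # ['SBC', 'SBCD']
print(new_box["SA"] is old_box["SA"])  # True: shared, not copied
```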

Parallel Track Strategy

• Basic idea
  – Execute both plans in parallel and gradually “push” old tuples out of old box by purging

• Key steps
  – Connect boxes
  – Execute in parallel
    • Until old box “expired” (no old tuple or sub-tuple)
  – Disconnect old box
  – Start executing new box only

[Figure: the Old Box (states SA, SB; SAB, SC; SABC, SD) and the New Box (states SB, SC; SBC, SD; SA, SBCD) running side by side on input queues QA, QB, QC, QD, both feeding QABCD]
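A sketch of the test that decides when the old box may be disconnected. Assumptions: each state entry is represented only by its timestamp (for a joined sub-tuple, the timestamp of its oldest component), and window purging removes entries older than `now - window`. The function names are hypothetical; a full implementation also has to keep the two boxes from emitting the same result twice, which is not shown.

```python
from typing import Dict, List

States = Dict[str, List[float]]  # state ID -> timestamps of stored (sub-)tuples


def purge(states: States, now: float, window: float) -> None:
    """Window purge: drop entries that can no longer join with tuples arriving at time now."""
    for sid in states:
        states[sid] = [ts for ts in states[sid] if now - ts <= window]


def old_box_expired(states: States, migration_start: float) -> bool:
    """The old box can be disconnected once no entry from before migration_start is left."""
    return all(ts >= migration_start for entries in states.values() for ts in entries)


old_states: States = {"SA": [8.0, 9.0], "SB": [9.0], "SAB": [8.0]}
migration_start, window = 10.0, 3.0

purge(old_states, now=12.0, window=window)
print(old_box_expired(old_states, migration_start))  # False: entries at 9.0 are still in the window

old_states["SA"].append(12.0)  # new tuples keep flowing into the old box in parallel
purge(old_states, now=13.0, window=window)
print(old_box_expired(old_states, migration_start))  # True: only post-migration entries remain
```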

Adaptivity: Load Shedding

1. Two Load Shedding Techniques:

• Random Tuple Drops: add a DROP box to the network (DROP is a special case of FILTER); position it to affect queries with tolerant delivery-based QoS requirements

• Semantic Load Shedding: FILTER values with low utility (according to value-based QoS)

2. Triggered by QoS Monitor

e.g., after Latency Analysis reveals certain applications are continuously receiving poor QoS
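A minimal sketch of both shedding techniques as FILTER-style generators. Assumptions: the DROP box is a FILTER whose predicate is driven by a seeded `random.Random`, and the value-based utility function and threshold are hypothetical examples.

```python
import random
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def drop_box(stream: Iterable[T], p_drop: float, rng: random.Random) -> Iterator[T]:
    """Random tuple drops: a FILTER that discards each tuple with probability p_drop."""
    return (t for t in stream if rng.random() >= p_drop)


def semantic_shed(stream: Iterable[T], utility: Callable[[T], float],
                  threshold: float) -> Iterator[T]:
    """Semantic load shedding: a FILTER that discards tuples whose utility is below threshold."""
    return (t for t in stream if utility(t) >= threshold)


rng = random.Random(42)
kept = list(drop_box(range(10_000), p_drop=0.25, rng=rng))
print(len(kept) / 10_000)  # close to 0.75


def alarm_utility(v: int) -> float:
    # Hypothetical value-based QoS: readings above 100 matter most.
    return 1.0 if v > 100 else 0.2


temperatures = [72, 150, 68, 210, 95]
print(list(semantic_shed(temperatures, alarm_utility, threshold=0.5)))  # [150, 210]
```

Random drops spread the loss evenly, which suits delivery-based QoS; semantic shedding concentrates it on the values the application cares least about.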

Adaptivity: Detecting Overload

Throughput Analysis

Cost = c, Selectivity = s

Input rate = r, Output rate = min(1/c, r) * s

Problem: r > 1/c (input arrives faster than the box can process it)

[Figure: a network of boxes, each annotated with cost c, selectivity s, input I, and output O]
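A sketch of throughput analysis along a linear chain of boxes, applying output rate = min(1/c, r) * s box by box and flagging a box as overloaded when r > 1/c. `BoxStats` and `throughput_analysis` are hypothetical names, and the costs, selectivities, and input rate are made-up numbers; the example shows that the bottleneck can sit downstream of a box that keeps up.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BoxStats:
    name: str
    cost: float         # c: processing time per input tuple, in seconds
    selectivity: float  # s: output tuples produced per input tuple


def throughput_analysis(chain: List[BoxStats], input_rate: float) -> List[Tuple[str, float, bool]]:
    """For each box: output rate = min(1/c, r) * s; overloaded when r > 1/c."""
    report = []
    r = input_rate
    for box in chain:
        capacity = 1.0 / box.cost  # tuples per second the box can process
        out = min(capacity, r) * box.selectivity
        report.append((box.name, out, r > capacity))
        r = out  # this box's output rate is the next box's input rate
    return report


chain = [BoxStats("filter", cost=0.001, selectivity=0.5),
         BoxStats("map", cost=0.004, selectivity=1.0)]
for name, out_rate, overloaded in throughput_analysis(chain, input_rate=800.0):
    print(f"{name}: output {out_rate:.0f} tuples/s, overloaded={overloaded}")
# filter: output 400 tuples/s, overloaded=False
# map: output 250 tuples/s, overloaded=True
```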

Latency Analysis

Monitor each application’s Delay-based QoS

Problem: Too many apps in “bad zone”
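A sketch of the latency check, assuming each application's delay-based QoS is a function from observed delay to utility and that an application is in the "bad zone" when its utility falls below a chosen threshold. The threshold, the allowed fraction of bad applications, and the QoS function are hypothetical parameters.

```python
from typing import Callable, Dict, List

QoS = Callable[[float], float]


def apps_in_bad_zone(delays: Dict[str, float], qos: Dict[str, QoS],
                     min_utility: float) -> List[str]:
    """Applications whose current delay maps to a utility below min_utility."""
    return [app for app, d in delays.items() if qos[app](d) < min_utility]


def latency_overload(delays: Dict[str, float], qos: Dict[str, QoS],
                     min_utility: float, max_bad_fraction: float) -> bool:
    """Flag overload when too large a fraction of applications sit in the bad zone."""
    bad = apps_in_bad_zone(delays, qos, min_utility)
    return len(bad) / len(delays) > max_bad_fraction


def linear_qos(d: float) -> float:
    # Hypothetical delay-based QoS: utility falls linearly from 1 at 0 s to 0 at 10 s.
    return max(0.0, 1.0 - d / 10.0)


delays = {"app1": 1.0, "app2": 8.0, "app3": 9.0}  # observed output delays in seconds
qos = {app: linear_qos for app in delays}
print(apps_in_bad_zone(delays, qos, min_utility=0.5))                        # ['app2', 'app3']
print(latency_overload(delays, qos, min_utility=0.5, max_bad_fraction=0.5))  # True
```

A positive result is the kind of signal that, per the load shedding slide, would prompt the QoS Monitor to trigger shedding.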

Implementation: GUI

Implementation: Runtime


Conclusions: Aurora Stream Query Processing System

1. Designed for Scalability

2. QoS-Driven Resource Management

3. Continuous and Historical Queries

4. Stream Storage Management

5. Implemented Prototype

Web site: www.cs.brown.edu/research/aurora/
