The Query Mesh Project: A Powerful Multi-Route Query Processing Paradigm. New England Database Summit 2010. Elke A. Rundensteiner (Worcester Polytechnic Institute, [email protected]), Elisa Bertino (Purdue University, [email protected]), Rimma V. Nehme (Microsoft Jim Gray Systems Lab, [email protected]). Thanks go to NSF 0917017 for partial support of this project.

Page 1: The Query Mesh Project:  A Powerful Multi-Route Query Processing Paradigm

The Query Mesh Project: A Powerful Multi-Route Query Processing Paradigm

New England Database Summit 2010

Elke A. Rundensteiner, Worcester Polytechnic Institute, [email protected]

Elisa Bertino, Purdue University, [email protected]

Rimma V. Nehme, Microsoft Jim Gray Systems Lab, [email protected]

Thanks go to NSF 0917017 for partial support of this project.

Page 2: The Query Mesh Project:  A Powerful Multi-Route Query Processing Paradigm

Motivation

A variety of modern applications face data with non-uniform characteristics: ubiquitous healthcare, location-based services, financial tickers, network monitoring…

[Figure: data sources send data to a database engine. A query (SELECT * FROM …) goes to the query optimizer, which uses overall statistics and plan costs to pick a query execution plan for the query executor, which returns the query results.]

User: “I want my results quickly. I don’t care how exactly they are computed.”

TYPICALLY ONE execution plan for ALL DATA

Page 3: The Query Mesh Project:  A Powerful Multi-Route Query Processing Paradigm

Concrete Example: Network Monitoring

[Figure: data streams of network packets enter a DSMS. A continuous query (SELECT * FROM …) goes to the query optimizer, which produces the query execution plan; the DSMS returns the query results. Single-plan query processing is contrasted with multi-plan (/route) query processing using Plan 1, Plan 2 and Plan 3.]

Opportunity for improvement: it may be more efficient to use different plans for different subsets of data.

• Here the example is with streaming data
• Similar examples can be found with static data

Page 4: The Query Mesh Project:  A Powerful Multi-Route Query Processing Paradigm

Outline

• Introduction & Motivation
• Background: Query Mesh
  • Model
  • Optimization
  • Execution
• Dynamic Re-Optimization with Query Mesh
  • Challenges
  • Architecture Details
  • Experimental Evaluation
• Ongoing and future work
• Conclusion

Page 5: The Query Mesh Project:  A Powerful Multi-Route Query Processing Paradigm

Multi-Plan Query Processing Using Query Mesh

(Here, route = execution plan)

Query Mesh provides a middle ground between systems with a single pre-computed route and systems with multiple runtime routes:

• Traditional query optimization: single “route-oriented” solution. Coarse optimization, small overhead.
• Eddies and its descendants: multi “route-less” solution. Fine-granularity optimization, significant overhead.
• Query Mesh: multi “route-oriented” solution. Fine-granularity optimization, less overhead.

Physical Architecture of Query Mesh Framework

[Figure: a classifier in front of multiple routes.]

Page 6: The Query Mesh Project:  A Powerful Multi-Route Query Processing Paradigm

Query Mesh Search Space

Search space: the set of all possible solutions.

The set of training tuples {1,2,3,4}* has cardinality n = 4. The Query Mesh search space is lattice-shaped, and each subset of a partition has an individual route:

1234 (one plan for all data)

14/23 1/234 124/3 13/24 123/4 134/2 12/34

1/23/4 14/2/3 1/24/3 13/2/4 12/3/4 1/2/34

1/2/3/4

* We denote {{1},{2,3}} as “1/23” for brevity.

Search Space Complexity

Bell number $B_n = \sum_{k=1}^{n} S(n,k)$, the sum of the Stirling numbers of the second kind. The Stirling number of the second kind S(n, k) is the number of ways to partition a set of cardinality n into exactly k nonempty subsets.
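The size of this search space can be checked with a short Python sketch. It computes S(n, k) with the standard recurrence, sums it into the Bell number, and enumerates the partitions of {1,2,3,4} in the slide's “1/23” shorthand. The function names are illustrative only and are not taken from the Query Mesh implementation.

```python
from typing import Iterator, List


def stirling2(n: int, k: int) -> int:
    """Number of ways to partition n items into exactly k nonempty blocks."""
    if n == k:
        return 1  # covers S(0, 0) = 1
    if k == 0 or k > n:
        return 0
    # Item n either joins one of the k existing blocks or forms its own block.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def bell(n: int) -> int:
    """Bell number: total number of partitions of an n-element set."""
    return sum(stirling2(n, k) for k in range(n + 1))


def partitions(items: List[int]) -> Iterator[List[List[int]]]:
    """Yield every partition of `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing block in turn ...
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        # ... or give it a block of its own.
        yield [[first]] + smaller


def label(partition: List[List[int]]) -> str:
    """Render {{1},{2,3}} as "1/23", the shorthand used on the slide."""
    blocks = sorted("".join(map(str, sorted(b))) for b in partition)
    return "/".join(blocks)
```

For n = 4 this gives 1 + 7 + 6 + 1 = 15 candidate solutions, matching the lattice above. The count grows very quickly with n, which is why exhaustive search is impractical for realistic training sets.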

Page 7: The Query Mesh Project:  A Powerful Multi-Route Query Processing Paradigm

Query Mesh Optimization Problem

Query Mesh Cost Model (main idea)

Cost(QM) = Cost of Classifier + Cost of routes + Multi-route overhead

Query Mesh Search Algorithms

Optimal Query Mesh Search (Opt-QM), main idea:

(1) Form all possible sets in the powerset of the training tuples

(2) Form partitions out of the above sets

Too expensive! Need heuristics!

Query Mesh Search Heuristics

[Figure: a search path from the start solution to the final solution through the explored solutions.]

Three components of search heuristics:

(1) Start solution: 5 different approaches (extreme-1, extreme-N, random, content-driven, route-driven), experimentally evaluated

(2) Search strategy: randomized algorithms (iterative improvement, simulated annealing)

(3) Stop condition: largely depends on the search strategy employed (K-iterations, plateau, time-bounded, resource-bounded)
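The cost formula and the three heuristic components named on this slide can be illustrated with a toy Python sketch. Everything concrete here is an assumption made for illustration: the training data, the two cost constants, the greedy route choice, and the “move one tuple” neighborhood are not taken from the Query Mesh papers. The sketch uses the extreme-1 start solution, iterative improvement as the search strategy, and a K-iterations stop condition.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Toy training data (hypothetical): tuple id -> outcome of each of three
# selection operators (True = the tuple passes that operator).
TRAINING: Dict[int, Tuple[bool, bool, bool]] = {
    1: (False, True, True),
    2: (False, True, True),
    3: (True, True, False),
    4: (True, True, False),
}
CLASSIFIER_COST_PER_ROUTE = 0.5  # assumed constant
OVERHEAD_PER_ROUTE = 0.25        # assumed constant

Partition = List[List[int]]


def best_route(block: Sequence[int]) -> List[int]:
    """Order operators so those that reject most of the block run first."""
    n_ops = len(next(iter(TRAINING.values())))

    def pass_count(op: int) -> int:
        return sum(TRAINING[t][op] for t in block)

    return sorted(range(n_ops), key=pass_count)


def route_cost(block: Sequence[int]) -> int:
    """Operator evaluations needed to process the block along its route."""
    route, total = best_route(block), 0
    for t in block:
        for op in route:
            total += 1
            if not TRAINING[t][op]:
                break  # tuple is dropped; later operators are skipped
    return total


def qm_cost(partition: Partition) -> float:
    """Cost(QM) = cost of classifier + cost of routes + multi-route overhead."""
    k = len(partition)
    return (CLASSIFIER_COST_PER_ROUTE * k
            + sum(route_cost(b) for b in partition)
            + OVERHEAD_PER_ROUTE * k)


def iterative_improvement(iterations: int = 200, seed: int = 0) -> Partition:
    rng = random.Random(seed)
    current: Partition = [sorted(TRAINING)]      # start solution: extreme-1
    for _ in range(iterations):                  # stop condition: K iterations
        candidate = [list(b) for b in current]
        src = rng.randrange(len(candidate))
        t = candidate[src].pop(rng.randrange(len(candidate[src])))
        dst = rng.randrange(len(candidate) + 1)  # last index = open a new block
        if dst == len(candidate):
            candidate.append([t])
        else:
            candidate[dst].append(t)
        candidate = [b for b in candidate if b]  # drop emptied blocks
        if qm_cost(candidate) < qm_cost(current):  # accept only improvements
            current = candidate
    return current
```

On this toy data a single route for all four tuples costs 6.75, while the partition 12/34 costs 5.5, because each pair can run its own most selective operator first. Simulated annealing would differ only in the acceptance test, occasionally accepting a worse candidate to escape local minima.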

Page 8: The Query Mesh Project:  A Powerful Multi-Route Query Processing Paradigm

Query Mesh Optimization Overview

[Figure: the QM optimizer repeatedly draws a sample of tuples (training dataset) from the data stream t1, t2, …, t12, …; for each sample it computes routes (i.e., plans) r1, r2, r3, r4 and induces a classifier. The resulting Query Mesh is used by the QM executor. Legend: QM Optimizer, QM Executor.]

[NWRB09] R. Nehme, K. Works, E. Rundensteiner and E. Bertino, Query Mesh: Multi-Route Query Processing Technology (Demo), In VLDB 2009.

Page 9: The Query Mesh Project:  A Powerful Multi-Route Query Processing Paradigm

Query Mesh Execution Overview

[Figure: tuples t1, t2, …, t12, … arrive on the data stream and are collected in a classification window (tumbling window). After classification, t1, t3, t4, t5 are assigned to route r1; t2, t6, t9 to route r2; and t7, t8, t10 to route r3. The resulting rusters (r-tokens and data tuples), with r-tokens <1,4,3,2>, <2,4,3,1> and <3,4,1,2>, are sent to the Self-Routing Fabric. Legend: QM Optimizer, QM Executor.]

[NWRB09] R. Nehme, K. Works, E. Rundensteiner and E. Bertino, Query Mesh: Multi-Route Query Processing Technology (Demo), In VLDB 2009.
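The execution flow on this slide (classify one tumbling window, group tuples by route, let the route token decide the operator order) can be sketched in Python as follows. This is a simplified illustration under assumptions: the packet fields, the three predicates, and `toy_classifier` are hypothetical, and the real Self-Routing Fabric is not modeled beyond "the token picks the next operator".

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Route = Tuple[int, ...]            # operator ids in execution order (the r-token)
Operator = Callable[[dict], bool]  # selection predicate on one tuple

# Hypothetical operators for a network-monitoring query.
OPERATORS: Dict[int, Operator] = {
    1: lambda t: t["size"] > 100,
    2: lambda t: t["proto"] == "tcp",
    3: lambda t: t["port"] == 80,
}


def toy_classifier(t: dict) -> Route:
    """Hypothetical classifier: choose an operator order per tuple."""
    return (2, 3, 1) if t["proto"] == "udp" else (1, 3, 2)


def classify_window(window: Iterable[dict],
                    classifier: Callable[[dict], Route]) -> Dict[Route, List[dict]]:
    """Group the tuples of one tumbling window by their assigned route."""
    groups: Dict[Route, List[dict]] = defaultdict(list)
    for t in window:
        groups[classifier(t)].append(t)
    return dict(groups)


def execute(groups: Dict[Route, List[dict]],
            operators: Dict[int, Operator]) -> List[dict]:
    """Send each group through the operators in the order given by its token."""
    results: List[dict] = []
    for r_token, tuples in groups.items():
        survivors = tuples
        for op_id in r_token:  # the token, not a fixed plan, picks the next operator
            survivors = [t for t in survivors if operators[op_id](t)]
            if not survivors:
                break
        results.extend(survivors)
    return results
```

Because every route applies the same set of predicates, the result set is independent of the route chosen; only the amount of work differs, which is what the optimizer tries to minimize.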

Page 10: The Query Mesh Project:  A Powerful Multi-Route Query Processing Paradigm

But… data characteristics may change…

[Figure: the data at time T, T + 1, T + 2 and T + 3.]

Page 11: The Query Mesh Project:  A Powerful Multi-Route Query Processing Paradigm

Dynamic Re-optimization with Query Mesh

Can we have an execution strategy that

• is plan-based,
• supports different plans for distinct subsets of data, and
• is as adaptive “as Eddies”?

Self-Tuning Query Mesh (ST-QM)

[NRB09] R. Nehme, E. Rundensteiner and E. Bertino, Self-Tuning Query Mesh for Adaptive Multi-Route Query Processing, In EDBT 2009.

Page 12: The Query Mesh Project:  A Powerful Multi-Route Query Processing Paradigm

Outline

• Introduction & Motivation
• Background: Query Mesh
  • Model
  • Optimization
  • Execution
• Dynamic Re-Optimization with Query Mesh
  • Challenges
  • Architecture Details
• Conclusion
• Current and Future Work

Page 13: The Query Mesh Project:  A Powerful Multi-Route Query Processing Paradigm

Challenges

[Figure: the Query Mesh (classifier and multiple routes) extended into the Self-Tuning Query Mesh.]

1. What should be monitored to determine whether the current QM solution is no longer adequate?

2. How to determine if the current QM solution should be adapted?

3. How to efficiently execute the physical migration from the current QM to a new QM solution while the query is being executed?

Contributions

• Data and Statistics Monitoring
• Concept Drift Analysis, QM Cost Model, Improvement Measure
• Single Lightweight Operation to Physically Adapt QM

[NRB09] R. Nehme, E. Rundensteiner and E. Bertino, Self-Tuning Query Mesh for Adaptive Multi-Route Query Processing, In EDBT 2009.

Page 14: The Query Mesh Project:  A Powerful Multi-Route Query Processing Paradigm

ST-QM Architecture

[Figure: the static QM framework (query optimizer and query executor) next to the adaptive QM framework, which adds the ST-QM component to the query optimizer and query executor.]

[NRB09] R. Nehme, E. Rundensteiner and E. Bertino, Self-Tuning Query Mesh for Adaptive Multi-Route Query Processing, In EDBT 2009.

Page 15: The Query Mesh Project:  A Powerful Multi-Route Query Processing Paradigm

ST-QM Components

• ST-QM Monitor continuously samples data and execution statistics that will be used to determine if a concept drift has occurred (i.e., QM needs to be adapted)
• ST-QM Analyzer determines if a concept drift has actually occurred and makes recommendations if and how the QM solution should be adapted
• ST-QM Actuator takes these recommendations and physically adapts the QM solution

[Figure: sampling from the Query Mesh feeds the ST-QM Monitor, which passes measurements to the ST-QM Analyzer, which passes recommendations to the ST-QM Actuator, whose actuation yields the new Query Mesh.]
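The monitor, analyzer and actuator chain can be pictured as a small control loop. The Python sketch below is only a schematic under assumptions: the sampling rule, the relative-gain threshold `min_gain`, and the caller-supplied `cost` function are hypothetical stand-ins for the concept drift analysis, QM cost model and improvement measure, which the slides name but do not spell out.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, Optional, Sequence, TypeVar

T = TypeVar("T")  # tuple type
M = TypeVar("M")  # query-mesh type (classifier + routes)


@dataclass
class Executor(Generic[M]):
    mesh: M  # the query mesh currently used for execution


def monitor(window: Sequence[T], every: int = 2) -> List[T]:
    """Sample the data seen by the executor (here: every `every`-th tuple)."""
    return list(window[::every])


def analyzer(current: M, candidate: M, sample: Sequence[T],
             cost: Callable[[M, Sequence[T]], float],
             min_gain: float = 0.05) -> Optional[M]:
    """Recommend `candidate` only if it is cheaper on the sample by more than `min_gain`."""
    cur, new = cost(current, sample), cost(candidate, sample)
    if cur <= 0:
        return None
    return candidate if (cur - new) / cur > min_gain else None


def actuator(executor: Executor[M], recommendation: Optional[M]) -> bool:
    """Install the recommended mesh; report whether anything changed."""
    if recommendation is None:
        return False
    executor.mesh = recommendation
    return True
```

In this sketch a round is `actuator(ex, analyzer(ex.mesh, candidate, monitor(window), cost))`. When the candidate is not sufficiently better, the analyzer returns `None` and the executor keeps its current mesh.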

Page 16: The Query Mesh Project:  A Powerful Multi-Route Query Processing Paradigm

ST-QM Actuator: Physical Query Mesh Adaptation

All possible recommendations:

• R1: New Classifier + Old Routes
• R2: Old Classifier + New Routes
• R3: New Classifier + New Routes

• Case 1: Virtual Concept Drift Recommendation
• Case 2: Real Concept Drift Recommendation
• Case 3: Hybrid Concept Drift Recommendation

Classifier Modification

[Figure: data enters the online classifier, which emits rusters for routes r1, r2 and r3 into the Self-Routing Fabric (OI-array and Op-modules), which produces the query results. Adaptation replaces the current classifier with the new classifier.]

The beauty of the proposed design!!!
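The three recommendation types can be written down as a small Python sketch. The `QueryMesh` record, the `adapt` function and the string route labels are hypothetical; the slides do not show how the fabric is actually updated, so this only illustrates which parts of the mesh each recommendation replaces.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Callable, Dict, Tuple

Route = Tuple[int, ...]             # operator ids in execution order
Classifier = Callable[[dict], str]  # tuple -> route label


class Recommendation(Enum):
    R1 = "new classifier + old routes"
    R2 = "old classifier + new routes"
    R3 = "new classifier + new routes"


@dataclass(frozen=True)
class QueryMesh:
    classifier: Classifier
    routes: Dict[str, Route]


def adapt(current: QueryMesh, rec: Recommendation,
          new_classifier: Classifier, new_routes: Dict[str, Route]) -> QueryMesh:
    """Build the adapted mesh; the caller installs it with one reference swap."""
    if rec is Recommendation.R1:
        return replace(current, classifier=new_classifier)
    if rec is Recommendation.R2:
        return replace(current, routes=new_routes)
    return QueryMesh(new_classifier, new_routes)
```

Keeping the mesh immutable and swapping a single reference is one way to mimic a lightweight adaptation step: tuples already classified keep the token they were given, and only newly arriving windows see the new classifier or routes.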

Page 17: The Query Mesh Project:  A Powerful Multi-Route Query Processing Paradigm

Experimental Evaluation

ST-QM was implemented inside CAPE, a Java-based continuous query engine.

We compared the performance of adaptive QM against the following competitor systems:

• Static (non-adaptive) QM
• Adaptive “plan-less” Eddies
• Adaptive “plan-less” Eddies with CBR-based routing policy

Results can be found in [NRB09] (EDBT 2009).


Page 18: The Query Mesh Project:  A Powerful Multi-Route Query Processing Paradigm

Summary of ST-QM Experimental Results

• ST-QM gave up to 44% improvement in execution time and output rate compared to non-adaptive QM, Eddy, and the single-plan execution approach.
• The runtime overhead of ST-QM relative to query execution is small (on average 2%).
• The actuation cost of physical adaptivity is nearly negligible, amounting to 0.02% of total execution cost.
• Even if no adaptivity is needed, ST-QM is in the worst case at most 2-3% slower than static QM.

Page 19: The Query Mesh Project:  A Powerful Multi-Route Query Processing Paradigm

Conclusion

• Query Mesh is a practical query optimization approach
  • Eliminates the single-plan assumption
  • Feasibility shown
  • Has low overhead & high potential benefit
  • Easily implemented and integrated with existing systems

• Query Mesh leads to novel solutions
  • Usage of machine learning in query optimization and query processing
  • Usage of network-inspired techniques in query optimization and query processing

Page 20: The Query Mesh Project:  A Powerful Multi-Route Query Processing Paradigm


Next Steps in QM Project

• Consider state caching and indexing in QM stream context

• Work with alternate classification methods for route decisions

• Design customized query optimization and processing strategies

• Study multi-query processing and optimization

• Scale by applying distributed processing technologies

• Do QM principles also apply in a static DB context?

Page 21: The Query Mesh Project:  A Powerful Multi-Route Query Processing Paradigm

Thank You for Listening!

Thank you to current and past DSRG members for stream engine development, feedback, collaboration, and much more.