adaptive ordering of pipelined stream filters babu, motwani, munagala, nishizawa, and widom sigmod...

Adaptive Ordering of Pipelined Stream Filters

Babu, Motwani, Munagala, Nishizawa, and WidomSIGMOD 2004 Jun 13-18, 2004

presented by

Joshua LeeMingzhu Wei

Some slides are adapted from the presentation slides in sigmod2004

Presentation Overview

• Motivation• A-GREEDY Algorithm• INDEPENDENT Algorithm• SWEEP Algorithm• LOCALSWAPS Algorithm• Adaptive Ordering of Multiway

Joins Over Streams• Conclusion

Motivation

• Many queries on stream data involve processing commutative sets of filters

• Each filter is a stateless predicate that indicates whether a tuple should be further processed

• A tuple is output only if it satisfies all filters• Applying filters in different orders can affect the

processing time of the query• The changing nature of stream data requires

adaptive algorithms for filter ordering

Motivation: Example

• Consider the following example usage of a stream processing system:– A network traffic monitoring application built

on a stream engine is used to monitor the amount of common traffic flowing through four routers, R1, R2, R3, and R4 over the last ten minutes.

Motivation: Filter Ordering• If the filters are dependent,

then the order of probing affects processing time

• For instance, it could be known that most traffic through R1 comes from R3 or R4, but rarely from R2

• For incoming data on R1, probing W2 first would drop a tuple quickest, thus saving the time of probing W3 and W4

Motivation: Filter Ordering Challenges

• Filter selectivities may be correlated• The candidate filter orderings to consider

for N filters is N!, which increases quickly• Attributes of stream data and arrival

characteristics can change over time

Motivation: Adaptive Filter Ordering Challenges

• Run-time overhead incurred for continuous monitoring of statistics

• Convergence properties required to prove an adaptive algorithm generates an approximation to correct answer

• Speed of adaptivity in the face of changing data characteristics

• There is a three-way tradeoff: Algorithms that adapt quickly and have good convergence properties incur a lot of run-time overhead

A-GREEDY Algorithm

• The cost of a query plan ordering is

• ti is the cost of processing one tuple in filter i

• d(j|j-1) is the conditional probability that filter j will drop a tuple, given that no filter from 1 to j-1 dropped it

A-GREEDY Algorithm: Static Version and Greedy Invariant

• Static greedy algorithm1. Choose the filter with highest unconditional drop

probability d(i|0) as the first filter2. Choose the filter with the highest conditional drop

probability d(j|1) as the next filter3. Choose the filter with highest conditional drop

probability d(k|2) as the next filter4. And so on

• Greedy Invariant occurs when a filter ordering satisfies the following

A-GREEDY Algorithm: Profiler• Two logical components of A-Greedy:

– profiler: continuously collects and maintains statistics about filter selectivities and processing costs

– Reoptimizer: detects and corrects violations of the GI in the current filter ordering

• Challenge faced by the profiler: n2n-1 conditional

selectivities for n filters• Solution: profile of recently dropped tuples

A-GREEDY Algorithm: Profiler• Profile: a sliding window of profile tuples• A profile tuple contains n boolean attributes b1,…,bn

corresponding to n filters• Dropped tuples are sampled with some probability p• If a tuple e is chosen for profiling, it will be tested by all

remaining filters• A new profile tuple e: bi = 1 if Fi drop e, otherwise bi=0

A-GREEDY Algorithm: Reoptimizer

• The reoptimizer maintains an ordering O such that O satisfies the GI

• How does the reoptimizer use profile to derive estimates of conditional selectivities?– It incrementally maintains a view over the

profile window

A-GREEDY Algorithm: Reoptimizer

• Matrix: View over the profile window: n×n • V[i, j]: number of tuples in the profile

window that were dropped by Ff(j) but not dropped by Ff (1) Ff (2),…, F f (i-1)

V[i, j] is proportional to d (j|i-1)

A-GREEDY Algorithm: Run-time Overhead and Adaptivity

• Profile-tuple creation: needs additional n-I evaluations for a tuple dropped by Ff(i) .

• Profile-window maintenance: insertion and deletion of profile tuples, averages of filter processing times

• Matrix-view update: every update would cause access to up to n2/4 entries

• Detection and correction of GI violations• Good convergence properties • Fast adaptivity

• Can we sacrifice some of A-Greedy’s convergence properties or adaptivity speed to reduce its run-time overhead?

• Three variants of A greedy are proposed

Variants of A Greedy

• Proceeds in stages• During one stage, only checks for GI violations involving the filter at one

specific position j• Do not need to maintain entire matrix view: bf (1) ,…,b f (j) • For each profiled tuple, needs to additionally evaluate F f (j) only• By rotating j over 2,…,n, eventually detects and corrects all GI violations

• Same convergence properties as A-Greedy• Reduced view and need for additional evaluations

less overhead• slower adaptivity: only 1 filter is profiled in each stage

SWEEP Algorithm

INDEPENDENT Algorithm• Assumes filters are independent we only need to maintain

estimates of unconditional selectivities• Lower view maintenance overhead• Fast adaptivity• Convergence: if assumption holds, optimal. Otherwise,

can be O(n) times worse than GI orderings

LOCALSWAP Algorithm• Monitors “local” violations only, i.e., violations involving

adjacent filters• LocalSwaps detects situations where a swap between

adjacent filters• Only need to maintain two diagonals of the view• For each profiled tuple dropped by Ff(i), only needs to

additionally evaluate F f(i+1)

x x

x x

x x

x

Adaptive Ordering of MultiwayJoins Over Streams

• MJoins maintain an ordering of {S0, S1,…, Sn-1}-{Si} for each stream Si

• New tuples arriving from Si is joined with other stream windows in that order

• Two-phase join algorithm– Drop-probing phase: the new tuple is used to probe all other

windows in the specified order. If any window drops it, no further processing will be needed for it

– output-generation phase: if no window drops the tuple, proceeds as conventional MJoins

• Drop probing resembles pipelined filters• A-Greedy and its variants can be used to determine the

orderings

Conclusion

• A-Greedy has good convergence properties and fast adaptivity, but incurs significant run-time overhead

• Three variants of A-Greedy are proposed, each lying at a different points along the tradeoff spectrum among convergence, runtime overhead, and speed of adaptivity

adaptive ordering of pipelined stream filters babu, motwani, munagala, nishizawa, and widom sigmod...

Documents