adaptive ordering of pipelined stream filters babu, motwani, munagala, nishizawa, and widom sigmod...
DESCRIPTION
Motivation Many queries on stream data involve processing commutative sets of filters Each filter is a stateless predicate that indicates whether a tuple should be further processed A tuple is output only if it satisfies all filters Applying filters in different orders can affect the processing time of the query The changing nature of stream data requires adaptive algorithms for filter orderingTRANSCRIPT
Adaptive Ordering of Pipelined Stream Filters
Babu, Motwani, Munagala, Nishizawa, and WidomSIGMOD 2004 Jun 13-18, 2004
presented by
Joshua LeeMingzhu Wei
Some slides are adapted from the presentation slides in sigmod2004
Presentation Overview
• Motivation• A-GREEDY Algorithm• INDEPENDENT Algorithm• SWEEP Algorithm• LOCALSWAPS Algorithm• Adaptive Ordering of Multiway
Joins Over Streams• Conclusion
Motivation
• Many queries on stream data involve processing commutative sets of filters
• Each filter is a stateless predicate that indicates whether a tuple should be further processed
• A tuple is output only if it satisfies all filters• Applying filters in different orders can affect the
processing time of the query• The changing nature of stream data requires
adaptive algorithms for filter ordering
Motivation: Example
• Consider the following example usage of a stream processing system:– A network traffic monitoring application built
on a stream engine is used to monitor the amount of common traffic flowing through four routers, R1, R2, R3, and R4 over the last ten minutes.
Motivation: Filter Ordering• If the filters are dependent,
then the order of probing affects processing time
• For instance, it could be known that most traffic through R1 comes from R3 or R4, but rarely from R2
• For incoming data on R1, probing W2 first would drop a tuple quickest, thus saving the time of probing W3 and W4
Motivation: Filter Ordering Challenges
• Filter selectivities may be correlated• The candidate filter orderings to consider
for N filters is N!, which increases quickly• Attributes of stream data and arrival
characteristics can change over time
Motivation: Adaptive Filter Ordering Challenges
• Run-time overhead incurred for continuous monitoring of statistics
• Convergence properties required to prove an adaptive algorithm generates an approximation to correct answer
• Speed of adaptivity in the face of changing data characteristics
• There is a three-way tradeoff: Algorithms that adapt quickly and have good convergence properties incur a lot of run-time overhead
A-GREEDY Algorithm
• The cost of a query plan ordering is
• ti is the cost of processing one tuple in filter i
• d(j|j-1) is the conditional probability that filter j will drop a tuple, given that no filter from 1 to j-1 dropped it
A-GREEDY Algorithm: Static Version and Greedy Invariant
• Static greedy algorithm1. Choose the filter with highest unconditional drop
probability d(i|0) as the first filter2. Choose the filter with the highest conditional drop
probability d(j|1) as the next filter3. Choose the filter with highest conditional drop
probability d(k|2) as the next filter4. And so on
• Greedy Invariant occurs when a filter ordering satisfies the following
A-GREEDY Algorithm: Profiler• Two logical components of A-Greedy:
– profiler: continuously collects and maintains statistics about filter selectivities and processing costs
– Reoptimizer: detects and corrects violations of the GI in the current filter ordering
• Challenge faced by the profiler: n2n-1 conditional
selectivities for n filters• Solution: profile of recently dropped tuples
A-GREEDY Algorithm: Profiler• Profile: a sliding window of profile tuples• A profile tuple contains n boolean attributes b1,…,bn
corresponding to n filters• Dropped tuples are sampled with some probability p• If a tuple e is chosen for profiling, it will be tested by all
remaining filters• A new profile tuple e: bi = 1 if Fi drop e, otherwise bi=0
A-GREEDY Algorithm: Reoptimizer
• The reoptimizer maintains an ordering O such that O satisfies the GI
• How does the reoptimizer use profile to derive estimates of conditional selectivities?– It incrementally maintains a view over the
profile window
A-GREEDY Algorithm: Reoptimizer
• Matrix: View over the profile window: n×n • V[i, j]: number of tuples in the profile
window that were dropped by Ff(j) but not dropped by Ff (1) Ff (2),…, F f (i-1)
V[i, j] is proportional to d (j|i-1)
A-GREEDY Algorithm: Run-time Overhead and Adaptivity
• Profile-tuple creation: needs additional n-I evaluations for a tuple dropped by Ff(i) .
• Profile-window maintenance: insertion and deletion of profile tuples, averages of filter processing times
• Matrix-view update: every update would cause access to up to n2/4 entries
• Detection and correction of GI violations• Good convergence properties • Fast adaptivity
• Can we sacrifice some of A-Greedy’s convergence properties or adaptivity speed to reduce its run-time overhead?
• Three variants of A greedy are proposed
Variants of A Greedy
• Proceeds in stages• During one stage, only checks for GI violations involving the filter at one
specific position j• Do not need to maintain entire matrix view: bf (1) ,…,b f (j) • For each profiled tuple, needs to additionally evaluate F f (j) only• By rotating j over 2,…,n, eventually detects and corrects all GI violations
• Same convergence properties as A-Greedy• Reduced view and need for additional evaluations
less overhead• slower adaptivity: only 1 filter is profiled in each stage
SWEEP Algorithm
INDEPENDENT Algorithm• Assumes filters are independent we only need to maintain
estimates of unconditional selectivities• Lower view maintenance overhead• Fast adaptivity• Convergence: if assumption holds, optimal. Otherwise,
can be O(n) times worse than GI orderings
LOCALSWAP Algorithm• Monitors “local” violations only, i.e., violations involving
adjacent filters• LocalSwaps detects situations where a swap between
adjacent filters• Only need to maintain two diagonals of the view• For each profiled tuple dropped by Ff(i), only needs to
additionally evaluate F f(i+1)
x x
x x
x x
x
Adaptive Ordering of MultiwayJoins Over Streams
• MJoins maintain an ordering of {S0, S1,…, Sn-1}-{Si} for each stream Si
• New tuples arriving from Si is joined with other stream windows in that order
• Two-phase join algorithm– Drop-probing phase: the new tuple is used to probe all other
windows in the specified order. If any window drops it, no further processing will be needed for it
– output-generation phase: if no window drops the tuple, proceeds as conventional MJoins
• Drop probing resembles pipelined filters• A-Greedy and its variants can be used to determine the
orderings
Conclusion
• A-Greedy has good convergence properties and fast adaptivity, but incurs significant run-time overhead
• Three variants of A-Greedy are proposed, each lying at a different points along the tradeoff spectrum among convergence, runtime overhead, and speed of adaptivity