Maintaining Time-Decaying Stream Aggregates

Edith Cohen, Martin Strauss (AT&T Labs-Research)
PODS 2003




The Problem

• A data stream is a sequence of data items observed over time.
• Multiple massive data streams are present.
• Storage constraints allow us to maintain only a compact summary of the “essence” of the information in each stream.
• The relevance of information decays with time.
• Thus, when aggregating across time, older information should be discounted.


Applications

• IP routing (RED protocol): a time-decayed average of previous queue lengths is used to estimate impending congestion at a router.

• Internet gateway selection: track the quality (e.g., packet loss rate) of alternative paths in order to select a more reliable one.

• Usage statistics of phone customers: AT&T has about 100M customers.

• More…


Decay Functions

• A decay function g(x) >= 0 is a non-increasing function defined for x >= 1.

• f(t) >= 0 is the value of the data item observed at time t.

• The weight at time T of an item observed at time t is g(T-t).

• The decayed value of that item is f(t)g(T-t).


Time-Decaying Sum

$$V_g(T) = \sum_{t < T} f(t)\, g(T-t)$$

• When the values f(t) are 0/1, we refer to the problem as time-decaying count.

• Maintaining the decaying sum exactly can, in general, require a number of bits linear in the elapsed time.

• We consider maintaining it approximately, to within a factor of $1 \pm \epsilon$.
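As a baseline, the decayed sum can be evaluated directly from its definition by keeping every item. The sketch below does exactly that; the function name `decayed_sum` and the dictionary representation of the stream are illustrative assumptions, not something taken from the talk. Its storage grows linearly with the stream, which is the cost that compact summaries are meant to avoid.

```python
def decayed_sum(stream, g, T):
    """Evaluate V_g(T) = sum over t < T of f(t) * g(T - t) directly.

    stream: dict mapping an arrival time t to its value f(t) >= 0
            (hypothetical representation chosen for this sketch).
    g:      decay function, non-increasing, defined for x >= 1.
    """
    return sum(f_t * g(T - t) for t, f_t in stream.items() if t < T)


# Usage with the decay g(x) = 1/x: 2/4 + 1/2 + 5/1 = 6.0
stream = {1: 2.0, 3: 1.0, 4: 5.0}
print(decayed_sum(stream, lambda x: 1.0 / x, T=5))
```

Items observed at or after time T are ignored, matching the condition t < T in the sum.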


Time-Decaying Average

• Time-decaying weighted average of observed values.

• $f(t_i)$ is the value of the item observed at time $t_i$.

$$A_g(T) = \frac{\sum_{i \mid t_i < T} f(t_i)\, g(T-t_i)}{\sum_{i \mid t_i < T} g(T-t_i)}$$

Maintaining a time-decaying average reduces to maintaining two time-decaying sums.
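The reduction to two sums can be sketched as follows: one decayed sum over the observed values and one over a stream of 1s at the same arrival times. The helper names and the list-based stream are assumptions made for illustration, and both sums are computed exactly here, whereas a real summary would maintain each of them approximately.

```python
def decayed_sum(times, values, g, T):
    """Exact decayed sum of `values` observed at `times`, evaluated at time T."""
    return sum(v * g(T - t) for t, v in zip(times, values) if t < T)


def decayed_average(times, values, g, T):
    """A_g(T) as the ratio of two decayed sums (values over item counts)."""
    numerator = decayed_sum(times, values, g, T)
    # Same arrival times, but every item has value 1: a time-decaying count.
    denominator = decayed_sum(times, [1] * len(times), g, T)
    return numerator / denominator


times, values = [1, 3, 4], [2, 1, 5]
print(decayed_average(times, values, lambda x: 1.0 / x, T=5))  # weights favor the recent value 5
print(decayed_average(times, values, lambda x: 1.0, T=5))      # no decay: the plain mean
```

The ratio is undefined when no item has positive weight, so a caller would need to handle an empty or fully decayed stream separately.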


Interesting Families of Decay Functions

• Sliding windows [DGIM02], $\mathrm{SliWin}_W$: g(x)=1 for x<W, g(x)=0 otherwise

• Exponential decay [Jacobson 88], $\mathrm{ExpD}_\lambda$: $g(x) = \exp(-\lambda x)$

• Polynomial decay, $\mathrm{PolyD}_\alpha$: $g(x) = 1/x^{\alpha}$

• General decay functions…
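The three named families can be written as small function factories, which makes it easy to plug them into any routine that takes a decay function. The factory names and the parameter names `W`, `lam`, and `alpha` are assumptions for this sketch; the sliding-window cutoff follows the condition x<W given in the list above.

```python
import math


def sliwin(W):
    """Sliding-window decay: weight 1 for ages x < W, 0 otherwise."""
    return lambda x: 1.0 if x < W else 0.0


def expd(lam):
    """Exponential decay with rate lam > 0."""
    return lambda x: math.exp(-lam * x)


def polyd(alpha):
    """Polynomial decay with exponent alpha > 0."""
    return lambda x: 1.0 / x ** alpha


# Each family is non-increasing in the age x, as a decay function must be.
for g in (sliwin(5), expd(0.5), polyd(2)):
    weights = [g(x) for x in range(1, 10)]
    print(weights == sorted(weights, reverse=True))
```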


Exponential Decay

• Used in networking applications (RED).

• Very simple maintenance:

$$V_{\mathrm{ExpD}_\lambda}(t) = f(t) + \exp(-\lambda)\, V_{\mathrm{ExpD}_\lambda}(t-1)$$

Lemma:

• Exact tracking requires $\Omega(N)$ storage bits.

• Approximate tracking uses $\Theta(\log N)$ bits.
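A minimal sketch of the "very simple maintenance" for exponential decay: a single running value is multiplied by exp(-λ) and the new item is added. The class name `ExpDecaySum` and the parameter `lam` are assumptions for illustration. In this convention the newest item enters with weight 1, and the sketch keeps a full floating-point value rather than the bit-bounded approximation that the lemma refers to.

```python
import math


class ExpDecaySum:
    """Running exponentially decayed sum, kept in a single number."""

    def __init__(self, lam):
        self.decay = math.exp(-lam)  # per-time-step multiplicative decay
        self.value = 0.0

    def update(self, f_t):
        """Advance one time step and fold in the item value f_t."""
        self.value = f_t + self.decay * self.value
        return self.value


# Usage: compare the running value with the sum computed from scratch.
lam = 0.3
values = [1.0, 0.0, 2.0, 4.0]
tracker = ExpDecaySum(lam)
for t, v in enumerate(values):
    running = tracker.update(v)
    direct = sum(values[u] * math.exp(-lam * (t - u)) for u in range(t + 1))
    print(t, math.isclose(running, direct))
```

No per-item state is kept, which is why exponential decay is popular in router implementations such as RED.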


Sliding Window Decay

Lemma [DGIM02]: Sliding window decay can be approximately tracked using $\Theta(\log^2 W)$ bits (for 0/1 or polynomial-size values).

• “Sharp threshold”
• Upper bound via the Exponential Histogram (EH) technique.


Polynomial Decay

Lemma (N is the elapsed time):
• Lower bound: $\Omega(\log N)$
• Upper bound: $O(\log N \log\log N)$

• Often more appropriate for applications than exponential or sliding window decay.

• More efficient than SliWin decay (nearly quadratic gap), and almost as efficient as exponential decay.
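To get a feel for the "nearly quadratic gap", the growth rates can be compared at one concrete elapsed time. This is only illustrative arithmetic under stated assumptions: logarithms are taken base 2, $N = 2^{20}$ is an arbitrary choice, and the constants and the dependence on the approximation parameter hidden by the asymptotic notation are ignored.

$$\log N = 20, \qquad \log N \,\log\log N = 20 \cdot \log_2 20 \approx 86, \qquad \log^2 N = 400.$$

At this scale the polynomial-decay growth rate sits much closer to the $\log N$ of exponential decay than to the $\log^2 N$ of sliding windows.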


General Decay Functions

• Lemma: Can be (approximately) maintained using $O(\log^2 N)$ bits (N is the minimum of the elapsed time and the smallest x for which g(x)=0).

• Algorithm based on an adaptation of the Exponential Histograms technique.

• Sliding windows, with their $\Omega(\log^2 N)$ lower bound [DGIM02], are as “hard” to maintain as general decay.


Why Polynomial Decay?

• Link performance over time:

[Figure: performance (good/bad) of Link A and Link B plotted against time, with time t0 marked on the time axis.]

Which link should we select past time t0?

Initially A or B, eventually B.


Link Selection Example (cont.)

• Polynomial decay (by tuning the parameter): initially A or B, eventually B.

• Exponential decay: the relative value of A and B stays constant, so either A forever or B forever.

• Sliding window decay: first B, then A, then the same…

Polynomial decay can model our expectation (as can other smooth subexponential functions…).
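The three behaviors can be reproduced on a small synthetic trace. Everything in the sketch below is hypothetical: link A has a long outage far in the past, link B has a short outage just before t0, both links are fault-free afterwards, and the link with the smaller decayed failure count is preferred. The trace, the decay parameters, and the function names are assumptions for illustration, not the data behind the talk's figure.

```python
import math

T0 = 200
FAILS_A = range(0, 100)    # hypothetical: link A fails at times 0..99
FAILS_B = range(180, 200)  # hypothetical: link B fails at times 180..199


def decayed_failures(fail_times, g, T):
    """Decayed count of failures observed before time T."""
    return sum(g(T - t) for t in fail_times if t < T)


def pick_link(g, T):
    """Prefer the link with the smaller decayed failure count."""
    a = decayed_failures(FAILS_A, g, T)
    b = decayed_failures(FAILS_B, g, T)
    if a == b:
        return "same"
    return "A" if a < b else "B"


poly = lambda x: 1.0 / x                    # polynomial decay, exponent 1
expd = lambda x: math.exp(-0.05 * x)        # exponential decay, rate 0.05
sliwin = lambda x: 1.0 if x < 250 else 0.0  # sliding window, W = 250

for name, g, checkpoints in [
    ("poly", poly, (200, 10000)),
    ("exp", expd, (200, 300, 1000)),
    ("sliwin", sliwin, (200, 350, 450)),
]:
    print(name, [pick_link(g, T) for T in checkpoints])
```

Under exponential decay both counts shrink by the same factor per time step after t0, so their order can never change. Under polynomial decay the weights of old and recent failures converge as both recede into the past, so the larger number of failures on A eventually dominates.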


Summary of Bounds

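Decay function    Upper bound                Lower bound
Exp decay         $O(\log N)$                $\Omega(\log N)$
Poly decay        $O(\log N \log\log N)$     $\Omega(\log N)$
SliWin decay      $O(\log^2 N)$              $\Omega(\log^2 N)$
General decay     $O(\log^2 N)$              $\Omega(\log^2 N)$

• N is the minimum of the elapsed time and the smallest x for which g(x)=0.

• Approximation is to within a factor of $1 \pm \epsilon$.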


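Bucketing the Stream

[Figure: a 0/1 stream laid out along a time axis (1 0 0 0 1 1 1 0 0 1) and grouped into buckets labeled “time width 3, count 1”, “time width 4, count 2”, and “time width 3, count 2”; merging the two count-2 buckets yields a bucket with time width 7 and count 4.]

• A histogram is determined by its time boundaries and bucket counts.
• Time boundaries can be fixed (counts are then maintained per stream).
• Counts can be fixed (time boundaries are then maintained per stream).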
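A bucket can be modeled as a time interval plus a count, and merging two adjacent buckets adds both their widths and their counts. The `Bucket` type and `merge` helper below are hypothetical names for this sketch, and the concrete start and end times are made up; only the widths and counts mirror the merge on this slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bucket:
    start: int  # first time step covered (inclusive)
    end: int    # last time step covered (inclusive)
    count: int  # sum of the item values observed in [start, end]

    @property
    def width(self):
        return self.end - self.start + 1


def merge(older, newer):
    """Merge two buckets that are adjacent in time into one."""
    if older.end + 1 != newer.start:
        raise ValueError("buckets must be adjacent in time")
    return Bucket(older.start, newer.end, older.count + newer.count)


a = Bucket(start=4, end=7, count=2)   # time width 4, count 2
b = Bucket(start=8, end=10, count=2)  # time width 3, count 2
merged = merge(a, b)
print(merged.width, merged.count)     # 7 4
```

After a merge, the positions of individual items inside the interval are no longer known, which is where the approximation error of a histogram comes from.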


Exponential Histograms [DGIM02]

• Introduced for sliding windows ($\mathrm{SliWin}_W$).
• Each new item is placed in a new bucket.
• Two buckets are merged when their combined count is at most a fraction of the combined count of all earlier buckets.
• Buckets whose start time is more than W in the past are discarded.
• Bucket counts are independent of the stream.
• The sum of the bucket counts is a constant-factor approximation for $\mathrm{SliWin}_W$.


Exponential Histograms (cont)

• Example for a factor-2 approximation (bucket counts after each arrival):

  1
  1, 1
  1, 1, 1
  1, 1, 2 (merge)
  1, 1, 1, 2
  1, 1, 2, 2 (merge)

• Values whose time is “in question” (before or after W) are aggregated in the least recent bucket.
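The bucket-count sequence above can be reproduced with a small exponential-histogram sketch for a 0/1 stream. Two assumptions are worth flagging: the merge rule used here is the common "too many buckets of the same size" formulation of [DGIM02] rather than the fraction-based wording of the previous slide, and the knob `max_per_size=3` is a hypothetical parameter chosen only because it yields the sequence shown. Class and method names are likewise illustrative.

```python
class ExpHistogram:
    """Sketch of an exponential histogram counting 1s in a sliding window."""

    def __init__(self, window, max_per_size=3):
        self.window = window
        self.max_per_size = max_per_size  # hypothetical knob, see lead-in
        self.now = 0
        self.buckets = []  # [timestamp of newest 1, count], most recent first

    def add(self, bit):
        self.now += 1
        # Discard buckets whose newest 1 has left the window.
        while self.buckets and self.now - self.buckets[-1][0] >= self.window:
            self.buckets.pop()
        if not bit:
            return
        self.buckets.insert(0, [self.now, 1])  # each new 1 gets its own bucket
        i = 0
        while i < len(self.buckets):
            size = self.buckets[i][1]
            j = i
            while j < len(self.buckets) and self.buckets[j][1] == size:
                j += 1
            if j - i > self.max_per_size:
                # Too many buckets of this size: merge the two oldest of them.
                newer, older = self.buckets[j - 2], self.buckets[j - 1]
                self.buckets[j - 2:j] = [[newer[0], newer[1] + older[1]]]
                i = j - 2  # the merged bucket may trigger a further merge
            else:
                i = j

    def counts(self):
        return [count for _, count in self.buckets]

    def estimate(self):
        """All buckets count fully except the oldest, which counts half."""
        if not self.buckets:
            return 0.0
        return sum(self.counts()) - self.buckets[-1][1] / 2


eh = ExpHistogram(window=100)
for _ in range(6):
    eh.add(1)
    print(eh.counts())
```

Only the oldest bucket can straddle the window boundary, so the true count always lies between the total minus the oldest bucket's count plus one, and the total.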


EH Properties

• The number of buckets is O(log W). For each bucket we need to record the exact start time, so we need O(log W) storage per bucket (O(log^2 W) in total).

• An EH for sliding window W can be used to approximate sliding window j for all j<W.

Lemma: An EH can be used to approximate general decay functions (with W = the minimum of the elapsed time and the smallest x for which g(x)=0).


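Reducing any Decay Function to Sliding Windows

• Decay function g(x):

$$
\begin{aligned}
V_g(T) &= \sum_{T-N \le t < T} f(t)\, g(T-t) \\
&= g(N) \sum_{T-N \le t < T} f(t) \;+\; \sum_{i=1}^{N-1} \bigl(g(N-i) - g(N-i+1)\bigr) \sum_{T-N+i \le t < T} f(t) \\
&= g(N)\, \mathrm{SliWin}_N(T) \;+\; \sum_{i=1}^{N-1} \bigl(g(N-i) - g(N-i+1)\bigr)\, \mathrm{SliWin}_{N-i}(T)
\end{aligned}
$$

Here $\mathrm{SliWin}_W(T)$ denotes the sum of f(t) over the W most recent time steps, $T-W \le t < T$.

From (approximate) $\mathrm{SliWin}_W$ for all W<=N we can compute the (approximate) decayed sum according to g().

With an EH with W=N we can compute (approximately) the decayed sums according to all decay functions g() up to elapsed time N (or forever if g(N)=0).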
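The reduction can be checked numerically by computing the decayed sum once from its definition and once from sliding-window sums weighted by the differences g(N-i) - g(N-i+1). In this sketch the window sums are exact and taken over the W most recent time steps; in the actual scheme they would be estimates read off a single EH. Function names and the list-based stream are assumptions. Because g is non-increasing, every coefficient is non-negative, so a relative error bound on the window sums carries over to the combination.

```python
import math
import random


def direct_decayed_sum(f, g, T, N):
    """Decayed sum over the N most recent items, straight from the definition."""
    return sum(f[t] * g(T - t) for t in range(T - N, T))


def window_sum(f, T, W):
    """Exact sliding-window sum over the W most recent time steps."""
    return sum(f[t] for t in range(T - W, T))


def decayed_sum_from_windows(f, g, T, N):
    """The same decayed sum, assembled from sliding-window sums only."""
    total = g(N) * window_sum(f, T, N)
    for i in range(1, N):
        total += (g(N - i) - g(N - i + 1)) * window_sum(f, T, N - i)
    return total


random.seed(0)
f = [random.randint(0, 5) for _ in range(40)]  # hypothetical stream values
g = lambda x: 1.0 / x
print(math.isclose(direct_decayed_sum(f, g, 40, 40),
                   decayed_sum_from_windows(f, g, 40, 40)))
```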


Weight-Based Merging

• Bucket start times depend only on the elapsed time.
• WBM histograms apply to decay functions for which g(x)/g(x+1) is non-increasing.
• The number of buckets is O(log(g(1)/g(N))).
• O(log log N) storage per bucket (for approximate bucket counts).
• More efficient than EH on decay that is slightly super-polynomial or slower.
• O(log N log log N) storage for polynomial decay.


WBM Histograms – How?

• Region boundaries b1, b2, b3, …:

$$b_1 = \arg\max_x \{(1+\epsilon)\, g(x-1) \ge g(1)\}, \qquad b_i = \arg\max_x \{(1+\epsilon)\, g(x-1) \ge g(b_{i-1})\}$$

• The current most-recent bucket is sealed and a new bucket is started at each time T such that T mod b1 = 0.

• Two consecutive buckets that fall in the same region (according to their elapsed start and end times) are merged.

• At most 2 buckets per region.


WBMH Example

• g(x)=1/x, (1+ε)=2
• Regions (by weight): {1, 1/2}, {1/3, 1/4, 1/5, 1/6}, {1/7, 1/8, …, 1/14}

[Figure: the histogram's buckets at times T=1 through T=6.]
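The regions in this example can be recomputed with a short sketch. It assumes one reading of the region rule: a new region starts at the first age whose weight has dropped by more than a factor of (1+ε) below the weight at the start of the current region. The function name and the use of exact fractions are choices made for the sketch, not details from the talk.

```python
from fractions import Fraction


def region_boundaries(g, eps, max_age):
    """Ages in 2..max_age at which a new weight region begins."""
    boundaries = []
    region_start_weight = g(1)
    for age in range(2, max_age + 1):
        # Weight fell by more than a (1 + eps) factor within this region.
        if (1 + eps) * g(age) < region_start_weight:
            boundaries.append(age)
            region_start_weight = g(age)
    return boundaries


# g(x) = 1/x with (1 + eps) = 2, using exact fractions to avoid rounding ties.
print(region_boundaries(lambda x: Fraction(1, x), eps=1, max_age=20))  # [3, 7, 15]
```

The boundaries 3, 7, 15 correspond to the regions of ages 1 to 2, 3 to 6, and 7 to 14. For g(x)=1/x the region widths double each time, so the number of regions grows logarithmically with the largest age, in line with the O(log(g(1)/g(N))) bucket bound.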


Conclusion

Summary:

• Efficient computation of time-decayed sums/averages for general decay functions.

• Very efficient computation for polynomial decay.

• Open question: is O(log N) storage achievable for polynomial decay?

• Subsequent related work: spatial decay (sensor nets/P2P nets).