BRAID: Discovering Lag Correlations in Multiple Streams
Yasushi Sakurai (NTT Cyber Space Labs)
Spiros Papadimitriou (Carnegie Mellon Univ.)
Christos Faloutsos (Carnegie Mellon Univ.)
SIGMOD 2005, Y. Sakurai et al.
Motivation
Data-stream applications
- Network analysis
- Sensor monitoring
- Financial data analysis
- Moving object tracking
Goal
- Monitor multiple numerical streams
- Determine which pairs are correlated with lags
- Report the value of each such lag (if any)
Lag Correlations
Examples
- A decrease in interest rates typically precedes an increase in house sales by a few months
- Higher amounts of fluoride in the drinking water lead to fewer dental cavities, some years later
- High CPU utilization on server 1 precedes high CPU utilization on server 2 by a few minutes
Lag Correlations
Example of lag-correlated sequences
- These sequences are correlated with lag l = 1300 time-ticks
[Figure: the sequences and their CCF (Cross-Correlation Function)]
Lag Correlations
[Figure: example of lag-correlated sequences and their CCF (Cross-Correlation Function)]
- Fast (high performance)
- Nimble (low memory consumption)
- Accurate (good approximation)
Problem #1: PAIR of sequences
Given two co-evolving sequences X and Y, determine
- Whether there is a lag correlation
- If yes, what the lag length l is
Any time, on semi-infinite streams
[Figure: sequences X and Y; answer: yes, l = 1,300]
Problem #2: k-way
Given k numerical sequences X1, …, Xk, report
- Which pairs (if any) have a lag correlation
- The corresponding lag for such pairs
Again, ‘any time’, in a streaming fashion
[Figure: sequences X1, X2, ..., Xk; answer: X1 and X2, l = 1,300, ...]
Our solution, BRAID
Characteristics:
- ‘Any-time’ processing, and fast: computation time per time tick is constant
- Nimble: memory space requirement is sub-linear in the sequence length
- Accurate: approximation introduces small error
Related Work
Sequence indexing
- Agrawal et al. (FODO 1993)
- Faloutsos et al. (SIGMOD 1994)
- Keogh et al. (SIGMOD 2001)
Compression (wavelet and random projections)
- Gilbert et al. (VLDB 2001)
- Guha et al. (VLDB 2004)
- Dobra et al. (SIGMOD 2002)
- Ganguly et al. (SIGMOD 2003)
Related Work
Data Stream Management
- Abadi et al. (VLDB Journal 2003)
- Motwani et al. (CIDR 2003)
- Chandrasekaran et al. (CIDR 2003)
- Cranor et al. (SIGMOD 2003)
Related Work
Pattern discovery
- Clustering for data streams: Guha et al. (TKDE 2003)
- Monitoring multiple streams: Zhu et al. (VLDB 2002)
- Forecasting: Yi et al. (ICDE 2000), Papadimitriou et al. (VLDB 2003)
None of the previously published methods focuses on this problem
Overview
- Introduction / Related work
- Background
- Main ideas
- Theoretical analysis
- Experimental results
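Background
CCF (Cross-Correlation Function)
[Figure: CCF plot of correlation against lag, with regions labelled positively correlated (above the positive threshold), un-correlated, and anti-correlated (lower than the negative threshold), and the lag correlation marked]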
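Background
Definition of ‘score’: the absolute value of R(l)

$$\mathrm{score}(l) = |R(l)|$$

$$R(l) = \frac{\sum_{t=l+1}^{n} (x_t - \bar{x})(y_{t-l} - \bar{y})}{\sqrt{\sum_{t=l+1}^{n} (x_t - \bar{x})^2}\,\sqrt{\sum_{t=1}^{n-l} (y_t - \bar{y})^2}}$$

Lag correlation: given a threshold, the lag l at which score(l)
- exceeds the threshold,
- is a local maximum, and
- is the earliest such maximum, if more maxima exist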
Why not ‘naive’?
Naive solution:
- Compute the correlation coefficient for each lag l = 0, 1, 2, 3, …, n/2
But:
- O(n) space
- O(n^2) time, or O(n log n) time with FFT
[Figure: sequence over time up to t=n; CCF plot of correlation against lag, from 0 to n/2]
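As a rough illustration of the naive solution, the sketch below recomputes the correlation coefficient from scratch for every lag l = 0, …, n/2, which gives the quadratic cost noted on the slide. The function names `pearson` and `naive_ccf` are hypothetical, and the sketch assumes the means are taken over the two aligned windows; the demo data (a random Y and a copy of it delayed by 5 ticks) is made up for illustration only.

```python
import math
import random


def pearson(xs, ys):
    """Correlation coefficient of two equally long lists."""
    m = len(xs)
    mx = sum(xs) / m
    my = sum(ys) / m
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    denom = math.sqrt(vx * vy)
    return cov / denom if denom > 0 else 0.0


def naive_ccf(x, y):
    """R(l) for l = 0..n/2, pairing x_t with y_{t-l}; O(n) work per lag."""
    n = len(x)
    return [pearson(x[lag:], y[:n - lag]) for lag in range(n // 2 + 1)]


# Made-up demo data: X is Y delayed by 5 time-ticks.
random.seed(0)
y = [random.gauss(0.0, 1.0) for _ in range(200)]
x = [0.0] * 5 + y[:-5]

scores = [abs(r) for r in naive_ccf(x, y)]  # score(l) = |R(l)|
best_lag = max(range(len(scores)), key=scores.__getitem__)
print(best_lag)
```

Both sequences have to be kept in full, which is where the O(n) space comes from.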
Main Idea (1)
Incremental computing:
- The correlation coefficient of two sequences is ‘algebraic’, so it can be computed incrementally
- We need to maintain only 6 ‘sufficient statistics’:
  - Sequence length n
  - Sum of X, square sum of X
  - Sum of Y, square sum of Y
  - Inner-product for X and the shifted Y
Main Idea (1)
Incremental computing:
- Sequence length n
- Sum of X:

$$Sx(1, n) = \sum_{t=1}^{n} x_t$$

- Square sum of X:

$$Sxx(1, n) = \sum_{t=1}^{n} x_t^2$$

- Inner-product for X and the shifted Y:

$$Sxy(l) = \sum_{t=l+1}^{n} x_t \, y_{t-l}$$

Compute R(l) incrementally:

$$R(l) = \frac{C(l)}{\sqrt{Vx(l+1, n) \cdot Vy(1, n-l)}}$$

Covariance of X and Y:

$$C(l) = Sxy(l) - \frac{Sx(l+1, n) \cdot Sy(1, n-l)}{n - l}$$

Variance of X:

$$Vx(l+1, n) = Sxx(l+1, n) - \frac{(Sx(l+1, n))^2}{n - l}$$
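The incremental formulas can be sketched for a single fixed lag as follows. This is a minimal illustration, not the authors' implementation: the class name `LagStats` and its fields are hypothetical, and the buffer of the last l values of Y is an assumption about how y_{t-l} is made available at tick t (it is also why this variant still needs space proportional to the lag).

```python
import math
import random
from collections import deque


class LagStats:
    """Sufficient statistics for R(l) at one fixed lag (hypothetical helper)."""

    def __init__(self, lag):
        self.lag = lag
        self.pending_y = deque()  # Y values still waiting for their partner in X
        self.m = 0                # number of aligned pairs, i.e. n - l
        self.sx = self.sxx = 0.0  # Sx(l+1, n) and Sxx(l+1, n)
        self.sy = self.syy = 0.0  # Sy(1, n-l) and Syy(1, n-l)
        self.sxy = 0.0            # Sxy(l)

    def update(self, x_t, y_t):
        """Consume one time tick in O(1) time."""
        self.pending_y.append(y_t)
        if len(self.pending_y) <= self.lag:
            return  # y_{t-l} does not exist yet
        y_old = self.pending_y.popleft()  # this is y_{t-l}
        self.m += 1
        self.sx += x_t
        self.sxx += x_t * x_t
        self.sy += y_old
        self.syy += y_old * y_old
        self.sxy += x_t * y_old

    def r(self):
        """R(l) = C(l) / sqrt(Vx * Vy), computed from the statistics only."""
        if self.m < 2:
            return 0.0
        c = self.sxy - self.sx * self.sy / self.m
        vx = self.sxx - self.sx ** 2 / self.m
        vy = self.syy - self.sy ** 2 / self.m
        if vx <= 0.0 or vy <= 0.0:
            return 0.0
        return c / math.sqrt(vx * vy)


# Made-up demo data: X is Y delayed by 3 time-ticks.
random.seed(1)
y = [random.gauss(0.0, 1.0) for _ in range(100)]
x = [0.0] * 3 + y[:-3]

stats = LagStats(3)
for x_t, y_t in zip(x, y):
    stats.update(x_t, y_t)
r3 = stats.r()
print(round(r3, 6))
```

Maintaining one such object per lag l = 0, …, n/2 gives the O(n) time per tick of the "naive (incremental)" variant.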
Main Idea (1)
Complexity

| | Naive | Naive (incremental) | BRAID |
|---|---|---|---|
| Space | O(n) | O(n) | |
| Comp. time | O(n log n) | O(n) | |

Better, but not good enough!
Main Idea (2)
Geometric lag probing
- i.e., compute the correlation coefficient for lag l = 0, 1, 2, 4, ..., 2^h
- O(log n) estimations
[Figure: CCF plot of correlation against lag, probed at lags 0, 1, 2, 4, 8]
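A small sketch of the probed lag values, assuming the longest lag of interest is n/2 as in the naive solution. The function name `geometric_lags` is hypothetical.

```python
def geometric_lags(n):
    """Lags 0, 1, 2, 4, ..., 2^h that do not exceed n/2."""
    lags = [0]
    lag = 1
    while lag <= n // 2:
        lags.append(lag)
        lag *= 2
    return lags


print(geometric_lags(32))              # [0, 1, 2, 4, 8, 16]
print(len(geometric_lags(1_000_000)))  # 20
```

The number of probes grows with log n rather than n: a million-tick sequence needs only 20 of them.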
Main Idea (2)
Geometric lag probing
- But, so far, we still need O(n) space because the longest lag is n/2

| | Naive | Naive (incremental) | BRAID |
|---|---|---|---|
| Space | O(n) | O(n) | |
| Comp. time | O(n log n) | O(n) | O(log n) |
Main Idea (3)
Sequence smoothing
- Means of windows for each level
- Sufficient statistics computed from the means
- CCF computed from the sufficient statistics
- But, it allows a partial redundancy
[Figure: reminder of the naive scheme over time up to t=n, then window means per level starting at level h=0; CCF plot of correlation against lag]
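To make "means of windows for each level" concrete, here is a batch sketch in which level h holds the means of consecutive, non-overlapping windows of length 2^h. The function name `smooth_levels`, the non-overlapping alignment, and the dropping of a trailing unpaired value are assumptions for illustration; a streaming version would update each level as ticks arrive.

```python
def smooth_levels(seq, max_level):
    """levels[h][i] is the mean of the i-th window of length 2**h."""
    levels = [[float(v) for v in seq]]
    for _ in range(max_level):
        prev = levels[-1]
        # Average adjacent pairs; a trailing unpaired value is dropped.
        levels.append([(prev[i] + prev[i + 1]) / 2.0
                       for i in range(0, len(prev) - 1, 2)])
    return levels


levels = smooth_levels([1, 3, 5, 7, 9, 11, 13, 15], 3)
for h, means in enumerate(levels):
    print(h, means)
```

Each level is half as long as the one below it, so a lag of 2^h at level 0 corresponds to a lag of a single step at level h.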
Putting it all together:
Geometric lag probing + smoothing
- Use colored windows
- Keep track of only a geometric progression of the lag values: l = {0, 1, 2, 4, 8, …, 2^h, …}
- Use a cubic spline to interpolate
[Figure sequence: X and Y windows per level and the CCF plot of correlation against lag; l=0 and l=1 at level h=0 (t=n), l=2 at level h=1 (t_h=n/2), l=4 at level h=2 (t_h=n/4), l=8 at level h=3 (t_h=n/8)]
Thus:
Complexity

| | Naive | Naive (incremental) | BRAID |
|---|---|---|---|
| Space | O(n) | O(n) | O(log n) |
| Comp. time | O(n log n) | O(n) | O(1) (*) |

(*) Computation time: O(log n); amortized time: O(1)
Overview
- Introduction / Related work
- Background
- Main ideas
  - enhancing the accuracy
- Theoretical analysis
- Experimental results
Enhanced Probing Scheme
Q: How to probe more densely than 2^h?
A: Probe in a mixture of geometric and arithmetic progressions
[Figure: levels from h=0 over time up to t=n; CCF plot of correlation against lag]
Enhanced Probing Scheme
- Basic scheme: b=1 (one number for each level)
- Enhanced scheme: b>1
Example of b=4
- Probing the CCF in a mixture of geometric and arithmetic progressions: l = {0, 1, …, 7; 8, 10, 12, 14; 16, 20, 24, 28; 32, 40, …}
[Figure: levels from h=0 over time up to t=n; CCF plot of correlation against lag, probed with step 1, then step 2, then step 4]
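The lag sequence of the b=4 example can be generated with a short sketch. The function name `enhanced_lags` and the `max_lag` cut-off are hypothetical; the rule used here (2b lags with step 1, then b lags per level with the step doubling each time) is inferred from the example above and reduces to 0, 1, 2, 4, 8, … for b=1.

```python
def enhanced_lags(b, max_lag):
    """Probe lags mixing geometric (levels) and arithmetic (within level) steps."""
    lags = list(range(0, min(2 * b, max_lag + 1)))  # first group: step 1
    step = 2
    while b * step <= max_lag:
        start = b * step
        # b lags per level, spaced `step` apart
        lags.extend(l for l in range(start, 2 * start, step) if l <= max_lag)
        step *= 2
    return lags


print(enhanced_lags(4, 40))
print(enhanced_lags(1, 16))
```

A larger b trades a constant factor in space and time for denser probing of the CCF.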
Theoretical Analysis - Accuracy
Effect of smoothing
- For sequences with low frequencies, smoothing introduces only small error
Effect of geometric lag probing
- BRAID introduces no error if the lag probing satisfies the (Nyquist) sampling theorem
Theoretical Analysis - Accuracy
Effect of geometric lag probing
- Informally: BRAID introduces no error if the lag probing satisfies the (Nyquist) sampling theorem
- Formally: Theorem 2. BRAID will find the lag correlations perfectly if
[Equation: condition relating l_0, b and f_R]
  - f_R: the Nyquist frequency of the CCF, f_R = min(f_x, f_y)
  - f_x, f_y: the Nyquist frequencies of X and Y
Theoretical Analysis - Complexity
Naive solution
- O(n) space
- O(n) time per time tick
BRAID
- O(log n) space
- O(1) time for updating sufficient statistics
- O(log n) time for interpolating (when output is required)
Experimental results
Setup
- Intel Xeon 2.8GHz, 1GB memory, Linux
- Datasets:
  - Synthetic: Sines, SpikeTrains
  - Real: Humidity, Light, Temperature, Kursk, Sunspots
- Enhanced BRAID, b=16
Experimental results
Evaluation
- Accuracy for CCF
- Accuracy for the lag estimation
- Computation time
- k-way lag correlations
Accuracy for CCF
(1) Sines: BRAID perfectly estimates the correlation coefficients of the sinusoidal wave
(2) SpikeTrains: BRAID closely estimates the correlation coefficients
(3) Humidity (real data): BRAID closely estimates the correlation coefficients
(4) Light (real data): BRAID closely estimates the correlation coefficients
(5) Kursk (real data): BRAID closely estimates the correlation coefficients
(6) Sunspots (real data): BRAID closely estimates the correlation coefficients
[Figure for each dataset: CCF (Cross-Correlation Function)]
Estimation Error of Lag Correlations
Largest relative error is about 1%

| Dataset | Lag correlation (Naive) | Lag correlation (BRAID) | Estimation error (%) |
|---|---|---|---|
| Sines | 716 | 716 | 0.000 |
| SpikeTrains | 2841 | 2830 | 0.387 |
| Humidity | 3842 | 3855 | 0.338 |
| Light | 567 | 570 | 0.529 |
| Kursk | 1463 | 1472 | 0.615 |
| Sunspots | 1156 | 1168 | 1.038 |
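The error column is consistent with the relative difference between the two lag estimates, |BRAID - Naive| / Naive, expressed as a percentage. The sketch below recomputes it from the lag values in the table; treating this as the error definition is an assumption that happens to match all six rows.

```python
# (naive lag, BRAID lag) per dataset, copied from the table above
lags = {
    "Sines": (716, 716),
    "SpikeTrains": (2841, 2830),
    "Humidity": (3842, 3855),
    "Light": (567, 570),
    "Kursk": (1463, 1472),
    "Sunspots": (1156, 1168),
}

# Relative error in percent, using the naive lag as the reference
errors = {name: abs(braid - naive) / naive * 100.0
          for name, (naive, braid) in lags.items()}

for name, err in errors.items():
    print(f"{name}: {err:.3f}")
```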
Computation time
- BRAID reduces computation time dramatically
- Up to 40,000 times faster
Group Lag Correlations
- 55 Temperature sequences
- Two correlated pairs
[Figures: sequences #16, #19, #47 and #48; estimation of CCF of #16 and #19; estimation of CCF of #47 and #48]
Conclusions
Automatic lag correlation detection on data streams
1. ‘Any-time’
2. Nimble: O(log n) space, O(1) time to update the statistics
3. Fast: up to 40,000 times faster than the naive implementation
4. Accurate: about 1% relative error or less
Effect of Probing
- Dataset: Sines
- Lag correlation l_R = 1024, with b = 1
Effect of Probing
- Dataset: Light
- Lag correlation l_R = 630, with b = 1