part 1 similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and...

99
Similarity search, pattern discovery and summarization Yasushi Sakurai (Kumamoto University) Yasuko Matsubara (Kumamoto University) Christos Faloutsos (Carnegie Mellon University) http://www.cs.kumamoto-u.ac.jp/ ~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 1 Part 1

Upload: others

Post on 20-Jul-2020

20 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Similarity search, pattern discovery and

summarizationYasushi Sakurai (Kumamoto University) Yasuko Matsubara (Kumamoto University) Christos Faloutsos (Carnegie Mellon University)

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ ©2015Sakurai,Matsubara&Faloutsos 1

Part1

Page 2: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Outline

•  Motivation •  Similarity Search and Indexing •  Feature extraction •  Linear forecasting •  Streaming pattern discovery •  Automatic mining

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 2

Page 3: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Stream mining • Applications

– Sensor monitoring – Network analysis – Financial and/or business transaction data – Web access and media service logs – Moving object tracking – Industrial manufacturing

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 3

Page 4: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Stream mining • Requirements

– Fast high performance and quick response

– Nimble low memory consumption, single scan

– Accurate good approximation for pattern discovery and feature extraction

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 4

Time

Page 5: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Monitoring data streams •  Correlation coefficient

•  Correlation coefficient and the (Euclidean)

distance

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 5

∑ =−=

n

t t xxx1

2)()(σ)()()()(

1

yxyyxxn

t tt

σσρ

−⋅−=∑ =

∑ =−−=

n

t tt yx1

2)ˆˆ(211ρ )(/)(ˆ xxxx tt σ−=

020406080100

1 3 5 7 9 11

Page 6: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Monitoring data streams •  Correlation monitoring [Zhu+, vldb02]

– DFT coefficients for each basic window – Correlation coefficient of each sliding window

computed from the `sketch’ (DFT coefs)

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 6

DFTcoefsDFTcoefs DFTcoefs

DFTcoefsDFTcoefs DFTcoefs

DFTcoefs

DFTcoefs

Sliding window

SequenceX

SequenceY

DennisShasha

Page 7: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Monitoring data streams •  Grid structure (to avoid checking all pairs)

– DFT coefficients yields a vector – High correlation -> closeness in the vector space

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 7

2

Vector VY of sequence Y

ε Correlation coefficients and the Euclidean distance

∑ =−−=

n

t tt yx1

2)ˆˆ(211ρ

Vector VX of sequence X

Page 8: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Monitoring data streams •  Lag correlation [Sakurai+, sigmod05]

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 8

CCF (Cross-Correlation Function)

Page 9: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Monitoring data streams •  Lag correlation [Sakurai+, sigmod05]

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 9

correlated with lag l=1300

CCF (Cross-Correlation Function)

l=1300

Page 10: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Lag correlation •  Definition of ‘score’, absolute value of R(l)

•  Lag correlation – Given a threshold γ , – A local maximum – The earliest such maximum, if more maxima exist

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 10

)()( lRlscore =∑∑

∑−

=+=

+= −

−−

−−=

ln

t tn

lt t

n

lt ltt

yyxx

yyxxlR

12

12

1

)()(

))(()(

γ>)(lscore

Page 11: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Lag correlation •  Why not naïve?

– Compute correlation coefficient for each lag l = {0, 1, 2, 3, …, n/2}

•  But – O(n) space – O(n2) time – or O(n log n) time w/ FFT

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 11

Lag

Cor

rela

tion

n/20t=n Time

Page 12: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Lag correlation •  BRAID

– Geometric lag probing + smoothing – Use colored windows – Keep track of only a geometric progression of the lag

values: l = {0, 1, 2, 4, 8, …, 2h, …}

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 12

Lag

Cor

rela

tion

Level

h=0 t=n Time

Multi-scale windows

22

23

2223 2120

Page 13: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Lag correlation •  BRAID

– Geometric lag probing + smoothing – Keep track of only a geometric progression of the lag

values: l = {0, 1, 2, 4, 8, …, 2h, …}

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 13

Lag

Cor

rela

tion

Level

h=0 t=n Time

h=0 Y

X

l=0

Page 14: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Lag correlation •  BRAID

– Geometric lag probing + smoothing – Keep track of only a geometric progression of the lag

values: l = {0, 1, 2, 4, 8, …, 2h, …}

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 14

Lag

Cor

rela

tion

Level

h=0 t=n Time

h=0 Y

X

l=1

Page 15: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Lag correlation •  BRAID

– Geometric lag probing + smoothing – Keep track of only a geometric progression of the lag

values: l = {0, 1, 2, 4, 8, …, 2h, …}

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 15

Lag

Cor

rela

tion

Level

h=1 th=n/2 Time

h=1 Y

X

l=2

Page 16: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Lag correlation •  BRAID

– Geometric lag probing + smoothing – Keep track of only a geometric progression of the lag

values: l = {0, 1, 2, 4, 8, …, 2h, …}

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 16

Lag

Cor

rela

tion

Level

h=2 Time

h=2 Y

X th=n/4

l=4

Page 17: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Lag correlation •  BRAID

– Geometric lag probing + smoothing – Keep track of only a geometric progression of the lag

values: l = {0, 1, 2, 4, 8, …, 2h, …}

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 17

Lag

Cor

rela

tion

Level

h=3 Time

h=3 Y

X th=n/8

l=8

Page 18: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Lag correlation •  BRAID

– Geometric lag probing + smoothing – Keep track of only a geometric progression of the lag

values: l = {0, 1, 2, 4, 8, …, 2h, …} – Use a cubic spline to interpolate

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 18

Lag

Cor

rela

tion

Level

h=0 t=n Time

Page 19: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Lag correlation •  Why not naïve?

– Compute correlation coefficient for each lag l = {0, 1, 2, 3, …, n/2}

•  But – O(n) space – O(n2) time – or O(n log n) time w/ FFT

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 19

Lag

Cor

rela

tion

n/20t=n Time

BRAID •  O(log n) space •  O(1) time

Multi-scale

windows

Page 20: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

BRAID in the real world •  Bridge structural health monitoring

– Structural monitoring using vibration/shock sensors – Keep track of lag correlations for sensor data

streams

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 20

Page 21: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

BRAID in the real world •  Bridge structural health monitoring

– Goal: real-time anomaly detection for disaster prevention

– Several thousands readings (per sec) from several hundreds sensor nodes

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 21

Structural health monitoring Vibration/shock sensor

•  Uses BRAID •  Metropolitan Expressway

(Tokyo, Japan)

Page 22: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

BRAID in the real world •  Bridge structural health monitoring with BRAID

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 22

Metropolitan Expressway (Tokyo, Japan)

Tokyo Gate Bridge (Tokyo, Japan)

Can Tho Bridge (Vietnam)

Page 23: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Feature extraction from streams

•  Find hidden variables from streams [Papadimitriou+, vldb2005]

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 23

water distribution network

normal operation

May have hundreds of measurements, but it is unlikely they are completely unrelated!

major leak

Page 24: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Feature extraction from streams

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 24

Phase 1 Phase 2 Phase 3

: : : : : :

: : : : : :

water distribution network

normal operation major leak

May have hundreds of measurements, but it is unlikely they are completely unrelated!

chlo

rine

con

cent

ratio

ns

sensors near leak

sensors away from leak

hidden variables

Page 25: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Motivation

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 25

actual measurements (n streams)

k hidden variable(s)

We would like to discover a few “hidden (latent) variables” that summarize the key trends

Phase 1

: : : : : :

: : : : : :

chlo

rine

con

cent

ratio

ns

Phase 1

k=1

Page 26: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Motivation

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 26

We would like to discover a few “hidden (latent) variables” that summarize the key trends

chlo

rine

con

cent

ratio

ns

Phase 1 Phase 1 Phase 2 Phase 2

actual measurements (n streams)

k hidden variable(s)

k=2

: : : : : :

: : : : : :

Page 27: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Motivation

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 27

We would like to discover a few “hidden (latent) variables” that summarize the key trends

chlo

rine

con

cent

ratio

ns

Phase 1 Phase 1 Phase 2 Phase 2 Phase 3 Phase 3

actual measurements (n streams)

k hidden variable(s)

k=1

: : : : : :

: : : : : :

Page 28: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

How to capture correlations?

First sensor

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ ©2015Sakurai,Matsubara&Faloutsos 28

20oC

30oC

Tem

pera

ture

T1

time

Page 29: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

How to capture correlations?

First sensor Second sensor

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ ©2015Sakurai,Matsubara&Faloutsos 29

20oC

30oC

Tem

pera

ture

T2

time

Page 30: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

20oC 30oC

How to capture correlations?

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ ©2015Sakurai,Matsubara&Faloutsos 30

20oC

30oC

Temperature T1

Correlations: Let’s take a closer look at the first three value-pairs…

Tem

pera

ture

T2

Page 31: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

20oC 30oC

How to capture correlations?

First three lie (almost) on a line in the space of value-pairs…

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ ©2015Sakurai,Matsubara&Faloutsos 31

20oC

30oC

Tem

pera

ture

T2

Temperature T1

•  O(n) numbers for the slope, and

•  One number for each value-pair (offset on line)

time=1

time=2

time=3

Page 32: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

How to capture correlations?

Other pairs also follow the same pattern: they lie (approximately) on this line

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ ©2015Sakurai,Matsubara&Faloutsos 32

20oC 30oC

20oC

30oC

Tem

pera

ture

T2

Temperature T1

Page 33: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Incremental update

For each new point

•  Project onto current line

•  Estimate error

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ ©2015Sakurai,Matsubara&Faloutsos 33

error

20oC 30oC

20oC

30oC

Tem

pera

ture

T2

Temperature T1 New value

Page 34: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Incremental update

For each new point •  Project onto

current line •  Estimate error •  Rotate line in the

direction of the error and in proportion to its magnitude

è O(n) time http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ ©2015Sakurai,Matsubara&Faloutsos 34

error

20oC

30oC

20oC 30oC

Tem

pera

ture

T2

Temperature T1 New value

Page 35: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Incremental update

For each new point •  Project onto

current line •  Estimate error •  Rotate line in the

direction of the error and in proportion to its magnitude

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ ©2015Sakurai,Matsubara&Faloutsos 35

20oC

30oC

20oC 30oC

Tem

pera

ture

T2

Temperature T1

Page 36: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Related work

•  Wavelet over streams [Gilbert+,vldb01] [Guha+,vldb04]

•  Fourier representations [Gilbert+, stoc02] •  KNN [Koudas+, 04] [Korn+, vldb02] •  Histograms [Guha+, stoc01] •  Clustering [Guha+, focs00] [Aggarwal+, vldb03] •  Sketches [Indyk+, vldb00] [Cormode+, J. Algorithms

05] •  … •  …

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 36

Page 37: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Related work

•  Heavy hitters [Cormode+, vldb03] •  Data embedding [Indyk+, focs00] •  Burst detection [Zhu+, kdd03] •  Segmentation [Keogh+, icdm01] •  Multiple scale analysis [Papadimitriou+,

sigmod06] •  Fractal [Korn+, sigmod06] •  Time warping [Sakurai+, icde07]… •  …

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 37

GrahamCormode

Tutorial@PODS’15

DiveshSrivastava

Tutorial@SIGMOD’15

Page 38: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Outline

•  Motivation •  Similarity Search and Indexing •  Feature extraction •  Streaming pattern discovery •  Linear forecasting •  Automatic mining

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 38

Page 39: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Time

Motivation Given: co-evolving time-series

– e.g., MoCap (leg/arm sensors)

“Chicken dance”

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 39

Page 40: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Time

Motivation Given: co-evolving time-series

– e.g., MoCap (leg/arm sensors)

“Chicken dance”

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 40

Q. Any distinct patterns?

Q. If yes, how many? Q. What kind?

Page 41: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Motivation Challenges: co-evolving sequences

– Unknown # of patterns (e.g., beaks) – Different durations

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 41

Time

Page 42: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Motivation Goal: find patterns that agree with human intuition

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 42

Input

Page 43: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Motivation Goal: find patterns that agree with human intuition

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 43

Beaks

Wings

Tailfeathers

ClapsBeaks

Wings

Tailfeathers

Claps

Input

Output

Page 44: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Motivation Goal: find patterns that agree with human intuition

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 44

Beaks

Wings

Tailfeathers

ClapsBeaks

Wings

Tailfeathers

Claps

Input

Output

NOmagicnumbers!

Automatic!

Page 45: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Why: Automatic mining No magic numbers! ... because,

Big data mining: -> we cannot afford human intervention!!

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ ©2015Sakurai,Matsubara&Faloutsos 45

Manual (use magic) -  sensitive to the parameter tuning -  long tuning steps (hours, days, …)

Automatic (no magic numbers) - no expert tuning required

Page 46: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

How: Automatic miningGoal: fully-automatic modeling •  Given: data X •  Find: a compact description (model M) of X

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 46

Data(X) Idealmodel(M)

Q. How can we find the best model M?

Page 47: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

How: Automatic miningGoal: fully-automatic modeling •  Given: data X •  Find: a compact description (model M) of X

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 47

Data(X) Idealmodel(M)

Q. How can we find the best model M?

Answer: MDL!

Page 48: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CSSolution: MDL (Minimum description length)

Solution: Minimize total encoding cost $ !– Occam’s razor (i.e., law of parsimony) – Fully automatic parameter optimization – No over-fitting

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 48

[Bishop: PR&ML]

M=0 M=1 M=3 M=9

Idealmodel

Page 49: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CSSolution: MDL (Minimum description length)

Solution: Minimize total encoding cost $ !

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 49

[Bishop: PR&ML]

CM=0 CM=1 CM=3 CM=9

(Ideal!)$$$ $$ $$$$$

CostM(M)+Costc(X|M)

Modelcost Codingcost(error)

min( )CostT(X;M)=

Totalcost

CC=$$$$$ CC=$$$ CC=$ CC=0

Page 50: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

AutoPlait: Automatic Mining of Co-evolving

Time SequencesYasuko Matsubara (Kumamoto University) Yasushi Sakurai (Kumamoto University), Christos Faloutsos (CMU)

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ ©2015Sakurai,Matsubara&Faloutsos 50

[Matsubara+ SIGMOD’14]

Page 51: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Problem definition Goal: find patterns that agree with human intuition

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 51

Beaks

Wings

Tailfeathers

ClapsBeaks

Wings

Tailfeathers

Claps

Input

Output

Page 52: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Problem definition•  Bundle : set of d co-evolving sequences

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 52

X = {x1,..., xn}d ×n

Bundle X (d=4)

given

Page 53: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Problem definition•  Segment: convert X -> m segments, S

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 53

Segment (m=8)

s1 s2 s3 s4 s5 s6 s7 s8

S = {s1,..., sm}hidden

Page 54: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Problem definition•  Regime: segment groups:

: model of regime r

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 54

Θ = {θ1,θ2,...θr,Δr×r}

θrRegimes (r=4)

θ1θ2

wingsbeaks

θ3θ4

hidden

Page 55: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Problem definition•  Segment-membership: assignment

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 55

Segment-membership

(m=8)

F = { f1,..., fm}

F = { 2, 4, 1, 3, 2, 4, 1, 3 }

hidden

Page 56: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Problem definition•  Given: bundle X

•  Find: compact description C of X

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 56

C = {m, r,S,Θ,F}

X = {x1,..., xn}

Page 57: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Problem definition•  Given: bundle X

•  Find: compact description C of X

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 57

C = {m, r,S,Θ,F}

X = {x1,..., xn}

m segments

r regimes Segment-membership

Page 58: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Main ideasGoal: compact description of X

without user intervention!!

Challenges: Q1. How to generate ‘informative’ regimes ? Q2. How to decide # of regimes/segments ?

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 58

C = {m, r,S,Θ,F}

Page 59: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Main ideasGoal: compact description of X

without user intervention!!

Challenges: Q1. How to generate ‘informative’ regimes ? Idea (1): Multi-level chain model Q2. How to decide # of regimes/segments ? Idea (2): Model description cost

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 59

C = {m, r,S,Θ,F}

Page 60: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Idea (1): MLCM: multi-level chain model

Q1. How to generate ‘informative’ regimes ?

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 60

beaks

wings

clapsModel

Sequences Regimes

Page 61: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Idea (1): MLCM: multi-level chain model

Q1. How to generate ‘informative’ regimes ?

Idea (1): Multi-level chain model

– HMM-based probabilistic model – with “across-regime” transitions

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 61

beaks

wings

clapsModel

Sequences Regimes

Page 62: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Idea (1): MLCM: multi-level chain model

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 62

Regimes r=2 Regime 1 (k=3) Regime 2 (k=2)

Θ = {θ1,θ2,...θr,Δr×r}

r regimes (HMMs)

across-regime transition prob.

(θi = {π,A,B})

Single HMM parameters

state 1

a1;11

a1;33

state 3

state 2

a1:22

a1;12

a1;23

a1;31state 1 a2;11

state 2

a2;22a2;12

a2;21

Regime switch

δ12

δ21

δ11

θ1

Θ

θ2

δ22

Regime1 “beaks” Regime2 “wings”

Page 63: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CSIdea (2): model description cost

Q2. How to decide # of regimes/segments ?

Idea (2): Model description cost • Minimize encoding cost • find “optimal” # of segments/regimes

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 63

Page 64: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CSIdea (2): model description cost

Idea: Minimize encoding cost!

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 64

Good compression

Good description

1 2 3 4 5 6 7 8 910

CostMCostCCostT

(# of r, m)

CostM(M) + Costc(X|M)

Model cost Coding cost

min ( )

Page 65: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CSIdea (2): model description cost

Total cost of bundle X, given C

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 65

C = {m, r,S,Θ,F}

Page 66: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CSIdea (2): model description cost

Total cost of bundle X, given C

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 66

C = {m, r,S,Θ,F}

Model description cost of Θ

Coding cost of X given Θ

# of segments/regimes

segment lengths

segment- membership F

duration/dimensions

Page 67: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

AutoPlaitOverview

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 67

Iteration 0 r=1, m=1

Start!

1234567Iteration

X

01234567Iteration

r=1

Page 68: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

AutoPlaitOverview

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 68

Iteration 1 r=2, m=4

f1 = 2

X

θ1θ2

f2 =1 f3 = 2 f4 =1

01234567Iteration

r=1

r=2

Split

Page 69: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

AutoPlaitOverview

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 69

Iteration 1 r=2, m=4

Iteration 2 r=3, m=6

f1 = 2

X

θ1θ2

f2 =1 f3 = 2 f4 =1

θ1θ3θ2

Xf1 = 2 f2 = 3 f3 =1 f4 = 2 f5 = 3 f6 =1

Split

01234567Iteration

r=1

r=2

r=3

Page 70: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

AutoPlaitOverview

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 70

Iteration 1 r=2, m=4

Iteration 2 r=3, m=6

Iteration 4 r=4, m=8

f1 = 2

X

X

θ1θ2

f2 =1 f3 = 2 f4 =1

θ1θ3θ2

X

θ1θ3θ4θ2

f1 = 2 f2 = 3 f3 =1 f4 = 2 f5 = 3 f6 =1

f1 = 2 f2 = 4 f3 = 3 f4 =1 f5 = 2 f6 = 4 f7 = 3 f8 =1Split

1234567Iteration

01234567Iteration

r=1

r=2

r=3

r=4

Page 71: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

AutoPlaitAlgorithms 1.  CutPointSearch

Find good cut-points/segments 2.  RegimeSplit

Estimate good regime parameters Θ 3.  AutoPlait

Search for the best number of regimes (r=2,3,4…)

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 71

Inner-most loop

Inner loop

Outer loop

Page 72: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

1. CutPointSearchGiven:

- bundle - parameters of two regimes

Find: cut-points of segment sets S1, S2,

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 72

XΘ = {θ1,θ2,Δ}

X

θ1θ2

X{θ1,θ2,Δ}

{S1,S2} = argmaxS1,S2

P(X | S1,S2,Θ)

S1 = {s2, s4}S2 = {s1, s3}

Inner-most loop

Page 73: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

1. CutPointSearch

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 73

...

...

...

...

...

state 1

state 2

state 3

state 2

state 1

t = 3 t = 4 t = 5 t = 6X

t =1 t = 2 ...

L2;1(3) = {3}

L2;2 (5) = {3}

L2;1(4) = {3} L2;1(5) = {4}

L2;2 (6) = {4}

δ12

L2;2 (4) = {4}

δ12

θ1

θ2

L1;1(1) = φ

L1;3(2) = φ

L1;1(3) = φ

switch??switch??

P(X |Θ)

DPalgorithmtocomputelikelihood:

Page 74: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

1. CutPointSearchTheoretical analysis Scalability

- It takes time (only single scan) - n: length of X - d: dimension of X - k: # of hidden states in regime

Accuracy It guarantees the optimal cut points - (Details in paper)

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 74

O(ndk2 )

Page 75: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

2. RegimeSplitGiven:

- bundle

Find: two regimes 1. find cut-points of segment sets: S1, S2 2. estimate parameters of two regimes:

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 75

X

Θ = {θ1,θ2,Δ}

Inner loop

X

θ1θ2

X

Page 76: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

2. RegimeSplitTwo-phase iterative approach -  Phase 1: (CutPointSearch)

-  Split segments into two groups : -  Phase 2: (BaumWelch)

-  Update model parameters:

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 76

S1,S2

Θ = {θ1,θ2,Δ}

X

θ1θ2 {θ1,θ2,Δ}

S1 = {s2, s4}S2 = {s1, s3}

Phase 1

Phase 2

Page 77: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

3. AutoPlaitGiven:

- bundle

Find: r regimes (r=2, 3, 4, … )

- i.e., find full parameter set

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 77

X

C = {m, r,S,Θ,F}

Outer loop

X

Page 78: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

3. AutoPlaitSplit regimes r=2,3,…, as long as cost keeps decreasing - Find appropriate # of regimes

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 78

r=2, m=4

r=3, m=6

r=4, m=8

f1 = 2

X

θ1θ2

f2 =1 f3 = 2 f4 =1

θ1θ3θ2

Xf1 = 2 f2 = 3 f3 =1 f4 = 2 f5 = 3 f6 =1

X

θ1θ3θ4θ2

f1 = 2 f2 = 4 f3 = 3 f4 =1 f5 = 2 f6 = 4 f7 = 3 f8 =1

r =minrCostT (X;m, r,S,Θ,F)

Page 79: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Results•  Mocap data •  WebClick data •  Google Trends

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 79

Page 80: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Q1. Sense-makingMoCap data

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 80

AutoPlait (NO magic numbers)

DynaMMo(Lietal.,KDD’09) pHMM(Wangetal.,SIGMOD’11)

Page 81: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Q1. Sense-makingMoCap data

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 81

AutoPlait (NO magic numbers)

Page 82: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Q2. Accuracy (a) Segmentation (b) Clustering

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 82

AutoPlait needs “no magic numbers”

Target

Page 83: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Q3. ScalabilityWall clock time vs. data size (length) : n

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 83

AutoPlait scales linearly, i.e., O(n)

AutoPlait

DynaMMopHMM

Page 84: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

App. Event discovery (GoogleTrend)

Anomaly detection (flu-related topics,10 years)

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 84

Page 85: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

App. Event discovery (GoogleTrend)

Anomaly detection (flu-related topics,10 years)

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 85

AutoPlait detects 1 unusual spike in 2009 (i.e., swine flu)

Page 86: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

App. Event discovery (GoogleTrend)

Turning point detection (seasonal sweets topics)

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 86

Page 87: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

App. Event discovery (GoogleTrend)

Turning point detection (seasonal sweets topics)

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 87

Trend suddenly changed in 2010 (release of android OS “Ginger bread”, “Ice Cream Sandwich”)

Page 88: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

App. Event discovery (GoogleTrend)

Trend discovery (game-related topics)

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 88

Page 89: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

App. Event discovery (GoogleTrend)

Trend discovery (game-related topics)

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 89

It discovers 3 phases of “game console war” (Xbox&PlayStation/Wii/Mobile social games)

Page 90: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Industrial contribution•  Automobile sensor data

– location, velocity, longitudinal/lateral acceleration

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 90

Page 91: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Code at•  http://www.cs.kumamoto-u.ac.jp/~yasuko/

software.html

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ ©2015Sakurai,Matsubara&Faloutsos 91

Page 92: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Part 1 – Conclusions

•  Motivation •  Similarity Search and Indexing •  Feature extraction •  Linear forecasting •  Streaming pattern discovery •  Automatic mining

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 92

Page 93: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Part 1 – Conclusions

•  Motivation •  Similarity Search and Indexing

– Euclidean/time-warping – extract features – index (SAM, R-tree)

•  Feature extraction – SVD, ICA, DFT, DWT (multi-scale

windows) http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 93

Page 94: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Part 1 – Conclusions

•  Linear forecasting – AR, RLS

•  Streaming pattern discovery – RLS, “incremental” wavelet transform – Multi-scale windows

•  Automatic mining – MDL

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 94

Page 95: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

References •  Yunyue Zhu, Dennis Shasha ``StatStream: Statistical Monitoring of

Thousands of Data Streams in Real Time'‘ VLDB, August, 2002. pp. 358-369.

•  Spiros Papadimitriou, Jimeng Sun, Christos Faloutsos. Streaming Pattern Discovery in Multiple Time-Series. VLDB 2005.

•  Yasushi Sakurai, Spiros Papadimitriou, Christos Faloutsos. BRAID: Stream Mining through Group Lag Correlations. SIGMOD 2005.

•  Anna C. Gilbert, Yannis Kotidis, S. Muthukrishnan, Martin Strauss. Surfing Wavelets on Streams: One-Pass Summaries for Approximate Aggregate Queries. VLDB 2001.

•  Sudipto Guha, Nick Koudas, Amit Marathe, Divesh Srivastava. Merging the Results of Approximate Match Operations. VLDB 2004.

•  Anna C. Gilbert, Sudipto Guha, Piotr Indyk, S. Muthukrishnan, Martin Strauss. Near-optimal sparse fourier representations via sampling. STOC 2002.

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 95

Page 96: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

References•  Nick Koudas, Beng Chin Ooi, Kian-Lee Tan, Rui Zhang.

Approximate NN queries on Streams with Guaranteed Error/performance Bounds. VLDB 2004.

•  Flip Korn, S. Muthukrishnan, Divesh Srivastava. Reverse Nearest Neighbor Aggregates Over Data Streams. VLDB 2002.

•  Sudipto Guha, Nick Koudas, Kyuseok Shim. Data-streams and histograms. STOC 2001.

•  Sudipto Guha, Nina Mishra, Rajeev Motwani, Liadan O'Callaghan. Clustering Data Streams. FOCS 2000.

•  Charu C. Aggarwal, Jiawei Han, Jianyong Wang, Philip S. Yu. A Framework for Clustering Evolving Data Streams. VLDB 2003.

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ ©2015Sakurai,Matsubara&Faloutsos 96

Page 97: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

References•  Piotr Indyk, Nick Koudas, S. Muthukrishnan. Identifying

Representative Trends in Massive Time Series Data Sets Using Sketches. VLDB 2000.

•  Graham Cormode, S. Muthukrishnan. An improved data stream summary: the count-min sketch and its applications. J. Algorithms 55 (1), 2005.

•  Graham Cormode, Flip Korn, S. Muthukrishnan, Divesh Srivastava. Finding Hierarchical Heavy Hitters in Data Streams. VLDB 2003.

•  Piotr Indyk. Stable Distributions, Pseudorandom Generators, Embeddings and Data Stream Computation. FOCS 2000.

•  Yunyue Zhu, Dennis Shasha. Efficient elastic burst detection in data streams. KDD 2003.

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ ©2015Sakurai,Matsubara&Faloutsos 97

Page 98: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

References•  Eamonn J. Keogh, Selina Chu, David M. Hart, Michael J. Pazzani.

An Online Algorithm for Segmenting Time Series. ICDM 2001. •  Spiros Papadimitriou, Philip S. Yu. Optimal multi-scale patterns in

time series streams. SIGMOD 2006. •  Flip Korn, S. Muthukrishnan, Yihua Wu. Modeling skew in data

streams. SIGMOD 2006. •  Yasushi Sakurai, Christos Faloutsos, Masashi Yamamuro. Stream

Monitoring under the Time Warping Distance. ICDE 2007.

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ ©2015Sakurai,Matsubara&Faloutsos 98

Page 99: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1

Kumamoto U CMU CS

Similarity search, pattern discovery and

summarizationYasushi Sakurai (Kumamoto University) Yasuko Matsubara (Kumamoto University) Christos Faloutsos (Carnegie Mellon University)

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ ©2015Sakurai,Matsubara&Faloutsos 99

Part1