Source: people.dsv.su.se/~panagiotis/dami2014/timeseries1.pdf · 2014-12-07


Time Series I

1  

Syllabus

Nov 4       Introduction to data mining
Nov 5       Association Rules
Nov 10, 14  Clustering and Data Representation
Nov 17      Exercise session 1 (Homework 1 due)
Nov 19      Classification
Nov 24, 26  Similarity Matching and Model Evaluation
Dec 1       Exercise session 2 (Homework 2 due)
Dec 3       Combining Models
Dec 8, 10   Time Series Analysis
Dec 15      Exercise session 3 (Homework 3 due)
Dec 17      Ranking
Jan 13      Review
Jan 14      EXAM
Feb 23      Re-EXAM

Why deal with sequential data?

• Because all data is sequential ☺
• All data items arrive in the data store in some order
• Examples
  – transaction data
  – documents and words
• In some (or many) cases the order does not matter
• In many cases the order is of interest

3  

Time-series data: example

[figure: a financial time series]

4

Questions

• What is a time series?
• How do we compare time series data?
• What is the structure of time series data?
• Can we represent this structure compactly and accurately?

5  

Time Series

• A sequence of observations:
  – X = (x1, x2, x3, x4, …, xn)
• Each x_i is a real number
  – e.g., (2.0, 2.4, 4.8, 5.6, 6.3, 5.6, 4.4, 4.5, 5.8, 7.5)

[figure: the series plotted with a time axis and a value axis]

Time Series Databases

• A time series is an ordered set of real numbers, representing the measurements of a real variable at equal time intervals
  – Stock prices
  – Volume of sales over time
  – Daily temperature readings
  – ECG data
• A time series database is a large collection of time series

7  

Time Series Similarity

• Given two time series
    X = (x1, x2, …, xn)
    Y = (y1, y2, …, yn)
• Define and compute D(X, Y)
• Or better…

[figure: a query X matched against a database via D(X, Y), 1-NN]

Time Series Similarity Search

• Given a time series database and a query X
• Find the best match of X in the database
• Why is that useful?

Examples  

• Find companies with similar stock prices over a time interval
• Find products with similar sell cycles
• Cluster users with similar credit card utilization
• Find similar subsequences in DNA sequences
• Find scenes in video streams

10  

Types  of  queries  

• whole match vs. subsequence match
• range query vs. nearest neighbor query

11  

[figures: three $price-vs-day plots (day 1–365) illustrating the query types]

distance function: chosen by an expert (e.g., Euclidean distance)

12  

Problems  

• Define the similarity (or distance) function
• Find an efficient algorithm to retrieve similar time series from a database
  – (faster than a sequential scan)

The similarity function depends on the application

13  

Metric  Distances  

• What properties should a similarity distance have to allow (easy) indexing?
  – D(A, B) = D(B, A)            Symmetry
  – D(A, A) = 0                  Constancy of Self-Similarity
  – D(A, B) ≥ 0                  Positivity
  – D(A, B) ≤ D(A, C) + D(B, C)  Triangle Inequality
• Sometimes the distance function that best fits an application is not a metric
• Then indexing becomes interesting and challenging

14

Euclidean Distance

15

• Each time series: a point in the n-dim space
    X = (x1, x2, …, xn)
    Y = (y1, y2, …, yn)
• Euclidean distance: pair-wise point distance

  L2(X, Y) = sqrt( sum_{i=1..n} (x_i − y_i)^2 )
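The L2 distance above can be sketched in a few lines of Python (the function name `euclidean` is mine, not from the slides):

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length time series."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))
```

For equal-length series this runs in O(n), which is what makes it attractive for indexing.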

Euclidean model

[figure: query Q (n datapoints) compared against a database of n-datapoint series]

16

Euclidean distance between two time series Q = (q1, q2, …, qn) and X = (x1, x2, …, xn):

  D(Q, X) = sqrt( sum_{i=1..n} (q_i − x_i)^2 )

[figure: the query Q matched against a database series S]

17

Euclidean model

Euclidean distance between two time series Q = (q1, q2, …, qn) and X = (x1, x2, …, xn):

  D(Q, X) = sqrt( sum_{i=1..n} (q_i − x_i)^2 )

[figure: the query Q compared against each series S in the database]

  Distance   Rank
  0.98       4
  0.07       1
  0.21       2
  0.43       3

18

Euclidean model: Advantages

• Easy to compute: O(n)
• Allows scalable solutions to other problems, such as
  – indexing
  – clustering
  – etc.

19  

 

Disadvantages

• Query and target lengths must be equal!
• Cannot tolerate noise:
  – time shifts
  – sequences out of phase
  – scaling in the y-axis

20  

21  

Limitations of Euclidean Distance

Euclidean Distance: sequences are aligned "one to one".

  D(Q, X) = sqrt( sum_{i=1..n} (q_i − x_i)^2 )

"Warped" Time Axis: nonlinear alignments are possible.

[figure: Q and C aligned one-to-one (Euclidean) vs. nonlinearly warped (DTW)]

22  

DTW: Dynamic time warping (1/2)

• Each cell c = (i, j) is a pair of indices whose corresponding values will be compared, (x_i − q_j)^2, and included in the sum for the distance.
• Euclidean path:
  – i = j always.
  – Ignores off-diagonal cells.

[figure: warping matrix over X and Q; the diagonal accumulates (x1 − q1)^2, then (x2 − q2)^2 + (x1 − q1)^2, …]

23  

DTW: Dynamic time warping (2/2)

• DTW allows any path.
• Examine all paths: cell (i, j) is reached from (i−1, j), (i, j−1), or (i−1, j−1)
  – from (i, j−1): shrink X / stretch Q
  – from (i−1, j): stretch X / shrink Q
• Standard dynamic programming to fill in the table.
• The top-right cell contains the final result.

[figure: warping matrix over X and Q, showing cells a and b and the three predecessor cells of (i, j)]

24  

Computation

• DTW is computed by dynamic programming
• Given two sequences
  – Q = (q1, q2, …, qN)
  – X = (x1, x2, …, xM)

  D_dtw(Q, X) = f(N, M)

  f(i, j) = |q_i − x_j| + min{ f(i, j−1)     (q-stretch)
                               f(i−1, j)     (x-stretch)
                               f(i−1, j−1)   (no stretch) }

• Warping path W:
  – a set of grid cells in the time warping matrix
• DTW finds the optimum warping path W:
  – the path with the smallest matching score
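A minimal sketch of this recurrence in Python (the function name `dtw` is mine; the per-point cost is taken as |q_i − x_j|, and a squared difference works the same way):

```python
def dtw(q, x):
    """DTW distance via dynamic programming.

    f[i][j] = cost(q_i, x_j) + min of the three predecessor cells.
    """
    n, m = len(q), len(x)
    INF = float("inf")
    f = [[INF] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(q[i - 1] - x[j - 1])
            f[i][j] = cost + min(f[i][j - 1],      # q-stretch
                                 f[i - 1][j],      # x-stretch
                                 f[i - 1][j - 1])  # no stretch
    return f[n][m]  # top-right cell: the final result
```

Filling the full N × M table gives the O(nm) cost discussed below.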

Properties of a DTW legal path

The optimum warping path W gives the best alignment.

I.   Boundary conditions
     W1 = (1, 1) and WK = (n, m)
II.  Continuity
     Given Wk = (a, b), then Wk−1 = (c, d), where a − c ≤ 1 and b − d ≤ 1
III. Monotonicity
     Given Wk = (a, b), then Wk−1 = (c, d), where a − c ≥ 0 and b − d ≥ 0

[figure: optimum warping path between X and Y]

25

Properties of DTW (continued)

• Paths start at the bottom left cell and end at the top right cell
• There is always a point of the path in each row and column of the matrix
• Paths always go from left to right and from bottom to top

26

 

Advantages

• Query and target need not be of equal length ☺
• Can tolerate noise:
  – time shifts
  – sequences out of phase
  – scaling in the y-axis

27  

 

Disadvantages

• Computational complexity: O(nm)
• May not be able to handle some types of noise...
• DTW is not a metric (the triangle inequality does not hold)

28  

29  

Global Constraints

• Slightly speed up the calculations and prevent pathological warpings
• A global constraint limits the indices of the warping path:
    w_k = (i, j)_k such that j − r ≤ i ≤ j + r
  where r is a term defining the allowed range of warping for a given point in a sequence

[figures: Sakoe-Chiba Band (constant width r), Itakura Parallelogram]

Complexity of DTW

• Basic implementation: O(n²), where n is the length of the sequences
  – the problem must be solved for each (i, j) pair
• If a warping window r is specified, then O(nr)
  – only solve for the (i, j) pairs where |i − j| ≤ r

30  
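The windowed O(nr) variant can be sketched by restricting the dynamic program to cells with |i − j| ≤ r, i.e. a Sakoe-Chiba band (the function name `dtw_band` is mine):

```python
def dtw_band(q, x, r):
    """DTW restricted to a Sakoe-Chiba band: only cells with |i - j| <= r.

    If the band excludes the end cell (lengths differ by more than r),
    the result is infinity.
    """
    n, m = len(q), len(x)
    INF = float("inf")
    f = [[INF] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 0.0
    for i in range(1, n + 1):
        # visit only the O(r) columns inside the band for this row
        for j in range(max(1, i - r), min(m, i + r) + 1):
            cost = abs(q[i - 1] - x[j - 1])
            f[i][j] = cost + min(f[i][j - 1], f[i - 1][j], f[i - 1][j - 1])
    return f[n][m]
```

With r = 0 the band collapses to the diagonal and the result is the plain point-wise distance.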

Longest Common Subsequence Measures
(Allowing for Gaps in Sequences)

[figure: two sequences matched with a gap skipped]

31  

Longest Common Subsequence (LCSS)

[figure: two series matched by LCSS; the majority of the noise is ignored, matching segments are aligned]

Advantages of LCSS:
A. Outlying values are not matched
B. Distance/similarity is distorted less

Disadvantages of DTW:
A. All points are matched
B. Outliers can distort the distance
C. One-to-many mapping

LCSS is more resilient to noise than DTW.

32  

Longest Common Subsequence

Similar dynamic programming solution as DTW, but now we measure similarity, not distance.
It can also be expressed as a distance.

33  
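A sketch of the LCSS dynamic program for real-valued series (the matching threshold `eps`, the function names, and the distance form 1 − LCSS/min(n, m) are my assumptions; the slides do not fix these details):

```python
def lcss(x, y, eps):
    """LCSS length for real-valued series: points match if |x_i - y_j| <= eps."""
    n, m = len(x), len(y)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(x[i - 1] - y[j - 1]) <= eps:
                L[i][j] = L[i - 1][j - 1] + 1      # match this pair
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])  # skip a point (gap)
    return L[n][m]

def lcss_dist(x, y, eps):
    """LCSS similarity expressed as a distance in [0, 1]."""
    return 1.0 - lcss(x, y, eps) / min(len(x), len(y))
```

Note how an outlier simply fails to match and is skipped, instead of distorting the sum as in DTW.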

Similarity Retrieval

• Range query
  – Find all time series X where D(Q, X) ≤ ε
• Nearest neighbor query
  – Find the k most similar time series to Q
• A method to answer the above queries: linear scan
• A better approach: GEMINI [next time]

34  

35  

Lower Bounding: NN search

Intuition
✓ Try to use a cheap lower bounding calculation as often as possible
✓ Do the expensive, full calculations only when absolutely necessary

We can speed up similarity search by using a lower bounding function:
• D: distance measure
• LB: lower bounding function such that LB(Q, X) ≤ D(Q, X)

1-NN Search Using LB (assuming a database of time series DB = {X1, X2, …, XN}):
• Set best = ∞
• For each Xi:
  – if LB(Xi, Q) < best:
      if D(Xi, Q) < best:
          best = D(Xi, Q)
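The 1-NN loop above can be sketched directly in Python (the names `nn_search`, `dist`, `lb` are mine; any pair of functions with LB(Q, X) ≤ D(Q, X) can be plugged in):

```python
def nn_search(db, q, dist, lb):
    """1-NN search that prunes with a cheap lower bound lb(q, x) <= dist(q, x)."""
    best, best_series = float("inf"), None
    for x in db:
        if lb(q, x) < best:      # cheap check first
            d = dist(q, x)       # expensive, only when the bound cannot prune
            if d < best:
                best, best_series = d, x
    return best_series, best
```

The tighter the bound, the more full distance computations are skipped.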

36  

Lower Bounding: Range Query

Range Query Using LB (assuming the same database DB = {X1, X2, …, XN}):
• For each Xi:
  – if LB(Xi, Q) ≤ ε:
      if D(Xi, Q) ≤ ε:
          report Xi

Problems

• How to define lower bounds for different distance measures?
• How to extract the features? How to define the feature space?
  – Fourier transform
  – Wavelet transform
  – Averages of segments (histograms or APCA)
  – Chebyshev polynomials
  – … your favorite curve approximation …

37  

38  

Some Lower Bounds on DTW

LB_Kim
• Each time series is represented by 4 features: <First, Last, Min, Max>
• LB_Kim = the maximum squared difference of the corresponding features

LB_Yi
• LB_Yi = the sum of squared differences of the points of X that fall above max(Q) or below min(Q)

[figures: X and Q with max(Q) and min(Q) marked]

39  
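Both bounds are cheap, one pass over each series; a sketch (function names are mine):

```python
def lb_kim(q, x):
    """LB_Kim: max squared difference over the four features <First, Last, Min, Max>."""
    feats_q = (q[0], q[-1], min(q), max(q))
    feats_x = (x[0], x[-1], min(x), max(x))
    return max((a - b) ** 2 for a, b in zip(feats_q, feats_x))

def lb_yi(q, x):
    """LB_Yi: squared excursions of X's points above max(Q) or below min(Q)."""
    hi, lo = max(q), min(q)
    return sum((v - hi) ** 2 if v > hi
               else (v - lo) ** 2 if v < lo
               else 0.0
               for v in x)
```

LB_Kim is O(n) with a constant of four comparisons per query; LB_Yi looks at every point of X but still avoids the quadratic DTW table.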

LB_Keogh [Keogh 2004]

Build an envelope (U, L) around the query Q:

  U_i = max(q_{i−r} : q_{i+r})
  L_i = min(q_{i−r} : q_{i+r})

[figures: upper envelope U and lower envelope L around Q, shown for the Sakoe-Chiba Band and the Itakura Parallelogram]

40  

[figures: X plotted against the envelope (U, L) of Q, for the Sakoe-Chiba Band and the Itakura Parallelogram]

  LB_Keogh(Q, X) = sum_{i=1..n} of:
      (x_i − U_i)^2   if x_i > U_i
      (x_i − L_i)^2   if x_i < L_i
      0               otherwise

  LB_Keogh(Q, X) ≤ DTW(Q, X)

41  
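A sketch of LB_Keogh for equal-length Q and X with a Sakoe-Chiba window r (the function name is mine, and the envelope is built on the fly rather than precomputed, which a real implementation would do once per query):

```python
def lb_keogh(q, x, r):
    """LB_Keogh: sum of squared excursions of X outside the envelope of Q."""
    total = 0.0
    n = len(q)
    for i in range(n):
        lo = max(0, i - r)
        hi = min(n, i + r + 1)
        u = max(q[lo:hi])   # U_i = max(q_{i-r} : q_{i+r})
        l = min(q[lo:hi])   # L_i = min(q_{i-r} : q_{i+r})
        if x[i] > u:
            total += (x[i] - u) ** 2
        elif x[i] < l:
            total += (x[i] - l) ** 2
        # points inside the envelope contribute 0
    return total
```

This is the bound typically plugged into the 1-NN pruning loop from the previous slides.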

Tightness of LB

  T = (lower-bound estimate of the Dynamic Time Warp distance) / (true Dynamic Time Warp distance)

• 0 ≤ T ≤ 1; the larger, the better
• T is proportional to the length of the gray lines used in the illustrations

[figures: envelopes and gray lines for LB_Keogh (Sakoe-Chiba), LB_Keogh (Itakura), LB_Yi, and LB_Kim]

Lower Bounding: a 1-NN search walk-through

• We want to find the 1-NN to our query data series Q.
• We compute the distance to the first data series in our dataset, D(S1, Q); this becomes the best-so-far (BSF).
• We compute LB(S2, Q) and it is greater than the BSF: we can safely prune S2, since D(S2, Q) ≥ LB(S2, Q).
• We compute LB(S3, Q) and it is smaller than the BSF: we have to compute D(S3, Q) ≥ LB(S3, Q), since it may still be smaller than the BSF.
• It turns out that D(S3, Q) ≥ BSF, so we can safely prune S3.
• We compute LB(S4, Q) and it is smaller than the BSF: we have to compute D(S4, Q) ≥ LB(S4, Q), since it may still be smaller than the BSF.
• It turns out that D(S4, Q) < BSF, so S4 becomes the new BSF.
• S1 cannot be the 1-NN, because S4 is closer to Q.

[figures: per-step bar charts of true distances and lower bounds against the BSF line]

51  

How about subsequence matching?

• DTW is defined for full-sequence matching:
  – All points of the query sequence are matched to all points of the target sequence
• Subsequence matching:
  – The query is matched to a part (subsequence) of the target sequence

[figure: a query sequence matched inside a data stream]

Subsequence Matching
• X: long sequence
• Q: short sequence
• What subsequence of X is the best match for Q?

J-Position Subsequence Match
• What subsequence of X is the best match for Q … such that the match ends at position j?

Naïve Solution: DTW
• Examine all possible subsequences
• Too costly!

[figures: Q slid over the long sequence X, one candidate match per ending position j]

58  

Why not 'naïve'?

• Compute the time warping matrices starting from every database frame
  – Need O(n) matrices, O(nm) time per frame

[figure: query Q (length m) matched against X from x_tstart to x_tend; each start needs its own n × m matrix to capture the optimal subsequence starting from t = tstart]

59  

Key Idea

• Star-padding
  – Use only a single matrix (the naïve solution uses n matrices)
  – Prefix Q with '*', which always gives zero distance
  – Instead of Q = (q1, q2, …, qm), compute distances with
      Q' = (q0, q1, q2, …, qm), where q0 = '*'
  – O(m) time and space (the naïve solution requires O(nm))

SPRING: dynamic programming

• Initialization
  – Insert a "dummy" state '*' at the beginning of the query
  – '*' matches every value in X with score 0
• Computation
  – Perform the dynamic programming computation in a similar manner as standard DTW

[figure: DP table with query Q on the vertical axis and database sequence X on the horizontal axis; the '*' row is all zeros]

SPRING: dynamic programming (continued)

• Cell (i, j) is reached from (i−1, j), (i, j−1), or (i−1, j−1); Q[1:i] is matched with X[s:j]
• For each (i, j): compute the j-position subsequence match of the first i items of Q to X[s:j]
• Top row: the j-position subsequence match of Q for every j
• Final answer: the best among the j-position matches
  – Look at the answers stored in the top row of the table

[figure: DP table with the '*' row of zeros; a warping path runs from its start column s to its end column j]

Subsequence vs. full matching

• Assume that the database is one very long sequence
  – Concatenate all sequences into one sequence

[figure: query Q (q1 … qj … qM) against the long sequence (p1 … pi … pN)]

Computational complexity

• O(|Q| × |X|)
• But the table can be computed by looking at only two adjacent columns at a time

STWM (Subsequence Time Warping Matrix)

• Problem of star-padding: we lose the information about the starting frame of the match
• After the scan: "which is the optimal subsequence?"
• Elements of STWM:
  – the distance value of each subsequence
  – the starting position!
• Combination of star-padding and STWM:
  – efficiently identifies the optimal subsequence in a streaming fashion
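Putting star-padding together, a sketch of the SPRING-style subsequence search (the function name is mine; it keeps only two adjacent rows and returns just the best distance and end position j, so recovering the start position would need the STWM bookkeeping described above):

```python
def spring_best_match(q, x):
    """Star-padded subsequence DTW: the match may start anywhere in X.

    Returns (best distance, 1-based end position) of the best
    subsequence match of Q in X.
    """
    m, n = len(q), len(x)
    INF = float("inf")
    prev = [0.0] * (n + 1)  # the dummy '*' row: matches every value with score 0
    for i in range(1, m + 1):
        cur = [INF] * (n + 1)
        for j in range(1, n + 1):
            cost = abs(q[i - 1] - x[j - 1])
            cur[j] = cost + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur               # keep only two adjacent rows
    best_j = min(range(1, n + 1), key=lambda j: prev[j])
    return prev[best_j], best_j  # best among all j-position matches (top row)
```

Each value of X is processed in O(m), so a stream can be scanned without materializing the O(nm) table.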

Up next…

• Time series summarizations
  – Discrete Fourier Transform (DFT)
  – Piecewise Aggregate Approximation (PAA)
  – Symbolic ApproXimation (SAX)
• Streams
  – Z-normalization
  – A fast algorithm for subsequence matching in streams
• Time series classification [briefly]
  – Lazy learners and Shapelets
