Time Series Filtering


Upload: brendan-powers

Post on 31-Dec-2015


Page 1: Time Series Filtering

Time Series Filtering

Given a time series T, a set of candidates C, and a distance threshold r, find all subsequences in T that are within distance r of any of the candidates in C.

[Figure: a time series and a set of 12 numbered candidates, with the annotation "Matches Q11".]

Page 2: Time Series Filtering

Filtering vs. Querying

[Figure: side-by-side comparison. Filtering panel labels: Queries (numbered 1 to 12), Database, "Matches Q11". Querying panel labels: Query (template), Database (numbered 1 to 10), "Best match".]

Page 3: Time Series Filtering

Euclidean Distance Metric

Given two time series Q = q1…qn and C = c1…cn, their Euclidean distance is defined as:

$$D(Q, C) = \sqrt{\sum_{i=1}^{n} (q_i - c_i)^2}$$

[Figure: Q and C plotted over time points 0 to 100.]
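The definition above translates directly into code. The following is a minimal sketch; the function name `euclidean_distance` is illustrative and does not come from the slides.

```python
import math


def euclidean_distance(q, c):
    """Euclidean distance between two equal-length sequences q and c."""
    if len(q) != len(c):
        raise ValueError("sequences must have the same length")
    # Sum the squared pointwise differences, then take the square root.
    return math.sqrt(sum((qi - ci) ** 2 for qi, ci in zip(q, c)))
```

For example, `euclidean_distance([0, 0], [3, 4])` evaluates to `5.0`.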

Page 4: Time Series Filtering

Early Abandon

During the computation, if the current sum of the squared differences between each pair of corresponding data points exceeds r^2, we can safely stop the calculation.

[Figure: Q and C plotted over time points 0 to 100, with the annotation "calculation abandoned at this point".]
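Early abandoning can be sketched as a running sum that is checked against r^2 after every point. This is an illustrative version, not code from the slides: the name `early_abandon_distance` and the choice to return the number of points examined (as a rough "step" count) are assumptions.

```python
import math


def early_abandon_distance(q, c, r):
    """Return (distance, steps) if the distance is at most r, else (None, steps).

    steps is the number of point pairs examined before stopping.
    """
    r_squared = r * r
    total = 0.0
    for steps, (qi, ci) in enumerate(zip(q, c), start=1):
        total += (qi - ci) ** 2
        if total > r_squared:
            # The final distance can only grow, so it already exceeds r.
            return None, steps
    return math.sqrt(total), len(q)
```

Comparing against r^2 rather than r avoids taking a square root inside the loop.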

Page 5: Time Series Filtering

Classic Approach

Individually compare each candidate sequence to the query using the early abandoning algorithm.

[Figure: a time series and a set of 12 numbered candidates.]
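The classic approach can be sketched as a sliding window over the time series, with every window compared to every candidate. This is a minimal illustration under two assumptions not stated on the slide: all candidates have the same length, and the window advances one point at a time. The names `within_r` and `classic_filter` are hypothetical.

```python
def within_r(q, c, r):
    """True if the Euclidean distance between q and c is at most r."""
    r_squared = r * r
    total = 0.0
    for qi, ci in zip(q, c):
        total += (qi - ci) ** 2
        if total > r_squared:
            return False  # early abandon
    return True


def classic_filter(series, candidates, r):
    """Return (start, candidate_index) for every window within r of a candidate."""
    n = len(candidates[0])  # assumption: all candidates share one length
    matches = []
    for start in range(len(series) - n + 1):
        window = series[start:start + n]
        for index, candidate in enumerate(candidates):
            if within_r(window, candidate, r):
                matches.append((start, index))
    return matches
```

With `series = [0, 0, 1, 2, 3, 0, 0]`, `candidates = [[1, 2, 3], [5, 5, 5]]`, and `r = 0.5`, only the window starting at position 2 matches candidate 0.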

Page 6: Time Series Filtering

Wedge

[Figure: candidate sequences C1 and C2, their envelopes U and L, the resulting wedge W, and a query Q.]

Having candidate sequences C1, .. , Ck, we can form two new sequences U and L:

Ui = max(C1i, .. , Cki)
Li = min(C1i, .. , Cki)

They form the smallest possible bounding envelope that encloses sequences C1, .. , Ck.

We call the combination of U and L a wedge, and denote a wedge as W. W = {U, L}

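A lower bounding measure between an arbitrary query Q and the entire set of candidate sequences contained in a wedge W:

$$LB\_Keogh(Q, W) = \sqrt{\sum_{i=1}^{n} \begin{cases} (q_i - U_i)^2 & \text{if } q_i > U_i \\ (q_i - L_i)^2 & \text{if } q_i < L_i \\ 0 & \text{otherwise} \end{cases}}$$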
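A wedge and its lower bound can be sketched in a few lines. The function names `build_wedge` and `lb_keogh` are illustrative, and representing a wedge as a `(upper, lower)` tuple of lists is an assumption made for this example.

```python
import math


def build_wedge(candidates):
    """Return (U, L): the pointwise max and min over equal-length candidates."""
    upper = [max(column) for column in zip(*candidates)]
    lower = [min(column) for column in zip(*candidates)]
    return upper, lower


def lb_keogh(q, wedge):
    """Lower bound on the distance from q to any sequence enclosed by the wedge."""
    upper, lower = wedge
    total = 0.0
    for qi, ui, li in zip(q, upper, lower):
        if qi > ui:
            total += (qi - ui) ** 2  # q is above the envelope
        elif qi < li:
            total += (qi - li) ** 2  # q is below the envelope
        # Points inside the envelope contribute nothing.
    return math.sqrt(total)
```

For candidates `[1, 2, 3]` and `[2, 1, 4]` the wedge is `U = [2, 2, 4]`, `L = [1, 1, 3]`. The query `[5, 1.5, 3]` leaves the envelope only at the first point, by 3, so the bound is `3.0`, which is no larger than its true distance to either candidate.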

Page 7: Time Series Filtering

Generalized Wedge

• Use W(1,2) to denote that a wedge is built from sequences C1 and C2.

• Wedges can be hierarchically nested. For example, W((1,2),3) consists of W(1,2) and C3.

[Figure: C1 (or W1), C2 (or W2), and C3 (or W3) combined into W(1,2) and then W((1,2),3).]
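Nesting follows from the fact that a single sequence is itself a wedge with U = L, and two wedges merge by taking the pointwise max of their upper envelopes and min of their lower envelopes. A minimal sketch, with hypothetical helper names `singleton_wedge` and `merge_wedges` and a `(upper, lower)` tuple representation assumed for illustration:

```python
def singleton_wedge(sequence):
    """A single sequence Ci viewed as the wedge Wi, where U = L = Ci."""
    return list(sequence), list(sequence)


def merge_wedges(a, b):
    """Smallest wedge enclosing both wedge a and wedge b."""
    upper_a, lower_a = a
    upper_b, lower_b = b
    upper = [max(x, y) for x, y in zip(upper_a, upper_b)]
    lower = [min(x, y) for x, y in zip(lower_a, lower_b)]
    return upper, lower


# W((1,2),3) built from three example sequences (made-up values).
w1 = singleton_wedge([1, 2, 3])
w2 = singleton_wedge([2, 1, 4])
w3 = singleton_wedge([0, 5, 3])
w12 = merge_wedges(w1, w2)
w123 = merge_wedges(w12, w3)
```

Because max and min are associative, W((1,2),3) has the same envelope as a wedge built from C1, C2, and C3 at once; the nesting only records the order of merging.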

Page 8: Time Series Filtering

H-Merge

• Compare the query to the wedge using LB_Keogh.

• If the LB_Keogh function early abandons, we are done.

• Otherwise, individually compare each candidate sequence to the query using the early abandoning algorithm.

[Figure: a time series and a set of 12 numbered candidates.]
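The three steps above can be sketched for a single wedge that encloses all candidates. This is an illustration of the slide's flat description only, not the authors' implementation; the names `lb_keogh_exceeds`, `within_r`, and `wedge_filter` are hypothetical.

```python
def lb_keogh_exceeds(q, upper, lower, r):
    """True if LB_Keogh(q, wedge) exceeds r, abandoning as soon as that is known."""
    r_squared = r * r
    total = 0.0
    for qi, ui, li in zip(q, upper, lower):
        if qi > ui:
            total += (qi - ui) ** 2
        elif qi < li:
            total += (qi - li) ** 2
        if total > r_squared:
            return True
    return False


def within_r(q, c, r):
    """True if the Euclidean distance between q and c is at most r."""
    r_squared = r * r
    total = 0.0
    for qi, ci in zip(q, c):
        total += (qi - ci) ** 2
        if total > r_squared:
            return False
    return True


def wedge_filter(window, candidates, wedge, r):
    """Indices of candidates within r of window, pruning via the wedge first."""
    upper, lower = wedge
    if lb_keogh_exceeds(window, upper, lower, r):
        return []  # no candidate in the wedge can be within r
    return [i for i, c in enumerate(candidates) if within_r(window, c, r)]
```

When the lower bound already exceeds r, one pass over the wedge replaces a separate comparison against each candidate, which is where the savings come from.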

Page 9: Time Series Filtering

Hierarchical Clustering

[Figure: candidates C1 (or W1) through C5 (or W5) merged step by step into nested wedges.]

K = 5: W3, W2, W5, W1, W4
K = 4: W3, W(2,5), W1, W4
K = 3: W3, W(2,5), W(1,4)
K = 2: W((2,5),3), W(1,4)
K = 1: W(((2,5),3), (1,4))

Which wedge set to choose?
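The sequence of wedge sets from K down to 1 can be generated by repeatedly merging two wedges. The slides do not say which pair is merged at each step; the sketch below assumes, purely for illustration, that the pair whose merged wedge has the smallest area (sum of U - L) is chosen. The names `merge`, `area`, and `wedge_sets` are hypothetical, and candidates are indexed from 0.

```python
from itertools import combinations


def merge(a, b):
    """Merge two wedges, each stored as (members, upper, lower)."""
    members = a[0] + b[0]
    upper = [max(x, y) for x, y in zip(a[1], b[1])]
    lower = [min(x, y) for x, y in zip(a[2], b[2])]
    return members, upper, lower


def area(wedge):
    """Total gap between the upper and lower envelopes."""
    _, upper, lower = wedge
    return sum(u - l for u, l in zip(upper, lower))


def wedge_sets(candidates):
    """Return {k: list of wedges} for k = K down to 1."""
    current = [((i,), list(c), list(c)) for i, c in enumerate(candidates)]
    result = {len(current): list(current)}
    while len(current) > 1:
        # Assumed criterion: merge the pair giving the thinnest wedge.
        a, b = min(combinations(current, 2),
                   key=lambda pair: area(merge(*pair)))
        current = [w for w in current if w is not a and w is not b]
        current.append(merge(a, b))
        result[len(current)] = list(current)
    return result
```

For candidates `[0, 0]`, `[0, 1]`, and `[10, 10]`, the first two are merged first, so the K = 2 wedge set groups candidates 0 and 1 and leaves candidate 2 on its own.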

Page 10: Time Series Filtering

Which Wedge Set to Choose?

• Test all k wedge sets on a representative sample of data

• Choose the wedge set which performs the best

Page 11: Time Series Filtering

Upper Bound on H-Merge

• The wedge-based approach seems to be efficient when comparing a set of time series to a large batch dataset.

• But what about streaming time series?
  – Streaming algorithms are limited by their worst case.
  – Being efficient on average does not help.

• Worst case

[Figure: C1 (or W1), C2 (or W2), and C3 (or W3) nested into W(1,2) and W((1,2),3), shown with a subsequence.]

Page 12: Time Series Filtering

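Triangle Inequality

K = 5: W3, W2, W5, W1, W4
K = 4: W3, W(2,5), W1, W4
K = 3: W3, W(2,5), W(1,4)
K = 2: W((2,5),3), W(1,4)
K = 1: W(((2,5),3), (1,4))

If dist(W((2,5),3), W(1,4)) >= 2r, the subsequence cannot fail on both wedges.

[Figure: a subsequence and the wedges W((2,5),3) and W(1,4), with distances labeled ">= 2r", "< r", and "?", and the annotation "fails".]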
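The reasoning behind this slide can be spelled out as a short derivation. It is a sketch under two assumptions that the slide does not state: that dist(W_a, W_b) is the smallest Euclidean distance between any sequence enclosed by W_a and any sequence enclosed by W_b, and that "failing" on a wedge means LB_Keogh(S, W) is below r, so that wedge cannot be pruned.

```latex
% S is the subsequence, W_a = W((2,5),3), W_b = W(1,4).
% Assumption: dist(W_a, W_b) = \min D(A, B) over sequences A enclosed by W_a
% and B enclosed by W_b.
\begin{align*}
&\text{Suppose } LB\_Keogh(S, W_a) < r \text{ and } LB\_Keogh(S, W_b) < r. \\
&\text{Clamping } S \text{ pointwise to each envelope gives sequences } A \text{ in } W_a
  \text{ and } B \text{ in } W_b \text{ with} \\
&\qquad D(S, A) = LB\_Keogh(S, W_a), \qquad D(S, B) = LB\_Keogh(S, W_b). \\
&\text{By the triangle inequality,} \\
&\qquad D(A, B) \le D(A, S) + D(S, B) < r + r = 2r, \\
&\text{which contradicts } dist(W_a, W_b) \ge 2r.
\end{align*}
```

Under these assumptions, every subsequence is pruned by at least one of the two wedges, which bounds the work done in the worst case.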

Page 13: Time Series Filtering

Experimental Setup

• Datasets
  – ECG Dataset
  – Stock Dataset
  – Audio Dataset

• We measure the number of computational steps used by the following methods:
  – Brute force
  – Brute force with early abandoning (classic)
  – Our approach (H-Merge)
  – Our approach with random wedge set (H-Merge-R)

Page 14: Time Series Filtering

Experimental Results: ECG Dataset

• Batch time series
  – 650,000 data points (half an hour’s ECG signals)

• Candidate set
  – 200 time series of length 40

• r = 0.5

Algorithm      Number of Steps
brute force    5,199,688,000
classic        210,190,006
H-Merge        8,853,008
H-Merge-R      29,480,264

[Figure: bar chart of number of steps (×10^9) for brute force, classic, H-Merge, and H-Merge-R.]

Page 15: Time Series Filtering

Experimental Results: Stock Dataset

• Batch time series
  – 2,119,415 data points

• Candidate set
  – 337 time series with length 128

• r = 4.3

Algorithm      Number of Steps
brute force    91,417,607,168
classic        13,028,000,000
H-Merge        3,204,100,000
H-Merge-R      10,064,000,000

[Figure: bar chart of number of steps (×10^10) for brute force, classic, H-Merge, and H-Merge-R.]

Page 16: Time Series Filtering

Experimental Results: Audio Dataset

• Batch time series
  – 46,143,488 data points (one hour’s sound)

• Candidate set
  – 68 time series with length 101

• r = 4.14

• Sliding window
  – 11,025 (1 second)

• Step
  – 5,512 (0.5 second)

Algorithm      Number of Steps
brute force    57,485,160
classic        1,844,997
H-Merge        1,144,778
H-Merge-R      2,655,816

[Figure: bar chart of number of steps (×10^7) for brute force, classic, H-Merge, and H-Merge-R.]
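The brute force figures reported for the ECG, stock, and audio datasets are consistent with counting one step per point comparison: number of windows times number of candidates times candidate length. The slides do not state this counting rule explicitly, so the sketch below is an inference from the numbers; `brute_force_steps` and its parameters are hypothetical names.

```python
def brute_force_steps(num_points, num_candidates, candidate_length,
                      window=None, step=1):
    """Steps for a full comparison of every window against every candidate.

    window defaults to the candidate length; step is the window stride.
    """
    if window is None:
        window = candidate_length
    num_windows = (num_points - window) // step + 1
    return num_windows * num_candidates * candidate_length


ecg = brute_force_steps(650_000, 200, 40)
stock = brute_force_steps(2_119_415, 337, 128)
audio = brute_force_steps(46_143_488, 68, 101, window=11_025, step=5_512)
```

These three calls give 5,199,688,000, 91,417,607,168, and 57,485,160, matching the brute force rows in the tables. For the audio data, the match suggests that each one-second window is reduced to a length-101 sequence before comparison, although the slides do not describe that step.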

Page 17: Time Series Filtering

Experimental Results: Sorting

• Wedge
  – with length 1,000

• Random walk time series
  – with length 65,536

            r = 0.5      r = 1        r = 2        r = 3
Sorted      95,025       151,723      345,226      778,367
Unsorted    1,906,244    2,174,994    2,699,885    3,286,213

[Figure: bar chart of number of steps (×10^6), sorted vs. unsorted, for r = 0.5, 1, 2, and 3.]