time series filtering
Time Series Filtering
Given a Time Series T, a set of Candidates C and a distance threshold r, find all subsequences in T that are within distance r of any of the candidates in C.
[Figure: a time series scanned against candidates 1 to 12; one subsequence is labeled "Matches Q11".]
Filtering vs. Querying
[Figure: two panels. Filtering: queries 1 to 12 are matched against a database, with one item labeled "Matches Q11". Querying: a single query (template) is compared against database items 1 to 10 to find the best match.]
Euclidean Distance Metric
Given two time series Q = q1, …, qn and C = c1, …, cn, their Euclidean distance is defined as:
\[
D(Q, C) = \sqrt{\sum_{i=1}^{n} (q_i - c_i)^2}
\]
[Figure: time series Q and C plotted over time points 0 to 100.]
Early Abandon
During the computation, if the current sum of the squared differences between each pair of corresponding data points exceeds r^2, we can safely stop the calculation.
[Figure: Q and C plotted over time points 0 to 100, with the point at which the calculation is abandoned marked.]
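A minimal sketch of early abandoning, assuming equal-length sequences stored as Python lists. The function name `early_abandon_distance` and the choice to return `None` on abandonment are illustrative and do not come from the slides.

```python
import math

def early_abandon_distance(q, c, r):
    """Return the Euclidean distance D(q, c), or None once it is known to exceed r."""
    if len(q) != len(c):
        raise ValueError("sequences must have the same length")
    limit = r * r              # compare against r^2 so no sqrt is needed in the loop
    total = 0.0
    for qi, ci in zip(q, c):
        total += (qi - ci) ** 2
        if total > limit:      # the running sum can only grow, so stop here
            return None
    return math.sqrt(total)
```

Comparing the running sum with r^2 rather than taking a square root at every step keeps the inner loop cheap.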
Classic Approach
Individually compare each candidate sequence to the query using the early abandoning algorithm.
[Figure: a time series compared against candidates 1 to 12, one candidate at a time.]
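The classic approach might be sketched as follows, assuming all candidates share one length and that every subsequence of that length is tested (a sliding step of 1). The names `within_r` and `classic_filter` are hypothetical.

```python
def within_r(q, c, r):
    """True if the Euclidean distance between q and c is at most r (early abandoning)."""
    limit = r * r
    total = 0.0
    for qi, ci in zip(q, c):
        total += (qi - ci) ** 2
        if total > limit:
            return False
    return True

def classic_filter(series, candidates, r):
    """Return (start, candidate_index) for every subsequence within r of a candidate."""
    m = len(candidates[0])     # assumption: all candidates have the same length
    matches = []
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        for k, cand in enumerate(candidates):
            if within_r(window, cand, r):
                matches.append((start, k))
    return matches
```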
Wedge
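[Figure: candidate sequences C1 and C2 with the upper and lower envelopes U and L that enclose them, forming the wedge W, shown together with a query Q.]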
Having candidate sequences C1, …, Ck, we can form two new sequences U and L:
  U_i = max(C1_i, …, Ck_i)
  L_i = min(C1_i, …, Ck_i)
They form the smallest possible bounding envelope that encloses sequences C1, …, Ck.
We call the combination of U and L a wedge, and denote a wedge as W:
  W = {U, L}
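As an illustration, the envelope can be computed pointwise. This sketch assumes the candidates are equal-length Python lists; `build_wedge` is a hypothetical helper name.

```python
def build_wedge(candidates):
    """Return (U, L): the pointwise maximum and minimum over the candidates."""
    if not candidates:
        raise ValueError("need at least one candidate")
    if len({len(c) for c in candidates}) != 1:
        raise ValueError("candidates must all have the same length")
    upper = [max(column) for column in zip(*candidates)]
    lower = [min(column) for column in zip(*candidates)]
    return upper, lower
```

For example, `build_wedge([[1, 5, 2], [3, 4, 0]])` gives U = [3, 5, 2] and L = [1, 4, 0].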
A lower bounding measure between an arbitrary query Q and the entire set of candidate sequences contained in a wedge W:
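\[
LB\_Keogh(Q, W) = \sqrt{\sum_{i=1}^{n}
\begin{cases}
(q_i - U_i)^2 & \text{if } q_i > U_i \\
(q_i - L_i)^2 & \text{if } q_i < L_i \\
0 & \text{otherwise}
\end{cases}}
\]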
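A sketch of the lower bound with optional early abandoning, assuming the wedge is given as two lists `upper` and `lower` of the same length as the query. The optional `r` parameter and the `None` return value are illustrative choices.

```python
import math

def lb_keogh(q, upper, lower, r=None):
    """LB_Keogh(Q, W) for W = (upper, lower); None if the bound is known to exceed r."""
    limit = None if r is None else r * r
    total = 0.0
    for qi, ui, li in zip(q, upper, lower):
        if qi > ui:
            total += (qi - ui) ** 2    # query point lies above the envelope
        elif qi < li:
            total += (qi - li) ** 2    # query point lies below the envelope
        if limit is not None and total > limit:
            return None                # early abandon: every enclosed candidate is too far
    return math.sqrt(total)
```

Points that fall inside the envelope contribute nothing, which is why the value can never exceed the distance from the query to any candidate enclosed by the wedge.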
Generalized Wedge
• Use W(1,2) to denote that a wedge is built from sequences C1 and C2.
• Wedges can be hierarchically nested. For example, W((1,2),3) consists of W(1,2) and C3.
[Figure: C1 (or W1), C2 (or W2) and C3 (or W3) merged into W(1, 2) and then into W((1, 2), 3).]
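Nesting can be sketched by treating a single candidate as a degenerate wedge and merging envelopes pointwise. The helpers `as_wedge` and `merge_wedges` are hypothetical names, and the candidate values are made up for illustration.

```python
def as_wedge(c):
    """A single candidate C is the degenerate wedge with U = L = C."""
    return list(c), list(c)

def merge_wedges(a, b):
    """Merge wedges a = (U_a, L_a) and b = (U_b, L_b) into one enclosing wedge."""
    (ua, la), (ub, lb) = a, b
    upper = [max(x, y) for x, y in zip(ua, ub)]
    lower = [min(x, y) for x, y in zip(la, lb)]
    return upper, lower

c1, c2, c3 = [1, 5, 2], [3, 4, 0], [2, 6, 1]       # illustrative values
w12 = merge_wedges(as_wedge(c1), as_wedge(c2))     # W(1,2)
w123 = merge_wedges(w12, as_wedge(c3))             # W((1,2),3)
```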
H-Merge
• Compare the query to the wedge using LB_Keogh
• If the LB_Keogh function early abandons, we are done
• Otherwise individually compare each candidate sequence to the query using the early abandoning algorithm
[Figure: a time series compared against candidates 1 to 12 grouped into a wedge.]
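The three steps can be sketched for a single wedge as below. This is a flat, one-level illustration with hypothetical function names; a nested wedge would repeat the same test on its child wedges before reaching individual candidates.

```python
def lb_exceeds(q, upper, lower, r):
    """True if LB_Keogh(q, W) > r, abandoning as soon as that is certain."""
    limit, total = r * r, 0.0
    for qi, ui, li in zip(q, upper, lower):
        if qi > ui:
            total += (qi - ui) ** 2
        elif qi < li:
            total += (qi - li) ** 2
        if total > limit:
            return True
    return False

def dist_exceeds(q, c, r):
    """True if the Euclidean distance D(q, c) > r, with early abandoning."""
    limit, total = r * r, 0.0
    for qi, ci in zip(q, c):
        total += (qi - ci) ** 2
        if total > limit:
            return True
    return False

def match_against_wedge(q, wedge, candidates, r):
    """Return indices of the wedge's candidates that lie within r of q."""
    upper, lower = wedge
    if lb_exceeds(q, upper, lower, r):
        return []          # the lower bound already exceeds r: no candidate can match
    return [k for k, c in enumerate(candidates) if not dist_exceeds(q, c, r)]
```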
Hierarchical Clustering
[Figure: candidates C1 to C5 (or W1 to W5) merged step by step, giving one wedge set for each K.]
K = 5: W1, W2, W3, W4, W5
K = 4: W(2,5), W1, W3, W4
K = 3: W(2,5), W(1,4), W3
K = 2: W((2,5),3), W(1,4)
K = 1: W(((2,5),3), (1,4))
Which wedge set to choose?
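The sequence of wedge sets can be reproduced by replaying a merge order. The sketch below takes the merge order from the figure as given; it does not implement a clustering criterion, since the slides do not state which one is used. Wedges are labeled by nested tuples of candidate indices, an illustrative encoding.

```python
def wedge_sets_from_merges(n, merges):
    """Yield (K, wedge_set) for K = n down to 1.

    Labels are nested tuples of 1-based candidate indices,
    e.g. ((2, 5), 3) stands for W((2,5),3).
    """
    current = list(range(1, n + 1))
    yield len(current), list(current)
    for a, b in merges:
        current.remove(a)
        current.remove(b)
        current.append((a, b))         # the merged wedge replaces its two parts
        yield len(current), list(current)

# Merge order read off the figure: (2,5), then (1,4), then ((2,5),3), then all.
merges = [(2, 5), (1, 4), ((2, 5), 3), (((2, 5), 3), (1, 4))]
wedge_sets = dict(wedge_sets_from_merges(5, merges))
```

With N candidates this yields N wedge sets, one per value of K.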
Which Wedge Set to Choose?
• Test all k wedge sets on a representative sample of data
• Choose the wedge set which performs the best
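One way to sketch this selection, assuming that "performs the best" means the fewest point comparisons on the sample and that each wedge set is evaluated flat (wedge first, then its members). Both are assumptions, and all names are hypothetical.

```python
def steps_for_wedge(q, members, r):
    """Count point comparisons to test q against one wedge and, if needed, its members."""
    upper = [max(col) for col in zip(*members)]
    lower = [min(col) for col in zip(*members)]
    limit, total, steps = r * r, 0.0, 0
    for qi, ui, li in zip(q, upper, lower):
        steps += 1
        if qi > ui:
            total += (qi - ui) ** 2
        elif qi < li:
            total += (qi - li) ** 2
        if total > limit:
            return steps               # wedge pruned, members never touched
    if len(members) > 1:               # a one-member wedge is the candidate itself
        for c in members:
            total = 0.0
            for qi, ci in zip(q, c):
                steps += 1
                total += (qi - ci) ** 2
                if total > limit:
                    break
    return steps

def choose_wedge_set(wedge_sets, sample, r):
    """wedge_sets maps K to a list of candidate groups; return the cheapest K."""
    def cost(groups):
        return sum(steps_for_wedge(q, g, r) for q in sample for g in groups)
    return min(wedge_sets, key=lambda k: cost(wedge_sets[k]))
```

The sample should resemble the data the filter will see, since the best K depends on how often wedges are pruned.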
Upper Bound on H-Merge
• The wedge-based approach seems to be efficient when comparing a set of time series to a large batch dataset.
• But what about streaming time series?
  – Streaming algorithms are limited by their worst case.
  – Being efficient on average does not help.
• Worst case
[Figure: a subsequence compared against C1 (or W1), C2 (or W2), C3 (or W3), W(1, 2) and W((1, 2), 3).]
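[Figure: the hierarchical clustering of W1 to W5 for K = 5 down to K = 1, with the K = 2 wedge set W((2,5),3) and W(1,4) highlighted; one wedge is marked "fails".]
If dist(W((2,5),3), W(1,4)) >= 2r, a subsequence cannot fail on both wedges.
Triangle Inequality
[Figure: a subsequence, W((2,5),3) and W(1,4), with the distances between them labeled ">= 2r", "< r" and "?".]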
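One way to read this claim, sketched under two assumptions not spelled out on the slide: that "failing" on a wedge means the subsequence S is not pruned by it, i.e. d(S, W) < r, and that the distance d between subsequences and wedges obeys the triangle inequality. Writing W_A = W((2,5),3) and W_B = W(1,4), with d(W_A, W_B) >= 2r and d(S, W_A) < r:

```latex
d(W_A, W_B) \le d(W_A, S) + d(S, W_B)
\quad\Longrightarrow\quad
d(S, W_B) \ge d(W_A, W_B) - d(S, W_A) > 2r - r = r
```

Under these assumptions, a subsequence that fails on W_A is more than r away from W_B, so it cannot fail on W_B as well.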
Experimental Setup
• Datasets
  – ECG Dataset
  – Stock Dataset
  – Audio Dataset
• We measure the number of computational steps used by the following methods:
  – Brute force
  – Brute force with early abandoning (classic)
  – Our approach (H-Merge)
  – Our approach with random wedge set (H-Merge-R)
Experimental Results: ECG Dataset
• Batch time series
  – 650,000 data points (half an hour’s ECG signals)
• Candidate set
  – 200 time series of length 40
• r = 0.5

Algorithm      Number of Steps
brute force    5,199,688,000
classic        210,190,006
H-Merge        8,853,008
H-Merge-R      29,480,264

[Figure: bar chart of the number of steps (y-axis, ×10^9) for brute force, classic, H-Merge and H-Merge-R.]
Experimental Results: Stock Dataset
• Batch time series
  – 2,119,415 data points
• Candidate set
  – 337 time series with length 128
• r = 4.3

Algorithm      Number of Steps
brute force    91,417,607,168
classic        13,028,000,000
H-Merge        3,204,100,000
H-Merge-R      10,064,000,000

[Figure: bar chart of the number of steps (y-axis, ×10^10) for brute force, classic, H-Merge and H-Merge-R.]
Experimental Results: Audio Dataset
• Batch time series
  – 46,143,488 data points (one hour’s sound)
• Candidate set
  – 68 time series with length 101
• r = 4.14
• Sliding window
  – 11,025 (1 second)
• Step
  – 5,512 (0.5 second)

Algorithm      Number of Steps
brute force    57,485,160
classic        1,844,997
H-Merge        1,144,778
H-Merge-R      2,655,816

[Figure: bar chart of the number of steps (y-axis, ×10^7) for brute force, classic, H-Merge and H-Merge-R.]
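The brute-force rows of the three result tables are consistent with counting one step per point comparison, i.e. windows × candidates × candidate length. The sketch below is a check on the reported numbers rather than something stated on the slides. For the audio data it assumes that each sliding window is compared at 101 points per candidate; how a window is reduced to that length is not given here.

```python
def sliding_windows(n, window, step=1):
    """Number of windows of the given length that fit in n points."""
    return (n - window) // step + 1

def brute_force_steps(num_windows, num_candidates, candidate_length):
    """One step per point comparison, with no early abandoning."""
    return num_windows * num_candidates * candidate_length

ecg = brute_force_steps(sliding_windows(650_000, 40), 200, 40)
stock = brute_force_steps(sliding_windows(2_119_415, 128), 337, 128)
audio = brute_force_steps(sliding_windows(46_143_488, 11_025, 5_512), 68, 101)
print(ecg, stock, audio)  # 5199688000 91417607168 57485160
```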
Experimental Results: Sorting
• Wedge
  – with length 1,000
• Random walk time series
  – with length 65,536

Number of steps:
            r = 0.5      r = 1        r = 2        r = 3
Sorted      95,025       151,723      345,226      778,367
Unsorted    1,906,244    2,174,994    2,699,885    3,286,213

[Figure: number of steps (y-axis, ×10^6) for sorted and unsorted at r = 0.5, 1, 2 and 3.]