tadpole_nurjahan begum

Post on 23-Jan-2017


Nurjahan Begum, Liudmila Ulanova, Jun Wang¹ and Eamonn Keogh
University of California, Riverside; ¹UT Dallas

Accelerating Dynamic Time Warping Clustering with a Novel Admissible Pruning Strategy

Presented By: Nurjahan Begum

Talk Overview
• Motivation of Dynamic Time Warping (DTW) Clustering
• Density Peaks (DP) Algorithm
• TADPole: Our Proposed Algorithm
    • Novel Pruning Strategies
• Experimental Results
• Case Studies
• Conclusions


Motivation of DTW Clustering

[Figure: time series for the hashtags #kanyewest, #Michael, #MichaelJackson and #taylorswift; x-axis: hours (0 to 120)]

Synonym Discovery?

Association Discovery?

“I’mma let you finish”

Two Questions…

[Figure: Query Q shown with the series labeled Black-Faced leafhopper and Beet leafhopper; x-axis: 0 to 1200]

How do we define similar?
• We need to be invariant to noise, amplitude, linear drift, scaling, warping…
• Dozens of measures have been claimed, many backed by dubious empirical work (cherry-picked datasets, crippled rival methods, training on the test data…)

Nothing significantly beats a 40-year-old technique called Dynamic Time Warping (DTW).
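To make the warping invariance concrete, here is a minimal sketch of the textbook DTW recurrence with an optional warping window. The function names, the squared point-wise cost, and the `window` parameter are assumptions for illustration; the slides do not give an implementation.

```python
import math

def dtw(a, b, window=None):
    """DTW distance between two sequences, with an optional warping window."""
    n, m = len(a), len(b)
    # Widen the window if needed so the final cell (n, m) stays reachable.
    w = max(n, m) if window is None else max(window, abs(n - m))
    inf = math.inf
    # cost[i][j] = best cumulative squared cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])

def euclidean(a, b):
    """Euclidean distance (ED) between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

x = [0, 0, 1, 2, 1, 0, 0]
y = [0, 1, 2, 1, 0, 0, 0]   # same bump as x, one step earlier
```

On this toy pair, ED penalizes the one-step shift while DTW warps it away; with `window=0` only the diagonal alignment is allowed, so DTW reduces to ED.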

Comparison Between DTW and ED

[Figure: groupings of Bos taurus, Hyperoodon ampullatus and Talpa europaea obtained under DTW and under ED; label: Cetartiodactyla]

Why is DTW Clustering Hard?

Observation 1: The convergence of DTW and Euclidean distance results for increasing data sizes.

Observation 2: The increasing effectiveness of lower-bounding pruning for increasing data sizes.

Neither of these two observations helps for clustering!

[Figure: 1-NN error rate (0.01 to 0.07) vs. size of training set (0 to 2000) for Euclidean and DTW]

[Figure: Rand Index (0.6 to 0.9) vs. dataset size (0 to 2000) for DTW and Euclidean]

Why Is Existing Work Not the Answer?

Scalability Issue: DTW is not a metric, and is therefore very difficult to index.
Quality Issue: We need a clustering algorithm that is insensitive to outliers.

[Figure: two panels of 13 numbered points (1 to 13); annotations: "Mislabeled by k-means" and "Outlier"]


Density Peaks (DP) Algorithm
• Why?
    • Parameter-lite
    • Can handle arbitrarily shaped clusters
    • Insensitive to noise/outliers
• 3 Steps
    • Density Calculation
    • NN within Higher Density List Calculation
    • Cluster Assignment
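The three steps can be sketched on a precomputed distance matrix as follows. This is an illustrative sketch, not the authors' code: the function name, the strict `< dc` test, the index-based tie-breaking between equally dense items, and picking the `k` centers by the product of density and NN distance are all assumptions.

```python
def density_peaks(dist, dc, k):
    """Cluster items from a symmetric distance matrix with a Density Peaks sketch."""
    n = len(dist)
    # Step 1: local density = number of other items closer than the cutoff dc.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc) for i in range(n)]
    # Items from most to least dense; ties are broken by index (an assumption).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    # Step 2: distance to the nearest neighbour among the items of higher density.
    delta = [0.0] * n
    nn = [-1] * n
    delta[order[0]] = max(dist[order[0]])   # densest item: use its largest distance
    for pos in range(1, n):
        i = order[pos]
        j = min(order[:pos], key=lambda c: dist[i][c])
        delta[i], nn[i] = dist[i][j], j
    # Step 3: choose k centers (densest item first, then the largest rho * delta),
    # and let every other item inherit the label of its higher-density NN.
    gamma = [rho[i] * delta[i] for i in range(n)]
    rest = sorted(order[1:], key=lambda i: -gamma[i])
    centers = [order[0]] + rest[:k - 1]
    label = [-1] * n
    for cluster_id, c in enumerate(centers):
        label[c] = cluster_id
    for i in order:                          # the NN of i is always labelled before i
        if label[i] == -1:
            label[i] = label[nn[i]]
    return rho, delta, label

# Toy usage: six points on a line forming two obvious groups.
points = [0.0, 0.5, 1.0, 10.0, 10.5, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
rho, delta, label = density_peaks(dist, dc=0.8, k=2)
```

Because labels propagate from denser items to sparser ones, a single pass over the density ordering is enough for the assignment step.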

Density Peaks (DP) Algorithm: Density

[Figure: 13 numbered points, with the cutoff distance dc drawn around item 1, and the local density ρ of each item]

Item  1  2  3  4  5  6  7  8  9  10  11  12  13
ρ     4  3  6  4  5  3  1  3  1  1   2   2   2

Density Peaks (DP) Algorithm: NN from Higher Density List

[Figure: the same 13 points and ρ values. For item 1 (ρ = 4), the elements with higher density are items 3 and 5, annotated 4.2 and 6.]

Density Peaks (DP) Algorithm: Cluster Assignment

[Figure: the same 13 points, ρ values and higher density list for item 1 (items 3 and 5, annotated 4.2 and 6)]

Item 1’s cluster label = item 3’s cluster label

Plot of values of step 1 (density[X]) and step 2 (NN distance[Y])


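TADPole: Pruning During Local Density Computation

[Figure: four cases A) to D) for a pair of items (i, j), placing LBMatrix(i,j), the true distance Dij and UBMatrix(i,j) relative to the cutoff dc. In case A), i = j and Dij = 0. In the other cases the bounds either settle on which side of dc the distance Dij falls, or leave it open; the open case is annotated "Calculate distance!"]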
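The idea behind these cases can be sketched as below: the true distance is computed only when the cutoff dc falls between the lower and upper bound. The function name, the `true_dist` callback and the strict `< dc` neighbour test are assumptions for illustration; `lb` and `ub` stand in for LBMatrix and UBMatrix.

```python
def local_density_with_pruning(lb, ub, dc, true_dist):
    """Count neighbours within dc, calling true_dist(i, j) only when the bounds disagree."""
    n = len(lb)
    rho = [0] * n
    calls = 0
    for i in range(n):
        for j in range(i + 1, n):       # i == j has distance 0 and is never computed
            if ub[i][j] < dc:           # even the upper bound is inside the cutoff
                within = True
            elif lb[i][j] >= dc:        # even the lower bound is outside the cutoff
                within = False
            else:                       # dc lies between the bounds: compute
                calls += 1
                within = true_dist(i, j) < dc
            if within:
                rho[i] += 1
                rho[j] += 1
    return rho, calls

# Toy usage with hand-made bounds around a known distance matrix.
D = [[0, 1, 5], [1, 0, 3], [5, 3, 0]]
lb = [[0, 0.5, 4], [0.5, 0, 2], [4, 2, 0]]
ub = [[0, 2, 6], [2, 0, 4], [6, 4, 0]]
rho, calls = local_density_with_pruning(lb, ub, dc=2.5, true_dist=lambda i, j: D[i][j])
```

In the toy run only the pair (1, 2) needs its true distance; the other two pairs are decided by the bounds alone, and the densities match what the full matrix would give.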

TADPole: Pruning During NN Distance Calculation From Higher Density List

[Figure: panels A) to D) showing, for item i and the candidates j1 to j4 from its higher density list, the lower bound LBMatrix(i,jk), the true distance Dk and the upper bound UBMatrix(i,jk)]
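One way to use the bounds in this step is sketched below: a candidate whose lower bound already exceeds the best distance known so far cannot be the nearest neighbour, so its true distance is never computed. The function name, the visiting order (ascending lower bound) and the `true_dist` callback are assumptions; the slides do not specify these details.

```python
def nn_in_higher_density_list(i, candidates, lb, ub, true_dist):
    """Return (nearest candidate, its distance, number of true distance calls)."""
    # Every candidate's upper bound caps the NN distance, so start from the smallest.
    best = min(ub[i][j] for j in candidates)
    best_j, calls = None, 0
    # Visit candidates with the smallest lower bound first.
    for j in sorted(candidates, key=lambda c: lb[i][c]):
        if lb[i][j] > best:
            break                       # all remaining lower bounds are larger still
        calls += 1
        d = true_dist(i, j)
        if d <= best:
            best, best_j = d, j
    return best_j, best, calls

# Toy usage: item 0 with three denser candidates.
D = {(0, 1): 5.0, (0, 2): 2.0, (0, 3): 8.0}
lb = [[0, 4, 1, 7]]
ub = [[0, 6, 3, 9]]
result = nn_in_higher_density_list(0, [1, 2, 3], lb, ub, lambda i, j: D[(i, j)])
```

Here a single true distance computation is enough: after candidate 2 is measured at 2.0, the lower bounds of candidates 1 and 3 (4 and 7) rule them out.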


How Effective is TADPole’s Pruning?

[Figure, panel "Absolute Number": distance calculations (up to 7 × 10^6) vs. number of objects (0 to 3500)]

[Figure, panel "Percentage": distance calculations (0 to 100) vs. number of objects (0 to 3500) for Brute force and TADPole]

DP: 9 hours; TADPole: 9 minutes

Distance Computation Ordering: Anytime TADPole

[Figure: Rand Index (0.4 to 1) vs. distance computation percentage (0 to 100%) for Oracle Order, TADPole Order and Random Order, with Euclidean Distance as a reference. Annotations: "This reflects the 90% of DTW calculations that were admissibly pruned" and "This reflects the 10% of DTW calculations that were calculated in anytime ordering".]

[Figure: Zoom-In of Above Figure, covering 0 to 10% of the distance computations, for Oracle Order, Random Order and TADPole Order]

How ‘good’ are TADPole Clusters?

Dataset          TADPole DTW (TADPole ED)   k-means (DTW version)   Hierarchical (DTW version)   DBSCAN (DTW version)   Spectral (DTW version)
CBF              1 (0.66)                   0.78                    0.73                         0.77                   0.76
FacesUCR         0.92 (0.86)                0.87                    0.85                         0.77                   0.94
MedicalImages    0.66 (0.67)                0.67                    0.62                         0.65                   0.69
Symbols          0.98 (0.81)                0.93                    0.78                         0.91                   0.95
uWaveGesture_Z   0.86 (0.84)                0.85                    0.83                         0.8                    0.86
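The plots in this talk report cluster quality as Rand Index; assuming the scores above use the same measure (the slide does not say so explicitly), a minimal sketch of how it is computed from two labelings follows. The function name is an assumption.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two labelings agree (same vs. different cluster)."""
    agree = total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b       # both "together" or both "apart"
        total += 1
    return agree / total

perfect = rand_index([0, 0, 1, 1], [1, 1, 0, 0])   # same partition, renamed labels
poor = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The measure depends only on which items are grouped together, so renaming the cluster labels leaves the score unchanged.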


Case Study 1: Electromagnetic Articulograph

[Figure: Y and Z traces from the articulograph; x-axis: 0 to 150]

[Figure: Rand Index (0.84 to 1) vs. distance computation percentage for Oracle Order, Random Order and TADPole Order, with Euclidean Distance as a reference]

Pruning: 94%

Case Study 2: Pulsus Dataset

[Figure: oximeter schematic with LED, photo detector, vein and artery]

[Figure: panels A) to F), power spectral density vs. frequency (0 to 60) with normalized respiration rate and normalized heart rate marked, for Patient 639, Patient 523, Patient 618 and Patient 2975918; labels: Suspected Pulsus, Severe Pulsus, Healthy]

[Figure: PPG traces (x-axis: 200 to 1800) for Non-Severe Pulsus and Severe Pulsus]

Pruning: 88%


Conclusions
• Proposed a robust DTW clustering algorithm, TADPole
    • Exploits both upper and lower bounds
    • Computes the clustering in an anytime fashion
• Demonstrated the utility of our algorithm on diverse domains
    • Electromagnetic Articulograph
    • Pulsus Dataset
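The slides do not name the bounds that fill LBMatrix and UBMatrix. As a hedged illustration, the sketch below assumes LB_Keogh as the lower bound and Euclidean distance as the upper bound for equal-length series, and checks that they sandwich windowed DTW; all function names and the window parameter `r` are assumptions.

```python
import math

def euclidean(a, b):
    """Upper bound: the diagonal alignment is one valid warping path."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def lb_keogh(q, c, r):
    """Lower bound: distance from q to the envelope of c under window r."""
    total = 0.0
    for i, x in enumerate(q):
        segment = c[max(0, i - r): i + r + 1]
        upper, lower = max(segment), min(segment)
        if x > upper:
            total += (x - upper) ** 2
        elif x < lower:
            total += (lower - x) ** 2
    return math.sqrt(total)

def dtw(a, b, r):
    """Windowed DTW for equal-length sequences (squared point-wise cost)."""
    n = len(a)
    cost = [[math.inf] * (n + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(n, i + r) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][n])

q = [0, 1, 3, 2, 0, -1]
c = [1, 0, 0, 2, 3, 1]
low, exact, high = lb_keogh(q, c, 1), dtw(q, c, 1), euclidean(q, c)
```

Both bounds cost time linear in the series length, whereas DTW is quadratic (or linear times the window width), which is why deciding a pair from the bounds alone is so much cheaper than computing its true distance.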

Thanks to NSF!
