
University of Macau

Quick-Motif: An Efficient and Scalable Framework for Exact Motif Discovery

Yuhong Li

yb27407@umac.mo

Department of Computer and Information Science

University of Macau, Macau


Quick-Motif: What Is a Motif?

■ The most similar subsequence pair in a time series.

■ Applications
  - A core subroutine for activity discovery, e.g., elder care, surveillance, and sports training.
  - Clustering the enumerated motifs is more meaningful than clustering all the subsequences of a long time series.


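Quick-Motif: Formal Definition

■ Exact motif discovery
  - Input: a time series of length m and a target motif length ℓ.
  - Output: the most similar subsequence pair in terms of normalized Euclidean distance.

■ Avoid trivial matches
  - The two subsequences must not overlap, because adjacent subsequences are naturally expected to be similar to each other.
  - For subsequences starting at positions i and j, this means |i − j| ≥ ℓ.

[Figure: a time series on a timeline from 0 to m−1, with one subsequence spanning positions i to i+ℓ−1.]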
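The definition above can be made concrete with a small Python sketch. It assumes that "normalized" means z-normalization (mean 0, population standard deviation 1) and, as a further assumption, maps a constant subsequence to all zeros. The helper names `subsequence`, `znormalize`, `znorm_dist`, and `non_overlapping` are hypothetical and are not taken from the Quick-Motif code.

```python
import math
from typing import List, Sequence


def subsequence(t: Sequence[float], i: int, ell: int) -> List[float]:
    """S_i = t[i], ..., t[i + ell - 1]."""
    return list(t[i:i + ell])


def znormalize(x: Sequence[float]) -> List[float]:
    """Shift x to mean 0 and scale it to population standard deviation 1."""
    n = len(x)
    mu = sum(x) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    if sigma == 0.0:
        return [0.0] * n  # assumption: a constant subsequence becomes all zeros
    return [(v - mu) / sigma for v in x]


def znorm_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between the z-normalized versions of a and b."""
    za, zb = znormalize(a), znormalize(b)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(za, zb)))


def non_overlapping(i: int, j: int, ell: int) -> bool:
    """True when S_i and S_j share no time points, i.e. |i - j| >= ell."""
    return abs(i - j) >= ell
```

Because of the normalization, a scaled copy of a pattern is at distance 0 from it (for example `[1, 2, 3]` and `[10, 20, 30]`), while a reversed pattern reaches the maximum distance 2√ℓ.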


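Quick-Motif: Naïve Solution

■ Slide a window of size ℓ over the time series with step size 1 to obtain all subsequences of length ℓ, and normalize each one.
■ Test all subsequence pairs; the motif is the most similar pair.
■ Time complexity is O(m²ℓ): there are O(m²) pairs, and each distance computation takes O(ℓ) time.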
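A minimal Python sketch of the naïve solution, assuming z-normalization and the |i − j| ≥ ℓ non-overlap rule; `naive_motif` is a hypothetical name. The two nested loops over start positions and the O(ℓ) inner distance computation account for the O(m²ℓ) cost.

```python
import math
from typing import List, Sequence, Tuple


def znormalize(x: Sequence[float]) -> List[float]:
    """Shift x to mean 0 and scale it to population standard deviation 1."""
    n = len(x)
    mu = sum(x) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    if sigma == 0.0:
        return [0.0] * n  # assumption: a constant subsequence becomes all zeros
    return [(v - mu) / sigma for v in x]


def naive_motif(t: Sequence[float], ell: int) -> Tuple[int, int, float]:
    """Return (i, j, distance) of the closest non-overlapping pair of length-ell subsequences."""
    # Sliding window of size ell with step size 1; each window is normalized.
    subs = [znormalize(t[i:i + ell]) for i in range(len(t) - ell + 1)]
    best = (-1, -1, math.inf)
    for i in range(len(subs)):
        for j in range(i + ell, len(subs)):  # j - i >= ell avoids trivial matches
            # O(ell) work per pair, O(m^2) pairs in total.
            d = math.sqrt(sum((p - q) ** 2 for p, q in zip(subs[i], subs[j])))
            if d < best[2]:
                best = (i, j, d)
    return best
```

For the series `[0, 1, 0, 7, 3, 0, 1, 0]` and ℓ = 3, the pattern `0, 1, 0` occurs at positions 0 and 5, so that pair is reported with distance 0.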


Quick-Motif: Existing Solutions

■ Reference-based Index (MK) [Mueen & Keogh, SDM 2009]
  - Good: prunes unpromising pairs in batches.
  - Bad: each distance computation takes O(ℓ) time.

■ Smart Brute Force (SBF) [Mueen, ICDM 2013]
  - Good: each distance computation takes O(1) time.
  - Bad: examines all subsequence pairs.

■ Can batch pruning and O(1)-time distance computation be combined?


Quick-Motif: Fast Distance Computation

■ Incremental distance computation.

[Figure: normalized subsequences 0 through 4 and 20 through 24, with panels labeled "9 subsequence pairs" and "16 subsequence pairs".]
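The slide does not spell out the incremental formula. One well-known way to obtain O(1)-time z-normalized distances, as an assumption about what the figure illustrates, is to walk along a diagonal of the distance matrix (pairs S_i and S_{i+k}) and update the dot product QT = Σ t[i+p]·t[j+p] by dropping one product and adding one, then convert it with d² = 2ℓ(1 − (QT − ℓ·μ_i·μ_j)/(ℓ·σ_i·σ_j)). The sketch assumes no window is constant (σ > 0); `rolling_mean_std` and `diagonal_distances` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def rolling_mean_std(t: Sequence[float], ell: int) -> Tuple[List[float], List[float]]:
    """Mean and population std of every length-ell window, in O(m) total."""
    s = sum(t[:ell])
    s2 = sum(v * v for v in t[:ell])
    mu, sigma = [], []
    for i in range(len(t) - ell + 1):
        if i > 0:  # slide the window one step to the right
            s += t[i + ell - 1] - t[i - 1]
            s2 += t[i + ell - 1] ** 2 - t[i - 1] ** 2
        mean = s / ell
        mu.append(mean)
        sigma.append(math.sqrt(max(s2 / ell - mean * mean, 0.0)))
    return mu, sigma


def diagonal_distances(t: Sequence[float], ell: int, k: int) -> List[float]:
    """z-normalized distances d(S_i, S_{i+k}) for every valid i.

    Only the first dot product costs O(ell); each later one is updated in O(1).
    """
    mu, sigma = rolling_mean_std(t, ell)
    n = len(t) - ell + 1
    qt = sum(t[p] * t[p + k] for p in range(ell))  # QT for (S_0, S_k)
    out = []
    for i in range(n - k):
        j = i + k
        if i > 0:
            # Drop the product that left both windows, add the one that entered.
            qt += t[i + ell - 1] * t[j + ell - 1] - t[i - 1] * t[j - 1]
        corr = (qt - ell * mu[i] * mu[j]) / (ell * sigma[i] * sigma[j])
        out.append(math.sqrt(max(2 * ell * (1 - corr), 0.0)))
    return out
```

Each diagonal k ≥ ℓ covers only non-overlapping pairs, so iterating over all such diagonals visits every candidate pair at O(1) amortized cost.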


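Quick-Motif: Pruning of Subsequence Pairs

■ Group every w consecutive subsequences into a PAA MBR (the minimum bounding rectangle of their PAA representations).
■ The minimum distance (minDist) between two PAA MBRs gives a lower bound (LB) on the distance of every subsequence pair drawn from them.
■ If the distance LB is smaller than the best-so-far motif distance, the MBR pair needs further refinement; otherwise all of its subsequence pairs are pruned together.

[Figure: PAA feature space with axes f1 and f2; w = 5; MBRs M_1^5, M_2^5, M_3^5 and the minDist between two of them.]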
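A sketch of the MBR lower bound under stated assumptions: PAA with d equal-length segments (ℓ divisible by d), MBRs stored as (lower corner, upper corner) lists, and the classic PAA bound √(ℓ/d)·‖PAA(x) − PAA(y)‖ ≤ ‖x − y‖ applied to the per-dimension gap between two rectangles. `paa`, `mbr`, and `min_dist` are hypothetical helper names, not the authors' API.

```python
import math
from typing import List, Sequence, Tuple

MBR = Tuple[List[float], List[float]]  # (lower corner, upper corner)


def paa(x: Sequence[float], d: int) -> List[float]:
    """Piecewise Aggregate Approximation: mean of each of d equal segments."""
    seg = len(x) // d  # assumes len(x) is divisible by d
    return [sum(x[k * seg:(k + 1) * seg]) / seg for k in range(d)]


def mbr(points: Sequence[Sequence[float]]) -> MBR:
    """Minimum bounding rectangle of a group of PAA vectors."""
    dims = range(len(points[0]))
    return ([min(p[k] for p in points) for k in dims],
            [max(p[k] for p in points) for k in dims])


def min_dist(a: MBR, b: MBR, ell: int, d: int) -> float:
    """Lower bound on the Euclidean distance between any subsequence in a and any in b."""
    gap2 = 0.0
    for alo, ahi, blo, bhi in zip(a[0], a[1], b[0], b[1]):
        gap = max(blo - ahi, alo - bhi, 0.0)  # 0 when the intervals overlap
        gap2 += gap * gap
    return math.sqrt(ell / d) * math.sqrt(gap2)
```

Since the gap in each dimension never exceeds the difference of any two PAA coordinates inside the rectangles, the bound holds for every pair at once, which is what makes batch pruning safe.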


Quick-Motif: Filter-and-Refinement

■ Naïve solution: check the distance LBs of all w-MBR pairs. The time complexity is quadratic in the number of w-MBRs, and each LB check costs time linear in the PAA dimensionality.

■ How can the surviving w-MBR pairs be found efficiently?
  - Enable batch pruning.
  - Discover the true motif as early as possible to improve the pruning ability.
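A sketch of the naïve filter step, assuming each w-MBR is a (lower, upper) pair of PAA corner lists and `bsf` is the best-so-far motif distance. Self-pairs are kept because two non-overlapping subsequences can fall into the same w-MBR when w > ℓ. The double loop makes the quadratic number of LB checks, each costing O(d), explicit; `min_dist` and `naive_filter` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

MBR = Tuple[List[float], List[float]]  # (lower corner, upper corner) of PAA vectors


def min_dist(a: MBR, b: MBR, ell: int, d: int) -> float:
    """PAA-based lower bound on the distance between subsequences in a and in b."""
    gap2 = 0.0
    for alo, ahi, blo, bhi in zip(a[0], a[1], b[0], b[1]):
        gap = max(blo - ahi, alo - bhi, 0.0)
        gap2 += gap * gap
    return math.sqrt(ell / d) * math.sqrt(gap2)


def naive_filter(mbrs: Sequence[MBR], ell: int, d: int, bsf: float) -> List[Tuple[int, int]]:
    """Check the LB of every w-MBR pair; return the pairs whose LB is below bsf."""
    survivors = []
    for p in range(len(mbrs)):
        for q in range(p, len(mbrs)):  # q == p: both subsequences in one w-MBR
            if min_dist(mbrs[p], mbrs[q], ell, d) < bsf:
                survivors.append((p, q))
    return survivors
```

A smaller `bsf` prunes more pairs, which is why finding the true motif early pays off.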


Quick-Motif: Filter-and-Refinement

■ Enable batch pruning with a hierarchical structure
  - Offers reasonable grouping quality, and thus good pruning ability.
  - Can be constructed very efficiently.

[Figure: w-MBRs M_0^w, …, M_8^w in the PAA feature space (axes f1, f2) are sorted along a Hilbert curve into the list M_4^w, M_6^w, M_0^w, M_2^w, M_7^w, M_5^w, M_3^w, M_1^w, M_8^w, which is grouped into Level 2 nodes M_a, M_b, M_c under the Level 1 root M_root; node pairs are compared by minDist.]
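One way to build such a structure, sketched under assumptions: 2-D MBRs for illustration (matching the f1, f2 axes of the figure), MBR centres quantized onto a 16 × 16 grid, the standard iterative Hilbert-index computation, and bottom-up packing of consecutive entries with a fanout of 3 (chosen so that nine leaves pack into three Level 2 nodes and one root, as in the figure). `hilbert_index` and `build_hierarchy` are hypothetical names, not the authors' code.

```python
from typing import List, Sequence, Tuple

MBR = Tuple[List[float], List[float]]  # (lower corner, upper corner)


def _rotate(n: int, x: int, y: int, rx: int, ry: int) -> Tuple[int, int]:
    """Rotate/flip a quadrant so its sub-curve has the canonical orientation."""
    if ry == 0:
        if rx == 1:
            x, y = n - 1 - x, n - 1 - y
        x, y = y, x
    return x, y


def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along a Hilbert curve over an n x n grid (n a power of 2)."""
    d, s = 0, n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d


def enclose(boxes: Sequence[MBR]) -> MBR:
    """Smallest MBR containing all given MBRs."""
    dims = range(len(boxes[0][0]))
    return ([min(b[0][k] for b in boxes) for k in dims],
            [max(b[1][k] for b in boxes) for k in dims])


def build_hierarchy(mbrs: Sequence[MBR], fanout: int = 3, grid: int = 16):
    """Hilbert-sort 2-D MBRs by their centres, then pack them bottom-up.

    Returns (order, levels): order lists input indices in Hilbert order;
    levels[0] holds the leaves, levels[-1] the root; each node is
    (MBR, indices of its children in the level below).
    """
    lo, hi = enclose(mbrs)

    def cell(b: MBR) -> Tuple[int, int]:
        # Quantize the MBR centre onto the grid.
        c = [min(int(((b[0][k] + b[1][k]) / 2 - lo[k]) / ((hi[k] - lo[k]) or 1.0) * grid),
                 grid - 1) for k in range(2)]
        return c[0], c[1]

    order = sorted(range(len(mbrs)), key=lambda i: hilbert_index(grid, *cell(mbrs[i])))
    level = [(mbrs[i], []) for i in order]  # leaves, in Hilbert order
    levels = [level]
    while len(level) > 1:
        parents = []
        for start in range(0, len(level), fanout):
            kids = list(range(start, min(start + fanout, len(level))))
            parents.append((enclose([level[k][0] for k in kids]), kids))
        level = parents
        levels.append(level)
    return order, levels
```

Construction is one sort plus linear packing passes, and neighbours on the Hilbert curve tend to be close in the feature space, which gives the grouping quality the slide refers to.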


Quick-Motif: Filter-and-Refinement

■ Discover the true motif as early as possible: locality-based search strategy.

[Figure: the hierarchical structure (Level 1 root M_root; Level 2 nodes M_a, M_b, M_c; leaf nodes in Hilbert-curve order M_4^w, M_6^w, M_0^w, M_2^w, M_7^w, M_5^w, M_3^w, M_1^w, M_8^w), contrasting node pairs with good locality and bad locality.]

Locality-based search vs. best-first search:

                   Locality-based     Best-first
Surviving pairs    0.1256M            0.1249M
Heap size          N/A                2.78M
# pushes           11.73M (queue)     6.75M (heap)
Resp. time         1.56 s             6.32 s
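The slide does not specify how the locality-based strategy orders node pairs. As an assumption, this sketch shows only the queue-based side of the comparison: a plain FIFO queue (the "queue" in the table) over a bottom-up packed tree whose leaves are assumed to be already in Hilbert order, with each dequeued node pair pruned in one batch when its minDist is not below the best-so-far distance. It works on plain 2-D points rather than PAA MBRs of subsequences and omits the trivial-match check; `Node`, `pack`, and `closest_pair_queue` are hypothetical names, not the authors' algorithm.

```python
import math
from collections import deque


class Node:
    """Tree node with an MBR; a leaf holds the index of one point."""

    def __init__(self, lo, hi, children=(), point=None):
        self.lo, self.hi = lo, hi
        self.children = list(children)
        self.point = point


def pack(points, fanout=3):
    """Pack points (assumed in Hilbert order) bottom-up; every leaf ends at the same depth."""
    level = [Node(list(p), list(p), point=i) for i, p in enumerate(points)]
    while len(level) > 1:
        parents = []
        for s in range(0, len(level), fanout):
            kids = level[s:s + fanout]
            dims = range(len(kids[0].lo))
            parents.append(Node([min(k.lo[j] for k in kids) for j in dims],
                                [max(k.hi[j] for k in kids) for j in dims], kids))
        level = parents
    return level[0]


def min_dist(a, b):
    """Smallest possible distance between a point in a and a point in b."""
    return math.sqrt(sum(max(bl - ah, al - bh, 0.0) ** 2
                         for al, ah, bl, bh in zip(a.lo, a.hi, b.lo, b.hi)))


def closest_pair_queue(points, fanout=3):
    """Return ((dist, i, j), pushes) for the closest pair, using FIFO traversal."""
    root = pack(points, fanout)
    best = (math.inf, None, None)
    queue = deque([(root, root)])
    pushes = 1
    while queue:
        a, b = queue.popleft()
        if min_dist(a, b) >= best[0]:
            continue  # batch pruning: no pair below (a, b) can beat best-so-far
        if not a.children:  # same depth, so b is a leaf too
            if a is not b:
                d = math.dist(points[a.point], points[b.point])
                if d < best[0]:
                    best = (d, min(a.point, b.point), max(a.point, b.point))
            continue
        for x, ka in enumerate(a.children):
            # For a self pair, enumerate each unordered child pair only once.
            for kb in a.children[x:] if a is b else b.children:
                queue.append((ka, kb))
                pushes += 1
    return best, pushes
```

A best-first variant would replace the deque with a heap keyed by `min_dist`; the table above suggests that the heap's bookkeeping costs more time than the extra pushes a queue makes.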


Quick-Motif: Experimental Evaluation

■ Programming language: C++. Machine: Ubuntu 12.04, 4 GB RAM.

■ Datasets
  - RW: randomly generated.
  - EEG: reflects the activity of neurons; length 180,204.
  - ECG: the Koski ECG; length 144,002.
  - EPG: traces insect behaviour; length 106,950.
  - TAO: sea surface temperatures; length 374,071.


Quick-Motif: Performance Evaluation

[Figure panels: (a) Effect of … on ECG; (b) Effect of … on EEG; (c) Effect of … on EPG; (d) Effect of … on TAO.]


Thanks

Q&A
