ashfaq munshi, ml7 fellow, pepperdata

Classifying Multivariate Time Series Scalably

Ashfaq Munshi, Saeed Bidhendi, Faramarz Munshi

November 10, 2017

• Background and Motivation

• Univariate Time Series (UTS)

• Multivariate Time Series (MTS)

• Conclusion

Overview

© Pepperdata, Inc.

Background

Pepperdata Telemetry Data Scale

Example production deployment:


570 nodes

20 tasks / node

300 metrics / task

5-second sampling

41 million points / minute
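
The points-per-minute figure follows directly from the other four numbers. A minimal check of that arithmetic, assuming every task reports every metric at every sampling interval (the variable names are illustrative only):

```python
nodes = 570
tasks_per_node = 20
metrics_per_task = 300
sampling_interval_s = 5

# Number of metric streams being sampled at once
streams = nodes * tasks_per_node * metrics_per_task

# One sample every 5 seconds gives 12 samples per stream per minute
samples_per_minute = 60 // sampling_interval_s
points_per_minute = streams * samples_per_minute

print(f"{streams:,} streams -> {points_per_minute:,} points / minute")
# 3,420,000 streams -> 41,040,000 points / minute
```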

300 trillion performance data points collected

Our Big Data About Production Big Data


22 thousand production nodes

50 million jobs / year

Example Time Series


• Highly variable in length

• 10 data points to 10K+ data points

• Missing data

• Extremely noisy

Characteristics of our TS


Problem


Classify this collection of time series to give operators a better understanding of resource utilization on their clusters and to enable a scheduler to better optimize cluster resources

Univariate Time Series

• Two recent approaches from the literature

• Transform the TS into an image, then use a tiled CNN

[Wang & Oates 2015]

• Transform the TS into a bag of patterns

[Schäfer & Leser 2017]

• Data set: the UCR time series archive

• 82 time series data sets

• Number of series < 10K

• Data points per series < 2K

Approaches and Data Set


• Map the time series into

• Gramian Angular Summation Fields

• Gramian Angular Difference Fields

• Markov Transition Fields

• Feed images into a tiled CNN for classification

Time Series and Images


[Wang & Oates, 2015]
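
Of the three image encodings listed above, the Markov Transition Field is the one not detailed later in the deck. The following is a minimal NumPy sketch of the idea as commonly described: quantize the series into quantile bins, estimate a first-order transition matrix between bins, then look that matrix up for every pair of time steps. The function name and the choice of four bins are assumptions for illustration, not values from the slides.

```python
import numpy as np

def markov_transition_field(x, n_bins=4):
    """Sketch of a Markov Transition Field; n_bins is an illustrative choice."""
    x = np.asarray(x, dtype=float)
    # Interior quantile edges, then assign each point to a bin 0..n_bins-1
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    q = np.digitize(x, edges)
    # Count transitions between the bins of consecutive points
    W = np.zeros((n_bins, n_bins))
    np.add.at(W, (q[:-1], q[1:]), 1)
    # Row-normalize to probabilities; rows with no outgoing transitions stay zero
    row_sums = W.sum(axis=1, keepdims=True)
    W = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)
    # Entry (i, j) is the transition probability from the bin of x[i] to the bin of x[j]
    return W[q[:, None], q[None, :]]

x = np.sin(np.linspace(0, 4 * np.pi, 64))
M = markov_transition_field(x)
print(M.shape)  # (64, 64)
```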

• Normalize the time series into [-1,1]

• Transform to Polar Coordinates

Gramian Angular Fields


[Wang & Oates, 2015]
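
The two steps above can be sketched in a few lines of NumPy. After rescaling to [-1, 1], each value becomes an angle via arccos; the summation field is the cosine of pairwise angle sums and the difference field is the sine of pairwise angle differences. The radius of the polar encoding (the time stamp) does not enter either matrix. Function names are illustrative, and mapping a constant series to all zeros is an assumption made here to avoid division by zero.

```python
import numpy as np

def rescale(x):
    """Min-max rescale a series into [-1, 1]."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        return np.zeros_like(x)  # assumption: constant series maps to 0
    return 2 * (x - lo) / (hi - lo) - 1

def gramian_angular_fields(x):
    """Return (GASF, GADF) images for a 1-D series."""
    scaled = np.clip(rescale(x), -1.0, 1.0)  # guard against rounding outside [-1, 1]
    phi = np.arccos(scaled)                  # angular coordinate of each point
    gasf = np.cos(phi[:, None] + phi[None, :])
    gadf = np.sin(phi[:, None] - phi[None, :])
    return gasf, gadf

series = [0.0, 1.0, 2.0, 3.0]
gasf, gadf = gramian_angular_fields(series)
print(gasf.shape, gadf.shape)  # (4, 4) (4, 4)
```

The GADF is antisymmetric with a zero diagonal, while the GASF diagonal equals 2x² - 1 for the rescaled values x, so the rescaled series can be recovered from it up to sign.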

Example GADF Image


[Wang & Oates, 2015]

• Divide TS into windows

• Fourier Transform TS in window

• Apply low-pass filter

• Quantize the Fourier coefficients

• Map window to words

• Extract features from sentences

• Use Logistic Regression classifier

Time Series and Bag of Patterns


[Schäfer & Leser 2017]
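
A simplified sketch of the windowing, Fourier, low-pass, quantization and word-mapping steps is shown below. It is not the published algorithm: the fixed quantization edges, the four-letter alphabet, the window length and the number of retained coefficients are all assumptions chosen for illustration, whereas the published method learns its quantization bins from training data.

```python
from collections import Counter
import numpy as np

ALPHABET = "abcd"  # illustrative 4-symbol alphabet

def window_words(x, window=16, n_coeffs=2, edges=(-0.25, 0.0, 0.25)):
    """Map every sliding window of x to a short word (simplified sketch)."""
    x = np.asarray(x, dtype=float)
    words = []
    for start in range(len(x) - window + 1):
        w = x[start:start + window]
        w = (w - w.mean()) / (w.std() + 1e-12)       # z-normalize the window
        spectrum = np.fft.rfft(w) / window           # Fourier transform of the window
        low = spectrum[1:1 + n_coeffs]               # low-pass: keep the first non-DC coefficients
        feats = np.concatenate([low.real, low.imag])
        symbols = np.digitize(feats, edges)          # quantize each value to 0..3
        words.append("".join(ALPHABET[s] for s in symbols))
    return words

def bag_of_patterns(x, **kwargs):
    """Histogram of words; these counts are the features given to a classifier."""
    return Counter(window_words(x, **kwargs))

x = np.sin(np.linspace(0, 8 * np.pi, 64))
bag = bag_of_patterns(x)
print(sum(bag.values()))  # 49 windows in total
```

The resulting word counts can be arranged into a fixed-length vector over a shared vocabulary and passed to any logistic regression implementation.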

• Convert TS into image (GADF)

• Use Google’s pre-trained Inception v3 CNN (trained on ImageNet)

• Embed each image into a 2,048-dimensional vector space

• Train MLP

• 2 hidden layers (50 nodes each)

• ReLU activation

• Dropout for regularization (rates 0.1 and 0.2)

• Softmax final layer

Our “Off the Shelf” Approach (PD)
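
The classifier head described above is small enough to sketch as a NumPy forward pass. This is only an illustration of the layer shapes, not the authors' code: the weights are random rather than trained, the embeddings are random stand-ins for real CNN features, the number of classes is hypothetical, and applying dropout rate 0.1 after the first hidden layer and 0.2 after the second is an assumption about how the two listed rates are used.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    # He initialization, a common choice for ReLU layers (assumption)
    return rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_out)), np.zeros(n_out)

def dropout(h, rate, training):
    if not training or rate == 0:
        return h
    mask = rng.random(h.shape) >= rate
    return h * mask / (1.0 - rate)  # inverted dropout keeps the expected activation

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mlp_forward(x, params, rates=(0.1, 0.2), training=False):
    (W1, b1), (W2, b2), (W3, b3) = params
    h = dropout(np.maximum(x @ W1 + b1, 0.0), rates[0], training)  # hidden layer 1, ReLU
    h = dropout(np.maximum(h @ W2 + b2, 0.0), rates[1], training)  # hidden layer 2, ReLU
    return softmax(h @ W3 + b3)                                    # class probabilities

n_classes = 3  # hypothetical number of labels
params = [init_layer(2048, 50), init_layer(50, 50), init_layer(50, n_classes)]
embeddings = rng.normal(size=(4, 2048))  # stand-in for 2,048-dimensional image embeddings
probs = mlp_forward(embeddings, params)
print(probs.shape)  # (4, 3)
```

In practice the same architecture would be trained with a deep learning framework; only the embedding dimension, layer widths, activations and dropout rates come from the slide.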


Accuracies for a subset of UCR


• BOSS: 91.1%

• PD: 89.8%

• GADF+GASF+MTF: 86.4%

Accuracy on a subset of UCR


Chart: accuracy (axis from 68% to 86%) for WEASEL, 1-NN DTW CV, 1-NN DTW, BOSS, Learning Shapelet (LS), TSBF, ST, EE (PROP), COTE (ensemble), and PD

Training Time Comparison


Chart: training time by method, including PD

Multivariate Time Series

• Two recent approaches from the literature

• Use an Echo State Network (ESN) to map MTS into state clouds [Wang, Wang, Liu 2015]

• Use Dynamic Time Warping with a Mahalanobis distance metric [Mei, Liu, Wang, Gao 2016]

• Data sets are from UCI, a small subset of UCR, and others

• Number of series ~ 10K

• Data points per series ~ 200

Approaches and Data Set


• Make the TS for each variable the same length by zero-padding

• Convert each TS into a GADF image

• Interpolate any missing data points in the image using linear interpolation on the image

• Stack the images for the five variables

• Use the same process as before for univariate time series

Our “Off the Shelf” Approach (PD)
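
The padding, GADF, interpolation and stacking steps above might look like the NumPy sketch below. Several details are assumptions, since the slide does not specify them: missing points are represented as NaN, padding is appended at the end, and the image interpolation is done as one linear pass along rows followed by one along columns. The example uses two short variables for brevity; the same code accepts any number.

```python
import numpy as np

def zero_pad(series, length):
    """Pad a series with trailing zeros to a common length (assumed padding side)."""
    out = np.zeros(length)
    out[:len(series)] = series
    return out

def gadf(x):
    """Gramian Angular Difference Field; NaN inputs yield NaN rows and columns."""
    x = np.asarray(x, dtype=float)
    lo, hi = np.nanmin(x), np.nanmax(x)
    span = hi - lo
    scaled = 2 * (x - lo) / span - 1 if span > 0 else x * 0.0
    phi = np.arccos(np.clip(scaled, -1.0, 1.0))
    return np.sin(phi[:, None] - phi[None, :])

def interpolate_image(img):
    """Fill NaN pixels linearly, first along rows and then along columns."""
    out = img.copy()
    idx = np.arange(out.shape[0])
    for _ in range(2):
        for row in out:  # each row is a view, so assignments write into out
            bad = np.isnan(row)
            if bad.any() and not bad.all():
                row[bad] = np.interp(idx[bad], idx[~bad], row[~bad])
        out = out.T      # second pass treats columns as rows; two transposes restore orientation
    return out

def stack_gadf(variables):
    """Return an array of shape (length, length, n_variables)."""
    length = max(len(v) for v in variables)
    images = [interpolate_image(gadf(zero_pad(v, length))) for v in variables]
    return np.stack(images, axis=-1)

cpu = [0.1, 0.5, np.nan, 0.9, 0.3]  # hypothetical readings with one missing point
mem = [1.0, 2.0, 3.0]               # shorter series, zero-padded to length 5
stacked = stack_gadf([cpu, mem])
print(stacked.shape)  # (5, 5, 2)
```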


5-Fold Cross Validation Error


Chart: 5-fold cross-validation error (axis from 0 to 30) on Robot failure LP1 through LP5, comparing MDDTW Best with PD 5-fold

10-Fold Cross Validation Error


Chart: 10-fold cross-validation error (axis from 0 to 30) on Robot failure LP1 through LP5, comparing Echo Network Best with PD 10-fold

• Four variables:

• CPU, Virtual Memory, HDFS reads, Network Ops

• Each time series collected over one week

• 10 data points to 10K+ data points

• Missing data

• Extremely noisy

• For periods longer than a week, the data are much larger

• Sampling rate is the same for all TS

PD Data


Accuracy per Label on PD Dataset G


Chart: accuracy (axis from 0 to 100) for each of labels 1 through 17

Number of TS = 3092

Lengths per TS = 5 to 8500

Average Accuracy = 78.14%

Accuracy per Label on PD Dataset R


Number of TS = 6715

Lengths per TS = 5 to 9400

Average Accuracy = 75.95%

Chart: accuracy (axis from 0 to 120) for each of labels 1 through 48

Summary


Our “Off the Shelf” approach is as good as the best approaches for both UTS and MTS, and the methodology is the same for both types of TS.

Thank You
