Ashfaq Munshi, ML7 Fellow, Pepperdata
Classifying Multivariate Time Series Scalably
Ashfaq Munshi, Saeed Bidhendi, Faramarz Munshi
November 10, 2017
• Background and Motivation
• Univariate Time Series (UTS)
• Multivariate Time Series (MTS)
• Conclusion
Overview
© Pepperdata, Inc.
Background
Pepperdata Telemetry Data Scale
Example production deployment:
• 570 nodes
• 20 tasks per node
• 300 metrics per task
• 5-second sampling
• 41 million points per minute
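The per-minute figure follows from the other four numbers on the slide. A quick check, with variable names chosen here purely for illustration:

```python
# Figures from the example production deployment above.
nodes = 570
tasks_per_node = 20
metrics_per_task = 300
sampling_interval_sec = 5

# One data point per metric per sample.
samples_per_minute = 60 // sampling_interval_sec
series_count = nodes * tasks_per_node * metrics_per_task
points_per_minute = series_count * samples_per_minute

print(f"{series_count:,} concurrent series")
print(f"{points_per_minute:,} points per minute")
```

The product is 41,040,000, which the slide rounds to 41 million.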
Our Big Data About Production Big Data
• 300 trillion performance data points collected
• 22 thousand production nodes
• 50 million jobs per year
Example Time Series
• Highly variable in length
• 10 data points to 10K+ data points
• Missing data
• Extremely noisy
Characteristics of our TS
Problem
Classify this collection of time series
to give operators a better understanding of
resource utilization on their clusters and to
enable a scheduler to better optimize cluster
resources
Univariate Time Series
• Two recent approaches from the literature
• Transform the TS into an image then use a tiled CNN
[Wang & Oates 2015]
• Transform the TS into a bag of patterns
[Schäfer & Leser 2017]
• Data sets come from the UCR time series archive
• 82 time series data sets
• Number of series < 10K
• Data points per series < 2K
Approaches and Data Set
• Map the time series into images:
• Gramian Angular Summation Fields
• Gramian Angular Difference Fields
• Markov Transition Fields
• Feed images into a tiled CNN for classification
Time Series and Images
[Wang & Oates, 2015]
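Of the three encodings, the Markov Transition Field is the least self-explanatory, so here is a minimal sketch of it. It assumes quantile bins assigned by rank (ties are split arbitrarily, a simplification) and a first-order transition matrix. The function names and the default of four bins are illustrative, not taken from the talk.

```python
def quantile_bins(series, n_bins):
    """Assign each point to one of n_bins quantile bins using its rank."""
    order = sorted(range(len(series)), key=lambda i: series[i])
    bins = [0] * len(series)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(series)
    return bins

def markov_transition_field(series, n_bins=4):
    """Return the MTF image as a nested list.

    Entry [i][j] is the probability of moving from the bin of point i
    to the bin of point j in one time step.
    """
    q = quantile_bins(series, n_bins)
    # Count transitions between the bins of consecutive points.
    counts = [[0] * n_bins for _ in range(n_bins)]
    for a, b in zip(q, q[1:]):
        counts[a][b] += 1
    # Normalize each row into transition probabilities.
    w = []
    for row in counts:
        total = sum(row)
        w.append([c / total if total else 0.0 for c in row])
    # Spread the transition probabilities along the time axes.
    return [[w[qi][qj] for qj in q] for qi in q]

mtf = markov_transition_field([1, 2, 3, 4, 5, 6, 7, 8], n_bins=4)
```

A series of length n yields an n-by-n image, the same shape as the Gramian fields, which is what allows the three encodings to be combined.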
• Normalize the time series into [-1,1]
• Transform to Polar Coordinates
Gramian Angular Fields
[Wang & Oates, 2015]
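The two steps above can be sketched as follows. After rescaling, each value x becomes an angle phi = arccos(x). The summation field is cos(phi_i + phi_j) and the difference field is sin(phi_i - phi_j). This is a minimal standard-library sketch; the function names are illustrative, and mapping a constant series to zero is an assumption made here to avoid division by zero.

```python
import math

def rescale(series):
    """Min-max rescale a series into [-1, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        # Assumption: a constant series maps to all zeros.
        return [0.0 for _ in series]
    return [(2.0 * x - hi - lo) / (hi - lo) for x in series]

def gramian_angular_fields(series):
    """Return (GASF, GADF) as nested lists.

    GASF[i][j] = cos(phi_i + phi_j)
    GADF[i][j] = sin(phi_i - phi_j)
    where phi = arccos(rescaled value) is the polar-coordinate angle.
    """
    x = rescale(series)
    # Clamp to guard against floating-point values just outside [-1, 1].
    phi = [math.acos(max(-1.0, min(1.0, v))) for v in x]
    gasf = [[math.cos(a + b) for b in phi] for a in phi]
    gadf = [[math.sin(a - b) for b in phi] for a in phi]
    return gasf, gadf

gasf, gadf = gramian_angular_fields([1.0, 3.0, 2.0, 5.0])
```

The GADF diagonal is always zero and the matrix is antisymmetric, while the GASF diagonal encodes the rescaled values themselves as 2x² - 1.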
Example GADF Image
[Wang & Oates, 2015]
• Divide TS into windows
• Fourier Transform TS in window
• Apply low-pass filter
• Quantize the Fourier coefficients
• Map window to words
• Extract features from sentences
• Use Logistic Regression classifier
Time Series and Bag of Patterns
[Schäfer & Leser 2017]
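A simplified sketch of the first five steps above (window, Fourier transform, low-pass filter, quantize, map to a word), ending in a histogram of words. It is not the published method: the fixed equal-width bins, the five-letter alphabet, and all parameter values are assumptions made for illustration, whereas the published approach learns its quantization bins from training data. Feature selection and the logistic regression classifier are omitted.

```python
import cmath
from collections import Counter

def dft(window):
    """Naive discrete Fourier transform of one window."""
    n = len(window)
    return [sum(window[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def window_to_word(window, n_coeffs=2, alphabet="abcde", limit=2.0):
    """Low-pass filter a window and quantize the kept coefficients into letters."""
    n = len(window)
    mean = sum(window) / n
    centered = [v - mean for v in window]      # remove the offset
    coeffs = dft(centered)[1:n_coeffs + 1]     # low-pass: keep the lowest frequencies
    values = []
    for c in coeffs:
        values.extend([c.real / n, c.imag / n])
    word = []
    for v in values:
        # Hypothetical equal-width bins over [-limit, limit].
        clipped = max(-limit, min(limit, v))
        idx = int((clipped + limit) / (2 * limit) * len(alphabet))
        word.append(alphabet[min(idx, len(alphabet) - 1)])
    return "".join(word)

def bag_of_patterns(series, window_size=8, step=1):
    """Histogram of words over all sliding windows of the series."""
    starts = range(0, len(series) - window_size + 1, step)
    return Counter(window_to_word(series[i:i + window_size]) for i in starts)

series = [0, 1, 0, -1] * 6    # toy periodic signal, 24 points
bag = bag_of_patterns(series)
print(bag)
```

The resulting word counts form a fixed-vocabulary feature vector, so series of very different lengths become comparable inputs for a linear classifier.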
• Convert TS into image (GADF)
• Use Google’s pre-trained Inception v3 CNN
• Embed into 2,048-dimensional vector space
• Train MLP
• 2 hidden layers (50 nodes each)
• ReLU activation
• Dropout for regularization (rates 0.1 and 0.2)
• Softmax final layer
Our “Off the Shelf” Approach (PD)
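The classifier head described above can be sketched as a forward pass in plain Python. Several things here are assumptions, not details from the talk: one dropout rate is applied per hidden layer, the weights use He initialization, the number of classes is five, and a random vector stands in for a real 2,048-dimensional Inception v3 embedding. Training is omitted.

```python
import math
import random

def dense(inputs, weights, biases):
    """Fully connected layer: one weight row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(v):
    return [max(0.0, x) for x in v]

def dropout(v, rate, rng, training):
    """Inverted dropout; a no-op outside training."""
    if not training:
        return list(v)
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in v]

def softmax(v):
    m = max(v)                                  # subtract the max for stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(n_in, n_out, rng):
    scale = math.sqrt(2.0 / n_in)               # assumed He initialization
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(embedding, layers, rng, training=False, rates=(0.1, 0.2)):
    (w1, b1), (w2, b2), (w3, b3) = layers
    h = dropout(relu(dense(embedding, w1, b1)), rates[0], rng, training)
    h = dropout(relu(dense(h, w2, b2)), rates[1], rng, training)
    return softmax(dense(h, w3, b3))

rng = random.Random(0)
n_classes = 5                                   # hypothetical number of labels
layers = [init_layer(2048, 50, rng), init_layer(50, 50, rng),
          init_layer(50, n_classes, rng)]
embedding = [rng.random() for _ in range(2048)] # stand-in for CNN features
probs = forward(embedding, layers, rng)
```

Because the CNN is used only as a fixed feature extractor, the trainable part is this small network, which is consistent with the short training times the talk reports.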
Accuracies for a subset of UCR
[Bar chart: accuracy axis from 0% to 100%]
• BOSS (91.1)
• PD (89.8)
• GADF+GASF+MTF (86.4)
Accuracy on a subset of UCR
[Bar chart: accuracy axis from 68% to 86%. Methods compared: WEASEL, 1-NN DTW CV, 1-NN DTW, BOSS, Learning Shapelets (LS), TSBF, ST, EE (PROP), COTE (ensemble), PD]
Training Time Comparison
[Chart; label: PD]
Multivariate Time Series
• Two recent approaches from the literature
• Use an ESN (“Echo State Network”) to map MTS into
state clouds [Wang, Wang, Liu 2015]
• Use Dynamic Time Warping with Mahalanobis distance
metric [Mei, Liu, Wang, Gao 2016]
• Data sets come from UCI, a small subset of UCR, and other sources
• Number of series ~ 10K
• Data points per series ~ 200
Approaches and Data Set
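To make the second literature approach concrete, here is a minimal sketch of dynamic time warping over multivariate points with a squared Mahalanobis local cost. The matrix `m` is supplied by the caller here, which is an assumption of this sketch; the cited work learns it from data. Function names are illustrative.

```python
def mahalanobis_sq(u, v, m):
    """Squared Mahalanobis distance (u - v)^T M (u - v)."""
    d = [a - b for a, b in zip(u, v)]
    size = len(d)
    return sum(d[i] * m[i][j] * d[j] for i in range(size) for j in range(size))

def dtw(a, b, m):
    """DTW cost between two multivariate series of possibly different lengths.

    a and b are sequences of equal-dimension points; m is a square matrix.
    """
    inf = float("inf")
    n, k = len(a), len(b)
    cost = [[inf] * (k + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            local = mahalanobis_sq(a[i - 1], b[j - 1], m)
            # Extend the cheapest of the three allowed warping moves.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][k]

identity = [[1.0, 0.0], [0.0, 1.0]]   # reduces to squared Euclidean distance
a = [(0.0, 0.0), (1.0, 1.0)]
b = [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0)]
print(dtw(a, b, identity))
```

The table has one cell per pair of points, so the cost grows with the product of the two series lengths. That matters for series with thousands of points.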
• Make TS for each variable the same length by zero
padding
• Convert each TS into a GADF image
• Interpolate any missing data points in the image using
linear interpolation on the image
• Stack the images for the five variables
• Use the same process as before for univariate time
series
Our “Off the Shelf” Approach (PD)
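The padding and stacking steps above can be sketched as follows. The sketch pads on the right, which is an assumption since the slide does not say where the zeros go, and it leaves out the interpolation of missing points. Function names and the toy inputs are illustrative.

```python
import math

def zero_pad(series, length):
    """Right-pad a series with zeros up to the given length."""
    return list(series) + [0.0] * (length - len(series))

def gadf(series):
    """Gramian Angular Difference Field of one series, as a nested list."""
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0               # avoid dividing by zero on constant input
    phi = [math.acos(max(-1.0, min(1.0, (2.0 * x - hi - lo) / span)))
           for x in series]
    return [[math.sin(p - q) for q in phi] for p in phi]

def stack_gadf(variables):
    """Return one GADF image per variable, all of the same size.

    variables is a list of series, one per variable. The result is
    indexed as [channel][row][column].
    """
    length = max(len(s) for s in variables)
    return [gadf(zero_pad(s, length)) for s in variables]

# Two toy variables of unequal length.
stack = stack_gadf([[0.1, 0.5, 0.9, 0.4], [3.0, 1.0]])
print(len(stack), len(stack[0]), len(stack[0][0]))
```

Padding happens before the image conversion, so the padded zeros take part in the rescaling and show up as structure in the image.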
5-Fold Cross Validation Error
[Bar chart: error axis from 0 to 30. Data sets: Robot failure LP1, LP2, LP3, LP4, LP5. Series: MDDTW Best, PD 5-fold]
10-Fold Cross Validation Error
[Bar chart: error axis from 0 to 30. Data sets: Robot failure LP1, LP2, LP3, LP4, LP5. Series: Echo Network Best, PD 10-fold]
• Four variables:
• CPU, Virtual Memory, HDFS reads, Network Ops
• Each time series collected over one week
• 10 data points to 10K+ data points
• Missing data
• Extremely noisy
• For periods longer than a week, data is much larger
• Sampling rate is the same for all TS
PD Data
Accuracy per Label on PD Dataset G
[Bar chart: accuracy (0 to 100) for each of labels 1 to 17]
• Number of TS = 3092
• Lengths per TS = 5 to 8500
• Average accuracy = 78.14%
Accuracy per Label on PD Dataset R
[Bar chart: accuracy for each of labels 1 to 48; axis from 0 to 120]
• Number of TS = 6715
• Lengths per TS = 5 to 9400
• Average accuracy = 75.95%
Summary
Our “Off the Shelf” approach is as good as the
best approaches for both UTS and MTS, and
the methodology is the same for both types of
TS.
Thank You