
Anomaly Detection in Time Series Data using a Fuzzy C-Means Clustering

Hesam Izakian and Witold Pedrycz
Department of Electrical and Computer Engineering
University of Alberta
Edmonton, AB, T6G 2V4, Canada

[email protected] [email protected]

Abstract - Detecting anomalies within temporal data (time series) is useful in a variety of applications. In this paper, anomalies in time series are divided into two categories, namely amplitude anomalies and shape anomalies. A unified framework supporting the detection of both types of anomalies is introduced. Fuzzy clustering is employed to reveal the available structure within time series, and a reconstruction criterion is used to assign an anomaly score to each subsequence. For detecting anomalies in amplitude, the original representation of time series is used, while for detecting anomalies in shape, an autocorrelation representation of time series is employed to capture shape information. Experimental studies concerning two real-world data sets are reported.

I. INTRODUCTION

Time series data are common in many application areas, including medical and environmental sciences, aerospace and industrial engineering, and finance and agriculture. Detecting emergent anomalies in time series data provides significant information in each of these applications. For example, anomaly detection in electrocardiogram (ECG) signals or in aircraft health management saves lives, detecting anomalies in disease incidence provides useful information about the possibility of outbreaks, and in industrial processes it may help to diagnose faults. In time series, an anomaly can be considered as the occurrence of any unexpected change in a subsequence of data. The term “unexpected change” makes sense when we compare the pattern in a subsequence with the existing patterns in the entire time series. As a result, one common approach to anomaly detection in time series is to use a fixed-length sliding window to generate a set of subsequences of the time series. In the next step, one may use different techniques to detect and characterize anomalies, i.e., to assign an anomaly score to each subsequence.

Anomalies occurring in time series can be the result of a change in the amplitude of data (e.g., a heavy rainfall in one week of a year) or a change in shape (e.g., an arrhythmia occurring within a set of normal heartbeats in ECG signals). Therefore, in this paper we categorize anomalies into two types: anomalies in shape and anomalies in amplitude. Figure 1(a), coming from [15], shows an anomaly in amplitude in the precipitation time series of one of the climate stations in the US, and Figure 1(b), coming from [7], shows an anomaly in shape within an ECG signal. The anomalous parts are highlighted in both figures.

Fig. 1 The essence of anomaly detection in time series data. (a) Anomaly in amplitude, and (b) anomaly in shape.

In this study, we propose a unified framework to detect both types of anomalies. For this purpose, after generating a set of subsequences of time series using a sliding window, fuzzy C-Means (FCM) clustering [1, 2] is employed to reveal the available structure within the data. Then, a reconstruction criterion [3] is used to reconstruct the original subsequences from the determined cluster centers (prototypes) and partition matrix. Each subsequence is assigned an anomaly score based on the difference between the original subsequence and the reconstructed one. In the case of anomalies in amplitude, the original representation of time series along with the Euclidean distance function is used in the clustering process, while for shape anomalies, a representation of subsequences that captures the shape information is computed first, and then the Euclidean distance in the new feature space is employed.

The idea of assigning an anomaly score to each subsequence based on the quality of its reconstruction from the revealed information granules (clusters) is novel and promising. Moreover, providing a uniform framework to detect different types of anomalies, namely amplitude and shape anomalies, is beneficial. This paper is organized into six sections. In Section II, we briefly review related works reported in the literature. Section III formulates the problem, while in Section IV we propose a new approach for anomaly detection in time series data using FCM. Section V reports the experimental studies over two real data sets, and Section VI concludes the study.

978-1-4799-0348-1/13/$31.00 ©2013 IEEE

II. RELATED WORKS

There are numerous methods proposed in the literature for anomaly detection in point data [6]. Most of these methods do not consider the sequential nature of data and are not suitable for anomaly detection in time series. One of the commonly used techniques for anomaly detection in time series data is assigning an anomaly score to each time series based on its similarity to the other time series. Keogh et al. [10] proposed using a one nearest neighbour (1-NN) approach to detect maximally different subsequences within a longer sequence (called discords). A symbolic representation of time series, SAX [11], was employed to speed up the proposed method. In [12], the inverse average of the cross correlation of each time series with the other time series was considered as the anomaly score. Moreover, in [13] the distance of each time series to its kth nearest neighbour was proposed as the anomaly score.

There are also some more advanced methods proposed in the literature for anomaly detection. Takeuchi and Yamanishi [14] proposed a change point detection algorithm for time series data using sequential discounting autoregressive learning (SDAR). A probabilistic model of the time series was developed incrementally, and for each given datum its deviation score from the learned model was calculated. Then, a new time series was constructed by averaging the scores over a fixed-length sliding window, and finally an outlier detection algorithm was performed to find change point positions. Khatkhate et al. [17] proposed a wavelet space partitioning method for anomaly detection in mechanical systems. A Hidden Markov Model (HMM) was constructed from a symbolic representation of time series, and for each epoch the distance between the state probability distribution of that epoch and the state probability distribution at the nominal condition was considered as the anomaly measure. Dasgupta and Forrest [18] proposed an anomaly detection method inspired by the negative selection mechanism of the immune system. Normal data were considered as “self” and anomalies were considered as “non-self” patterns. In [19], a self-organized map (SOM) was used to characterize the time evolution of an AR process. The regions of the map to which the AR process was expected to move were identified, and the anomalous changes of the AR process were detected. Furthermore, the authors of [16] proposed an entropy-based data analysis for detecting anomalies in complex aerospace systems.

Clustering techniques are commonly used for anomaly detection in point data. Approaches of this kind usually fall into three categories [6]. In the first category, data are clustered using some density-based clustering technique and the points that do not belong to any cluster are considered as anomalies. In the second category, the data points located inside small or sparse clusters are considered as anomalies, and in the third category, the data points that are far away from their nearest cluster centers are considered as anomalies.

Clustering-based anomaly detection techniques for time series usually group data based on some appropriate similarity measure and then assign an anomaly score to each time series using the revealed cluster centers. In [20], a set of training sequences was clustered using k-medoids clustering, and for each test sequence its inverse similarity to its closest medoid was treated as the anomaly score. A survey of anomaly detection approaches for discrete sequences can be found in [9].

III. PROBLEM FORMULATION

Let us consider a time series $\mathbf{x} = x_1, x_2, \ldots, x_p$ of length p.

We aim at finding a set of subsequences of x with length q that exhibit the highest amount of unexpected change (in shape or amplitude) in terms of anomaly score. For this purpose, a sliding window of length q moves through the time series and generates a set of subsequences. Consequently, there are n subsequences of the form

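$$
\begin{aligned}
\mathbf{x}_1 &= x_{11}, x_{12}, \ldots, x_{1q} \\
\mathbf{x}_2 &= x_{21}, x_{22}, \ldots, x_{2q} \\
&\;\;\vdots \\
\mathbf{x}_n &= x_{n1}, x_{n2}, \ldots, x_{nq}
\end{aligned}
\tag{1}
$$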

Note that in each movement, the sliding window moves r time steps. As a result, the number of subsequences, n, is

$$n = \frac{p - q}{r} + 1 \tag{2}$$

Choosing a low value for r (e.g., $r = 1$) guarantees that no anomalous subsequences are missed, but processing a large number of subsequences is time consuming. On the other hand, choosing a high value for r (e.g., $r = q$) generates fewer subsequences and lowers the processing time, but there is a risk of missing some anomalous subsequences. A trade-off between accuracy and processing time has to be made. Selecting r proportional to the length of the subsequences is a reasonable choice, i.e., a higher value of r for longer subsequences and a lower value of r for shorter subsequences. The length of the sliding window, q, is another important parameter that can be selected based on the purpose of the application. However, one may try different values of this parameter to find appropriate results.
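As an illustration, the subsequence generation in (1) and (2) can be sketched in Python with NumPy. The function name `sliding_subsequences` is hypothetical, and the use of floor division when p - q is not a multiple of r is an assumption made here rather than something stated in the formulation.

```python
import numpy as np

def sliding_subsequences(x, q, r=1):
    """Return an (n, q) array of length-q windows taken every r time steps."""
    x = np.asarray(x, dtype=float)
    p = x.shape[0]
    if not (1 <= q <= p) or r < 1:
        raise ValueError("need 1 <= q <= p and r >= 1")
    # Eq. (2); the floor is an assumption for the case (p - q) % r != 0.
    n = (p - q) // r + 1
    starts = np.arange(n) * r
    # Row k holds x[starts[k] : starts[k] + q], as in Eq. (1).
    return x[starts[:, None] + np.arange(q)[None, :]]

# A series sized like the monthly precipitation data: p = 240, q = 12, r = 1.
windows = sliding_subsequences(np.arange(240), q=12, r=1)
print(windows.shape)  # (229, 12)
```

With r = q the same call returns non-overlapping windows, which is the fast but coarse end of the trade-off discussed above.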

As mentioned earlier, the objective of this paper is to assign an anomaly score to each subsequence and to select the subsequences with the highest anomaly scores as the anomalous parts of the time series. To handle this task, a method based on fuzzy clustering is employed.

IV. ANOMALY DETECTION USING A FUZZY C-MEANS CLUSTERING

Fuzzy C-Means clustering, proposed by Dunn [1] and Bezdek [2], is one of the commonly used clustering techniques. It describes n subsequences (or their representation) $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ using c ($c < n$) prototypes $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_c$ and a fuzzy partition matrix U through the minimization of the following objective function:

$$J = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^{m} \, \lVert \mathbf{v}_i - \mathbf{x}_k \rVert^2 \tag{3}$$


where m ($m > 1$) is a fuzzification coefficient, and $\lVert \cdot \rVert$ denotes the Euclidean distance function. The objective function defined in (3) can be minimized by calculating the cluster centers using (4) and the partition matrix using (5) in an iterative fashion.

$$\mathbf{v}_i = \frac{\sum_{k=1}^{n} u_{ik}^{m} \, \mathbf{x}_k}{\sum_{k=1}^{n} u_{ik}^{m}} \tag{4}$$

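$$u_{ik} = \frac{1}{\sum_{j=1}^{c} \left( \dfrac{\lVert \mathbf{v}_i - \mathbf{x}_k \rVert}{\lVert \mathbf{v}_j - \mathbf{x}_k \rVert} \right)^{2/(m-1)}} \tag{5}$$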
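The alternating updates in (4) and (5) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `fcm`, the random initialisation of U, the stopping rule based on the largest change in U, and the small constant guarding against zero distances are all assumptions introduced here.

```python
import numpy as np

def fcm(X, c=2, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Fuzzy C-Means on the rows of X (n, d). Returns V (c, d) and U (c, n)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    # Assumed initialisation: random memberships, each column summing to 1.
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)

    def prototypes(U):
        W = U ** m
        return (W @ X) / W.sum(axis=1, keepdims=True)   # Eq. (4)

    for _ in range(n_iter):
        V = prototypes(U)
        # Squared Euclidean distances prototype-to-point, shape (c, n).
        d2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)                      # avoid division by zero
        # Eq. (5), rewritten as d^(-2/(m-1)) normalised over the clusters.
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        done = np.abs(U_new - U).max() < tol            # assumed stopping rule
        U = U_new
        if done:
            break
    return prototypes(U), U

# Two well-separated synthetic groups (illustrative data only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
V, U = fcm(X, c=2, m=2.0)
print(np.round(V, 1))
```

The defaults c = 2 and m = 2 mirror the settings reported in the experimental section; other values would need the kind of parameter analysis described later.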

Clustering the subsequences $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ leads to the generation of a set of prototypes representing the normal structure of the subsequences. Each normal subsequence in the data set is similar to one or more prototypes, or it can be similar to a combination (in the form of a weighted average) of prototypes. The more similar a subsequence is to the prototypes, the less anomalous it is. To evaluate how similar a subsequence is to the revealed prototypes (or their combination), a reconstruction criterion [3] is considered in this paper. This criterion was also exploited for clustering spatio-temporal data in [8]. Pedrycz and de Oliveira [3] noted that FCM can be viewed as an encoding scheme of data, and the original data points (here subsequences) can be decoded (reconstructed) using the estimated cluster centers and partition matrix. Assuming that $\hat{\mathbf{x}}_1, \hat{\mathbf{x}}_2, \ldots, \hat{\mathbf{x}}_n$ are the reconstructed versions of the subsequences $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$, by minimizing the following sum of distances:

$$F = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^{m} \, \lVert \mathbf{v}_i - \hat{\mathbf{x}}_k \rVert^2 \tag{6}$$

one may arrive at [3]:

$$\hat{\mathbf{x}}_k = \frac{\sum_{i=1}^{c} u_{ik}^{m} \, \mathbf{v}_i}{\sum_{i=1}^{c} u_{ik}^{m}} \tag{7}$$

After calculating the reconstructed version of each subsequence using (7), the reconstruction error in (8), which is the squared Euclidean distance between a subsequence and its reconstructed version, is used as the evaluation criterion to estimate how similar a subsequence is to the prototypes. In other words, for each subsequence the reconstruction error calculated using (8) is considered as its anomaly score.

$$E_k = \lVert \mathbf{x}_k - \hat{\mathbf{x}}_k \rVert^2 \tag{8}$$

Figure 2 shows the overall scheme of the proposed method. As mentioned earlier, our objective in this paper is to provide a unified framework to detect both types of anomalies, namely anomalies in amplitude and anomalies in shape.
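The reconstruction in (7) and the anomaly score in (8) can be sketched as follows. The helper names `memberships` and `anomaly_scores`, the hand-picked prototypes, and the toy data are illustrative assumptions; in practice V and U would come from an FCM run.

```python
import numpy as np

def memberships(X, V, m=2.0):
    """Eq. (5): fuzzy memberships (c, n) of the rows of X to the prototypes V."""
    d2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    d2 = np.maximum(d2, 1e-12)          # guard for points lying on a prototype
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=0, keepdims=True)

def anomaly_scores(X, V, U, m=2.0):
    """Eqs. (7)-(8): reconstruct each row of X and return the squared errors."""
    W = U ** m                                     # (c, n)
    X_hat = (W.T @ V) / W.sum(axis=0)[:, None]     # Eq. (7), shape (n, d)
    return ((X - X_hat) ** 2).sum(axis=1)          # Eq. (8)

# Two prototypes, two points sitting on them, and one point far from both.
V = np.array([[0.0, 0.0], [1.0, 1.0]])
X = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, -3.0]])
scores = anomaly_scores(X, V, memberships(X, V))
print(scores.argmax())  # 2
```

The third point is equidistant from both prototypes, so it is reconstructed as their midpoint (0.5, 0.5) and receives by far the largest score, whereas points that coincide with a prototype are reconstructed almost exactly.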

Fig. 2 Overall scheme of the proposed anomaly detection.

As shown in Figure 2, the starting point of the proposed approach is generating a set of subsequences $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ using a sliding window. When the objective is to detect anomalies in amplitude, the Euclidean distance is a suitable dissimilarity measure and the generated subsequences can be used in the clustering process without any further preprocessing or representation. On the other hand, when detecting anomalies in shape is of concern, the generated subsequences cannot be used directly in clustering. The reason is that the generated subsequences are not synchronized, so the Euclidean distance function is not an effective similarity measure. Although there are a number of viable distance functions for measuring the dissimilarity of asynchronous time series with respect to their shapes (e.g., the dynamic time warping distance [4]), one has to be aware of the challenges encountered when optimizing the FCM objective function with those distance functions. In this paper we confine ourselves to the Euclidean distance function. To compare subsequences based on their shape information, each subsequence is normalized to have zero mean and a standard deviation equal to one. Then, each normalized subsequence is represented using a set of autocorrelation coefficients. Considering $\mathbf{x}_k$ as a subsequence of length q, its autocorrelation coefficient for lag s can be estimated using (9).

$$y_{k,s} = \frac{\sum_{t=1}^{q-s} (x_{k,t} - \bar{x}_k)(x_{k,t+s} - \bar{x}_k)}{\sum_{t=1}^{q} (x_{k,t} - \bar{x}_k)^2} \tag{9}$$

As a matter of fact, the autocorrelation coefficient estimates how closely a signal matches its time-shifted version. By considering different lags $s = 1, 2, \ldots, q-1$, each subsequence is represented in a new feature space with q-1 dimensions. This representation of time series captures the shape information and removes existing shifts in asynchronous time series, so the Euclidean distance function can be used effectively to compare the subsequences in the new feature space. The idea of using an autocorrelation representation of time series for fuzzy clustering was originally proposed in [5]. For illustrative purposes, let us consider Figure 3(a). In this figure, A is a sine wave, B is a shifted version of A, and C is a square wave synchronized with A. Using the Euclidean distance function to measure the dissimilarities between the time series in this figure, we have $\lVert A - B \rVert > \lVert A - C \rVert$. Although the time series A and B are quite similar, their Euclidean distance is large because they are not synchronized. Figure 3(b) shows the autocorrelation representation of the time series shown in Figure 3(a). In this figure we have $\lVert A - B \rVert < \lVert A - C \rVert$. The reason is that the autocorrelation function removes the shifts in time, so asynchronous time series can easily be compared with each other in this new feature space using the Euclidean distance.
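The effect described for Figure 3 can be reproduced on synthetic signals. The sketch below assumes a window of 200 samples, a period of 20 samples, and a quarter-period shift; these values and the function name `autocorr_features` are illustrative choices, not taken from the paper.

```python
import numpy as np

def znorm(x):
    """Normalise to zero mean and unit standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def autocorr_features(x):
    """Autocorrelation coefficients of the normalised x at lags 1..q-1, Eq. (9)."""
    z = znorm(x)
    q = z.shape[0]
    zc = z - z.mean()              # the mean is ~0 already; kept to mirror Eq. (9)
    denom = (zc ** 2).sum()
    return np.array([(zc[: q - s] * zc[s:]).sum() / denom for s in range(1, q)])

t = np.arange(200)
A = np.sin(2 * np.pi * t / 20)            # sine wave
B = np.sin(2 * np.pi * (t + 5) / 20)      # A shifted by a quarter period
C = np.where(t % 20 < 10, 1.0, -1.0)      # square wave in phase with A

raw_ab = np.linalg.norm(znorm(A) - znorm(B))
raw_ac = np.linalg.norm(znorm(A) - znorm(C))
acf_ab = np.linalg.norm(autocorr_features(A) - autocorr_features(B))
acf_ac = np.linalg.norm(autocorr_features(A) - autocorr_features(C))
print(raw_ab > raw_ac, acf_ab < acf_ac)   # True True
```

In the raw space the shifted sine is farther from A than the square wave is, while in the autocorrelation space the ordering is reversed. The small residual distance between A and B in the autocorrelation space comes from the finite window length.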

Fig. 3 (a) Three time series and (b) their autocorrelation representation

V. EXPERIMENTAL STUDIES

To illustrate the performance of the proposed method, two real data sets are investigated, one for anomaly detection in amplitude and one for anomaly detection in shape.

A. Anomalies in amplitude

The United States monthly precipitation data set [15] from 1990 to 2009 is considered. The length of the time series in this data set is 240, and four stations with some visible anomalies were chosen for our experiments. In the FCM algorithm, the number of clusters, c, and the fuzzification coefficient, m, were both set to 2. Moreover, since this data set comprises monthly data, the length of the sliding window was set to 12, which is equivalent to one year, and in each movement the sliding window moves one time step. Figures 4(a)-(d) show the results. Each figure is composed of two parts: the time series and the anomaly scores estimated for the subsequences. Since the sliding window generates overlapping subsequences, for each set of overlapping subsequences only the anomaly score corresponding to the most anomalous subsequence is shown. Moreover, in each time series the subsequences with higher anomaly scores are highlighted. As shown in these figures, the anomalies in amplitude are detected by the proposed approach. Moreover, every other part of the data is assigned an anomaly score that measures the degree to which it is unusual in amplitude.
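Keeping only the most anomalous subsequence within each group of overlapping windows can be read as a greedy non-overlap selection. The sketch below is one possible interpretation, not a procedure given in the paper; the function name `top_non_overlapping` and the parameter `k` are hypothetical.

```python
import numpy as np

def top_non_overlapping(scores, q, r=1, k=3):
    """Greedily pick up to k highest-scoring windows that do not overlap."""
    scores = np.asarray(scores, dtype=float)
    starts = np.arange(scores.shape[0]) * r    # start index of each window
    chosen = []
    for idx in np.argsort(-scores):            # highest score first
        # Windows [s, s + q) and [s', s' + q) overlap when |s - s'| < q.
        if all(abs(starts[idx] - starts[j]) >= q for j in chosen):
            chosen.append(int(idx))
        if len(chosen) == k:
            break
    return chosen

scores = np.array([0.1, 0.9, 1.0, 0.8, 0.1, 0.1, 0.7, 0.2])
print(top_non_overlapping(scores, q=3, r=1, k=2))  # [2, 6]
```

Windows 1 and 3 score highly but overlap window 2, so they are suppressed and the next reported window is the one starting at index 6.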

Fig. 4 (a)-(d) Monthly precipitation time series along with the estimated anomaly scores for different subsequences. In each figure, the subsequences with higher anomaly scores are highlighted.

B. Anomalies in shape

The MIT-BIH arrhythmia data set [7] is considered for shape anomaly detection. This data set is composed of 48 half-hour annotated ECG signals. Four excerpts from the ECG signals in this data set containing some visible anomalies were selected and, as in the previous experiment, the number of clusters and the fuzzification coefficient in the FCM algorithm were both set to 2. To reduce the processing time, the excerpts were resampled from 360 Hz to 128 Hz. The length of each excerpt in our experiments is 5000, and the length of the sliding window was set to around 1.2 times the average distance between successive R peaks (the RR interval) to make sure that longer beats (e.g., premature ventricular contractions, PVC) can be contained in one subsequence. Moreover, in each movement the sliding window moves by around 5% of the length of the subsequences. After generating the subsequences, each subsequence is normalized and represented using its autocorrelation coefficients. The clustering is applied in the new feature space.

Fig. 5 (a)-(d) Some excerpts from MIT-BIH arrhythmia data set for detecting anomalies in shape. In each figure, the subsequences with higher anomaly scores are highlighted.

Figures 5(a)-(d) show the signals along with the estimated anomaly scores determined for different subsequences. In each signal, the subsequences with higher anomaly scores are highlighted and their corresponding annotations are reported. As can be seen from these figures, in most cases the detected anomalies are of type PVC, which is one of the most common arrhythmic heartbeats. In Figure 5(a), a normal beat has a high anomaly score. However, as observed, this heartbeat differs in shape from the other normal beats.

C. Parameter analysis

The parameters that have a direct impact on the performance of the proposed method are the length of the sliding window, the length of each movement of the sliding window, and the number of clusters and the fuzzification coefficient in FCM. In this sub-section, we investigate the effect of the length of the sliding window, q, and propose a simple approach to find an optimal value. A similar procedure can be followed for the other parameters. Figure 6 shows an excerpt of file 207 from the MIT-BIH arrhythmia data set, which contains a visible anomaly in shape.

Fig. 6 An excerpt from file 207 in MIT-BIH arrhythmia data set

Assume that h contains the calculated anomaly scores for all subsequences within a time series. We define a confidence index as

$$f = \frac{h_a}{\bar{h}} \tag{10}$$

where $\bar{h}$ is the average of the anomaly scores in h, and $h_a$ is the anomaly score corresponding to the anomalous subsequence, i.e., the maximum score in h. This index is used to evaluate the performance of the proposed method. A higher value of f means that the proposed method assigned a high anomaly score to the anomalous subsequence and lower scores to the non-anomalous subsequences. As a result, a parameter value that maximizes this performance index is more suitable. Note that here we assumed that there is only one anomalous subsequence in the time series. One may instead define $h_a$ as the average of the anomaly scores corresponding to the top k anomalous subsequences.
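A small sketch of the confidence index follows, assuming that f is read as the ratio of $h_a$ to the mean score. The function name `confidence_index` and the `top_k` argument (covering the top-k variant mentioned above) are illustrative assumptions.

```python
import numpy as np

def confidence_index(scores, top_k=1):
    """Mean of the top_k anomaly scores divided by the mean of all scores."""
    h = np.asarray(scores, dtype=float)
    h_a = np.sort(h)[-top_k:].mean()   # top_k = 1 gives the maximum score
    return h_a / h.mean()

print(confidence_index([1.0, 1.0, 1.0, 9.0]))  # 3.0
```

With one clear outlier among otherwise equal scores, the index grows with the gap between the outlier and the rest; if all scores are equal it is 1.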

Fig. 7 Different length of sliding windows vs. performance index (horizontal axis: q, from 40 to 340; vertical axis: f, from 0 to 14)

Let us set the length of each movement, r, to 5% of the length of the sliding window, and the number of clusters and the fuzzification coefficient in FCM to 2. Figure 7 shows the value of f for different lengths of the sliding window for the time series shown in Figure 6. As shown in this figure, the performance index f attains its optimal value at q=240, while for smaller and larger sliding windows it takes lower values. The reason is that for small windows the anomalous part of the time series cannot fit into one subsequence, and for large windows the anomalous part of the time series has to be considered together with some non-anomalous parts in one subsequence. Figure 8 illustrates this problem. In Figure 8(a), the size of the sliding window was 40. We can see that the proposed approach cannot even find the anomalous part of the time series. Moreover, most of the anomaly scores are in the same range. In Figure 8(b), the size of the sliding window was set to 240, and we can see that the anomalous part of the time series has a large anomaly score in comparison with the other parts. Finally, in Figure 8(c), the size of the sliding window was 340 and, as shown in this figure, some non-anomalous parts of the time series have been considered as anomalies, and the anomaly score of the detected anomalous subsequence is close to those of some non-anomalous subsequences.
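The selection of q described above amounts to a sweep over candidate window lengths. The sketch below assumes the index is the ratio of the maximum to the mean score and takes the scoring routine as a caller-supplied function; `best_window_length`, `score_fn`, and the stand-in scorer `toy_scores` are hypothetical names, and the stand-in is not the FCM reconstruction error.

```python
import numpy as np

def best_window_length(series, candidates, score_fn, r_fraction=0.05):
    """Return (best_q, {q: f}) with f = max(score) / mean(score) for each q."""
    results = {}
    for q in candidates:
        r = max(1, int(round(r_fraction * q)))   # step of about 5% of q
        h = np.asarray(score_fn(series, q, r), dtype=float)
        results[q] = h.max() / h.mean()
    best_q = max(results, key=results.get)
    return best_q, results

def toy_scores(x, q, r):
    # Stand-in scorer (NOT the FCM reconstruction error): window energy.
    return [float(np.sum(x[s:s + q] ** 2)) + 1e-3
            for s in range(0, len(x) - q + 1, r)]

x = np.zeros(200)
x[90:110] = 1.0                                  # a single synthetic burst
best_q, results = best_window_length(x, [10, 20, 40], toy_scores)
print(best_q, results)
```

In an actual experiment `score_fn` would generate the subsequences, apply the chosen representation, run FCM, and return the reconstruction errors.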

Fig. 8 Detected anomalies in time series for different size of sliding windows. (a) q=40, (b) q=240, and (c) q=340.

VI. CONCLUSIONS

A unified framework for detecting anomalies in the amplitude and shape of time series is introduced. Using a fixed-length sliding window, a set of subsequences is generated, and Fuzzy C-Means clustering is used to reveal the normal structures within the subsequences. To measure the dissimilarity of each subsequence to the different cluster centers, a reconstruction criterion is used, and the calculated reconstruction error is considered as the anomaly score. For detecting anomalies in amplitude, the original representation of time series is considered, while for shape anomalies an autocorrelation representation of time series is used.

ACKNOWLEDGMENT

Support provided by Alberta Innovates – Technology Futures and Alberta Advanced Education & Technology, Natural Sciences and Engineering Research Council of Canada (NSERC), and the Canada Research Chair (CRC) Program is gratefully acknowledged.

REFERENCES

[1] J.C. Dunn, “A fuzzy relative of the ISODATA process, and its use in detecting compact well-separated clusters,” J. Cyber. 3 (3), pp. 32–57, 1973.

[2] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Functions, Plenum, New York, 1981.

[3] W. Pedrycz and J.V. de Oliveira, “A development of fuzzy encoding and decoding through fuzzy clustering,” IEEE Trans. Instrum. Meas., vol. 57, no. 4, pp. 829–837, Apr. 2008.

[4] D. Berndt and J. Clifford, “Using dynamic time warping to find patterns in time series,” In Knowledge Discovery in Databases Workshop, 1994, pp. 359–370.

[5] P. D’Urso, E.A. Maharaj, “Autocorrelation-based fuzzy clustering of time series,” Fuzzy Sets and Systems 160 (2009) 3565–3589.

[6] V. Chandola, A. Banerjee, V. Kumar, “Anomaly detection: A survey,” ACM Computing Surveys, vol. 41, no. 3, July 2009, pp. 1-72.

[7] A.L. Goldberger, L.A.N. Amaral, L. Glass, J.M. Hausdorff, P.Ch. Ivanov, R.G. Mark, J.E. Mietus, G.B. Moody, C.K. Peng, H.E. Stanley, “PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals,” Circulation 101 (23) (2000) e215–e220.

[8] H. Izakian, W. Pedrycz, I. Jamal, “Clustering spatio–temporal data: An augmented fuzzy C–Means,” IEEE Transactions on Fuzzy Systems, 2013.

[9] V. Chandola, A. Banerjee, V. Kumar, “Anomaly detection for discrete sequences: A survey,” IEEE Trans. on Knowledge and Data Engineering, pp. 832-839, vol. 24, no. 5, May 2012.

[10] E. Keogh, J. Lin, A. W. Fu, H. V. Herle, “Finding unusual medical time-series subsequences: Algorithms and applications,” IEEE Transactions on Information Technology in Biomedicine, vol. 10, no. 3, July 2006, pp. 429-439.

[11] J. Lin, E. Keogh, L. Wei, and S. Lonardi, “Experiencing SAX: A Novel Symbolic Representation of Time Series,” Data Mining and Knowledge Discovery, vol. 15, no. 2, pp. 107-144, 2007.

[12] P. Protopapas, J.M. Giammarco, L. Faccioli, M.F. Struble, R. Dave, C. Alcock, “Finding outlier light curves in catalogues of periodic variable stars,” Monthly Notices of the Royal Astronomical Soc., vol. 369, no. 2, pp. 677-696, 2006.

[13] S. Ramaswamy, R. Rastogi, and K. Shim, “Efficient algorithms for mining outliers from large data sets,” In Proceedings of the ACM SIGMOD international conference on Management of data, pp. 427–438, 2000.

[14] J. Takeuchi, K. Yamanishi, “A unifying framework for detecting outliers and change points from time series,” IEEE Trans. on Knowledge and Data Engineering, 18(4), 482-489, April 2006.

[15] The United States Historical Climatology Network (USHCN), Available online at: http://cdiac.ornl.gov/epubs/ndp/ushcn/ushcn.html.

[16] A. Agogino, K. Tumer, “Entropy based anomaly detection applied to space shuttle main engines,” IEEE Aerospace Conference, 2006.

[17] A. Khatkhate, A. Ray, E. Keller, S. Gupta, S.C. Chin, “Symbolic time-series analysis for anomaly detection in mechanical systems,” IEEE/ASME Trans. on Mechatronics, pp. 439-447, vol. 11, no. 4, Aug. 2006.

[18] D. Dasgupta, S. Forrest, “Novelty Detection in Time Series Data Using Ideas from Immunology,” In 5th International Conference on Intelligent Systems, 1996.

[19] C. Brighenti, M.A. Sanz-Bobi, “Auto-regressive processes explained by self-organized maps: Application to the detection of abnormal behavior in industrial processes,” IEEE Trans. on Neural Networks, pp. 2078-2090, vol. 22, no. 12, Dec. 2011.

[20] V. Chandola, V. Mithal, V. Kumar, “Comparative evaluation of anomaly detection techniques for sequence data,” In 8th IEEE International Conference on Data Mining, 2008, pp. 743-748.
