anomaly detection and preprocessing - aungz.com · anomaly detection and preprocessing by ibrahim...

65
Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the Masdar Institute of Science and Technology In Partial Fulfillment of the Requirements for the Degree of Master of Science In Computing and Information Science © 2014 Masdar Institute of Science and Technology All rights reserved

Upload: duongtram

Post on 03-May-2019

237 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

Anomaly Detection and

Preprocessing

By

Ibrahim Khamis

A Thesis Presented to the

Masdar Institute of Science and Technology

In Partial Fulfillment of the Requirements for the

Degree of

Master of Science

In

Computing and Information Science

© 2014 Masdar Institute of Science and Technology

All rights reserved

Page 2: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

ii

Abstract

In sustainable environments, efficient anomaly (outlier) detection is essential to help

monitor and control the system with the decision making process. Anomaly detection

is an inherently difficult problem due to its decisions of what is normal and what is

unusual, and the ability to distinguish between the two. Another serious difficulty is

that the definition of normal can change. Sensor nodes in wireless sensor networks

have limited energy resources and this hinders the dissemination of the gathered data

to a central location. This stimulated our research to make use of the limited

computational capabilities of these sensor nodes to build a normal model of the data

gathered. In our research, our goal is to determine what is normal and what is

abnormal and to distinguish between Normal & abnormal. We developed an

algorithm called “Two-layered Data Capture Anomaly Detection”. Our algorithm

sends anomalies (2%) as well as roughly (2% or 4%) of normal data for further data

processing and classification purposes. For testing purposes we also deployed three

different machine learning and data mining tools. Three separate data sets were also

used to validate the system. The performance of the proposed method is evaluated and

compared with results obtained from the application of state of the art methods on the

same data sets. In these tests our method provided very promising results.

Page 3: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

iii

This research was supported by the Government of Abu Dhabi to help fulfill the vision

of the late President Sheikh Zayed Bin Sultan Al Nahyan for sustainable development

and empowerment of the UAE and humankind.

Page 4: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

iv

Acknowledgments

Praise be to Allaah, I would like to extend my gratitude to my family members for

their patience and support. I would also like to take this opportunity to thanks those

who actively guided and helped me in this research. Foremost, I would like to express

my deep appreciation to my advisor Dr. Zeyar Aung for his continuous support for

my M. Sc. Study and research. His guidance, patience, motivation, and support helped

me to develop a deep understanding of the subject. Beside my advisor, I would like to

thank my thesis supervisor committee members: Dr. Khaled Elbassioni and Dr. Wei

Lee Woon for their valuable time, comments, and advice.

Ibrahim Khamis

Masdar City, April 30, 2014

Page 5: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

v

Contents

_____________________________________________________________________

1 Introduction .......................................................................................................................... 1

1.1. Background and Motivation ...................................................................................... 1

1.2. Objectives and Contributions .................................................................................... 2

1.3. Relevance to Masdar/UAE ......................................................................................... 6

1.4. Publication ................................................................................................................. 7

1.5. Thesis Organization ................................................................................................... 7

2 Literature Review ................................................................................................................. 8

2.1. Wireless Sensor Network (WSN) ............................................................................... 8

2.2. Data Mining for Outlier Detection ............................................................................. 9

2.3. Outlier Detection for WSNs ..................................................................................... 10

2.3.1. Statistical-based Techniques ............................................................................... 16

2.3.2. Nearest Neighbor-based Techniques .................................................................. 16

2.3.3. Clustering-based Techniques............................................................................... 16

2.3.4. Classification-based Techniques .......................................................................... 17

2.3.5. Comparison of WSN Outlier Detection Techniques ............................................ 19

2.3.6. Recent Trends ...................................................................................................... 19

2.4. Shortcomings of Outlier Detection Techniques ...................................................... 20

2.5. Requirements for Outlier Detection in WSNs ......................................................... 21

3 Proposed Method ................................................................................................................ 22

3.1. Data Capture Anomaly Detection (DCAD) ............................................................... 22

3.2. From DCAD to TLDCAD ............................................................................................ 25

3.3. Use case ( scenario) ................................................................................................. 27

4 Experimental Setups and Results ....................................................................................... 29

4.1. Datasets ................................................................................................................... 29

4.1.1. Synthetic Datasets ............................................................................................... 29

4.1.2. Grand Saint Bernard (GSB) Dataset ..................................................................... 30

4.1.3. Wind Tower Dataset ............................................................................................ 31

4.2. Outlier Detection Performance Measurements ...................................................... 33

4.3. Algorithms Explored ................................................................................................ 35

Page 6: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

vi

4.4. Experiment I: Preprocessing Approach ................................................................... 35

4.4.1. Experiment I Results ............................................................................................ 39

4.5. Experiment II: Classification Approach .................................................................... 41

4.5.1. Experiment II Results ........................................................................................... 43

5 Conclusion and Future Work.............................................................................................. 44

5.1. Conclusion ............................................................................................................... 44

5.2. Future Work ............................................................................................................ 45

A Abbreviations .................................................................................................................... 47

B Masdar Wind Tower .......................................................................................................... 48

Bibliography ............................................................................................................................ 51

Page 7: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

vii

List of Tables

_____________________________________________________________________

Table 1: Comparison of Features for Multivariate Outlier Detection Techniques for WSNs,

adopted from ‎[32]. .................................................................................................................. 15

Table 2: Comparing Different Approaches on Outlier Data, adopted from [54]. ................... 18

Table 3: Example of Wind Tower Data. ................................................................................... 31

Table 4: Confusion Matrix ....................................................................................................... 33

Table 5: Preprocessing Experiment Flow of TLDCAD vs. DCAD as a Preprocessor. ................ 35

Table 6: DCAD Preprocessing Process Summary. .................................................................... 36

Table 7: TLDCAD Preprocessing Process Summary. ................................................................ 36

Table 8: Average Results for 5,000 Synthetic Data Points in Experiment I-A.......................... 40

Table 9: Average Results for 50,000 Synthetic Data Points in Experiment I-B. ....................... 40

Table 10: Best Achieved results for TLDCAD with SVM vs. DCAD as a Classifier in Experiment

II. .............................................................................................................................................. 43

Table 11: Masdar Wind Tower Photographs and Images. ...................................................... 48

Page 8: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

viii

List of Figures

_____________________________________________________________________

Figure 1: Data Capture Anomalies Detection (DCAD) Algorithm ‎[24]. ...................................... 3

Figure 2: Our Proposed Two-layered DCAD Algorithm. ............................................................ 4

Figure 3: Thesis Contributions. ................................................................................................. 5

Figure 4: Three outlier sources in WSNs and their corresponding detection techniques,

adopted from [32]. ................................................................................................................... 11

Figure 5: Categorization of WSN Outlier Detection Methods using Data Mining ‎[23]. .......... 11

Figure 6: Generic Categorization of Outlier Detection Methods ‎[28]. .................................... 12

Figure 7: Outlier Detection Technique for WSNs, adopted from ‎[32]. ................................... 14

Figure 8: Advantage of Mahalonobis Distance. ....................................................................... 17

Figure 9: Recent Developments in Outlier Detection in WSN ................................................. 20

Figure 10: DCAD Illustration. ................................................................................................... 22

Figure 11: Effective Radius. ..................................................................................................... 24

Figure 12: TLDCAD. .................................................................................................................. 26

Figure 13: WSN, adopted from [53]. ....................................................................................... 27

Figure 14: GSB Data Scatter Plot. ............................................................................................ 30

Figure 15: Wind tower Data Scatter Plot. ............................................................................... 32

Figure 16: Precision (P) and Recall (R), adopted from ‎[35]. .................................................... 34

Figure 17: Flowchart of DCAD as a Preprocessor vs. TLDCAD. ................................................ 37

Figure 18: One of the Ten Folds: Illustration of TLDCAD vs. DCAD as Preprocessor. .............. 38

Figure 19: Flowchart of DCAD as a Classifier vs. TLDCAD. ....................................................... 41

Figure 20: One of the Ten Folds: Illustration of TLDCAD & SVM vs DCAD as a Classifier. ....... 42

Figure 21: FFIDCAD with effective n ‎[19]. ............................................................................... 46

Figure 22: Wind Tower air flow diagram (Photographed by Ibrahim Khamis) ....................... 48

Figure 23: Wind Tower Image (Photographed by Ibrahim Khamis) ........................................ 48

Figure 24: Wind Tower Bank of 75 high-pressure nozzles while introducing mist to the Wind

tower Ventilation tube from inside (Photographed by Ibrahim Khamis) ............................... 48

Figure 25: Wind Tower Background (Photographed by Ibrahim Khamis) ............................... 49

Figure 26: Wind Tower how it works (Photographed by Ibrahim Khamis) ............................. 49

Figure 27: Wind Tower Thermal Comfort (Photographed by Ibrahim Khamis) ...................... 50

Page 9: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 1: Introduction 1

CHAPTER 1

1 Introduction

1.1. Background and Motivation

A Wireless sensor network (WSN) refers to a group of spatially dispersed and

dedicated sensors for monitoring and recording the physical conditions of the

environment and organizing the collected data at a central location. WSNs measure

environmental conditions like temperature, sound, pollution level, humidity, wind

speed and direction, pressure, etc. WSNs are widely used in areas such as

manufacturing industry [26], military [15], environmental monitoring [3], smart

power grids [6], smart buildings/homes [16], and many other applications that require

distributed location-aware data sensing [2].

The advantage of using WSNs is that they are cheaper and more practical than wired

networks. However WSNs are vulnerable to intrusions and faults [6] and they are

resource constrained devices. In general, WSN data needs to be mined to detect

anomalies as efficiently as possible. Once found, these will then be sent to the base

station or central location for further processing.

Outliers are encountered in many applications. Here are some terms that are

commonly used in the data mining community: uncommon behavior in data, rare

instances, outliers, anomalies, deviations, exceptions, rare instances, and

irregularities [1]. Hawkins provided the following definition of an outlier: "An outlier

Page 10: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 1: Introduction 2

is defined as an observation that deviates too much from other observations that it

arouses suspicions that it was generated by a different mechanism from other

observations” [28]. Anomaly detection is an inherently difficult problem as it is

essentially the problem of deciding what is not normal; frequently there are no pre-

determined examples or models for "abnormal" data and these needs to be determined

from the statistical properties of the data.

Anomaly detection in wireless sensor nodes is even more challenging because they

have limited power and computing resources. It is virtually impossible to disseminate

all the gathered data to a central location to detect the anomalies. On the other hand

anomalies are important data that of interest of us since they may represent faults,

intrusion, malicious attacks or even fire alarms and also more further could be an

automatic signaling for some actions like dispatching a repair crew to fix the faulty

sensors for example. This motivated our research to make use of the limited

computational capabilities of these devices by building a normal model of the data

gathered. In this way, data that deviates from this model can be classified as

"anomalous" and subsequently forwarded to a central location for further processing.

This process is done inside these devices and hence saves the power that would be

needed to transmit all the data.

1.2. Objectives and Contributions

The objective of this thesis is to develop an efficient and robust algorithm for anomaly

detection and preprocessing in energy constrained devices such as WSNs. The aim is

to find a balance amongst the three desirable factors of speed, accuracy, and low

energy consumption for anomaly preprocessing and detection in WSNs. Towards this

end, a novel algorithm is proposed which we call the “Two-layered Data Capture

Page 11: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 1: Introduction 3

Anomaly Detection” (TLDCAD) is proposed. This algorithm is based on an existing

technique, the Ellipsoidal Data Capture Anomaly Detection (DCAD) method [24],

which is illustrated in Figure 1. DCAD is an anomaly detection algorithm which uses

mean and covariance matrix of the data to define an ellipse which captures the overall

distribution of the data.

Figure 1: Data Capture Anomalies Detection (DCAD) Algorithm ‎[24].

The anomalies are then detected by setting a threshold which defines an outer

boundary of the ellipse which encompasses 98% of the data. Any data points which

fall outside of this boundary are considered to be anomalies and are captured and sent

for further processing.

In this way, DCAD helps to improve energy efficiency by selecting only points

which are considered “interesting”. However, this is only part of the problem, since

there should also be a way to select interesting normal points. TLDCAD seeks to

address this problem by setting a second threshold on the data (in our experiments we

Page 12: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 1: Introduction 4

tried 94% and 96% levels) to capture an additional layer of data points which lie

between this new level and the original 98% level. These points are subsequently

labeled as normal and sent to a central computer for further processing.

Figure 2: Our Proposed Two-layered DCAD Algorithm.

Our aim is to reduce the power consumption in resource constrained devices such

as wireless sensors. We reduce the noise level by preprocessing inside the sensor node

then send a reduced sampled data (before any in-between nodes communication

noise) to the central computer or central node with a light version of SVM for further

classification, visualization, and exploration. We accomplish that by adapting a new

approach of data preprocessing by providing sampled data using a two level ellipse.

This produces balanced data sets with around 50% anomalies. This sampled data is

used then to work with some classifiers that require relatively balanced data sets such

as the typical Support vector Machine (SVM). The data is further processed by the

SVM to provide more accurate classification results. With this distributed approach

Page 13: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 1: Introduction 5

we combine the speed of the ellipse method on the WSN’s node with the accuracy of

the SVM on the central computer. We could say that we are using a distributed data

mining approach in order to try to answer whether anomalies can become classifiable.

The contributions of this thesis can be summarized as follows.

Contribution 1: A new model for lightweight anomaly preprocessing and

detection which applies two separate probability thresholds (TLDCAD).

Contribution 2: Two distinct usage mechanisms where TLDCAD can be

used either as a pre-processor, or as a classifier.

Contribution 3: Comparative evaluation of TLDCAD on both synthetic

and real data sets.

Figure 3 shows these three main research contributions in pictorial format.

Figure 3: Thesis Contributions.

Page 14: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 1: Introduction 6

The advantages of the proposed approach can be summarized as follows:

It provides a new approach of data preprocessing by acquiring the most

informative sampled data using two ellipses.

It reduces the power consumption in resource constrained devices like

wireless sensor nodes; Thus, it improves the sustainability and detection

capability of the whole WSN.

It reduces the effect of communication channel’s noise by preprocessing

inside the sensor nodes then sending a reduced set of sampled data to the

server for further exploration. It improves the reliability of data by

providing more efficient WSN’s data samples. In addition, it also improves

security and privacy of the data, because only parts of the data are

communicated.

It produces more balanced data sets, which are better for classification,

with around 50% or 25% outliers instead of just 2% outliers. (Some

classifiers like a typical Support Vector Machine do not work well with

extremely unbalanced data sets.)

1.3. Relevance to Masdar/UAE

The field of energy efficiency evolved with rich areas of science and applications

that need to be redesigned and reframed for such a new field. Masdar Institute is a

global institute which is focused on this and other related challenges. Techniques

which facilitate the use of energy constrained devices for data collection have a very

direct relevance with Masdar's vision which is centered on energy efficiencies and

Page 15: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 1: Introduction 7

green technologies. Moreover, the proposed method is tested on data collected from

sensors attached to Wind Tower air cooling structure in Masdar Institute.

1.4. Publication

Some portions of the research described in this thesis have been published in the

following paper [13].

I. Khamis and Z. Aung, “Outlier preprocessing in wireless sensor

networks: A two-layered ellipse approach,” in Proceedings of the

6th IEEE International Conference on Developments in eSystems

Engineering (DeSE), 2013, pp. 1-6.

1.5. Thesis Organization

The remainder of the thesis is organized as follow. In Chapter 2; it gives an

overview of the current technologies in WSN and anomaly detection. Then it explains

the proposed algorithm in Chapter 3. After that it describes the experimental setup and

the results in Chapter 4, followed by conclusion and future work in Chapter 5.

Page 16: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 2: Literature Review 8

CHAPTER 2

3

2 Literature Review

2.1. Wireless Sensor Network (WSN)

Wireless Sensor Network (WSN) is a network that consists of number of nodes; each

one is connected to other nodes wirelessly in the network. WSN are feasible solutions

in situations where it is difficult, or costly or even impractical to implement wired

networks [24]. There are many types of sensors settling in every sensor node. These

sensors allow the node to collect many types of data. Since there are many sensors on

the sensor node, then it becomes a subject of multidimensional features data

collections.

Some sensor nodes these days have very good computational capabilities for

example the Waspmote Error! Reference source not found. is one of the sensors

evices for developers. This sensor node is one of the examples that depict how

wireless sensors are becoming more like mini computers. In recent research some

authors of recent papers targeted the computational capability of the modern sensor

nodes to detect anomalies locally in a decentralized mode [19] [24].

One of the important roles of the WSN is to detect important events or faults in

the network nodes. Detecting the important event or anomalies at the node level will

reduce the amount of data to be transmitted over the network since only the detected

event is transmitted instead of transmitting the whole data set. In such situation the

Page 17: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 2: Literature Review 9

need for some kind of event detection system become very crucial. Here comes the

role of data mining techniques as discussed in the following section.

2.2. Data Mining for Outlier Detection

Narita and Kitagawa defined data mining as systematically extracting useful

information from data [20]. The aim of data mining is to find patterns from data sets.

Data mining can be either supervised or unsupervised. Supervised data mining

involves the use of training labeled data to build a classification model, and then this

classification model is used to classify the new testing data. Unsupervised data mining

do not use labeled data to classify the new data; it normally uses some technique like

clustering to build clusters around the normal data.

Hence the supervised outlier detection algorithms learn a model by the labeled

training data and decide on the test data whether it is normal or abnormal.

Unsupervised outlier detection finds outlier without prior knowledge of the data [21].

For example, when the data are clustered then the clusters represent the normal data

and any data points fall out the clusters boarder are considered to be outliers. In

addition the aim of traditional pattern recognition is to find the majority of data and

deal with outliers as noise. However, noise for one person could be a signal for other

person [12].

Outliers can indicate important events in some situations and can be of more

importance than the normal data. For examples the fire alarm sensed data is more

important than the normal data. Outlier detection is an important field of data mining

techniques [1]. Outliers are named in many ways and here are some terms that are

used in data mining community: Uncommon behavior in data, rare instances, outliers,

anomalies, devotions, exceptions rare instances, and irregularities [1].

Page 18: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 2: Literature Review 10

There are many studies about outlier detection. For example, Jiang and Yang

made clusters as a unit and find the outliers clusters as a unit [12]. In this case the

whole cluster becomes an outlier. Lee et al. proposed a novel work for trajectory

outlier detection [14]. The abnormal trajectory among other trajectories becomes an

outlier. Moreover, Menold et al. raised the point that; data point can be compared to

the median of the past and present value and the result is outlier if it exceeds certain

threshold [18]. This show the implementation of the temporal data (data related to

time). On the other hand some people are concerned with the privacy issue of outlier

detection. Challagalla et al. inferred that detection of outliers threats some

organizations and raises their concerns' about the privacy of the analyzed data; for that

reason it is important to incorporate some sort of privacy protection in the outlier

detection technique [4].

2.3. Outlier Detection for WSNs

There are many ways to categories outlier detection. In Figure 4, Zhang et al. [32]

divided outlier detection in WSNs to three branches in terms of the outlier sources,

the first is Fault detection in WSNs and it deals with noise and errors, the second

division is event detection in WSNs which deal with events. The last division is

intrusion detection in WSNs and this one handles the malicious attacks.

Page 19: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 2: Literature Review 11

Figure 4: Three outlier sources in WSNs and their corresponding detection

techniques, adopted from [32].

Qu classified the outlier detection methods as illustrated in Figure 5 [23]. In this

classification they classified the outlier detection methods to five main braches:

distribution based, depth based, clustering, distance based, and density based. The

widely used ones are density-based and distance-based. Li and Kitagawa said that

distance based method is one of the most common and simplest methods that is used

for outlier detection [14].

Figure 5: Categorization of WSN Outlier Detection Methods using Data Mining ‎[23].

Page 20: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 2: Literature Review 12

Figure 6: Generic Categorization of Outlier Detection Methods ‎[28].

A related yet more generic classification of outlier detection methods in general

(not necessarily for WSNs only) is provided by Xi [28] as illustrated in Figure 6. In

this classification the author divided the outlier detection algorithms to three main

categories. The first main category is classic outlier which in turn is divided to four

sub categories; statistical based, distance based, deviation based, and density based

approaches. The second main category is the spatial outlier and this is just a

modification of the classic based approach by taking into account the spatial attributes

of the data. Spatial attributes are the attributes that relate to location. The third outlier

Page 21: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 2: Literature Review 13

detection main category implicitly stated by Xi is the “recent advances” in outlier

detection. In this category there are two sub categories; high dimension based

approach and SVM based approach.Zhang et al. [32] proposed a similar taxonomy for

outlier detection techniques in WSNs as shown in Figure 7. The main categories are

statistical based which is further subdivided to parametric and non-parametric, nearest

neighbor based, clustering based, classification based, and finally the spectral

decomposition based. The parametric based is divided to Gaussian based and non-

Gaussian based. The non-parametric based is divided to kernel based and histogram

based. The classification based is divided to Support vector machine based and

Bayesian network-based. The Bayesian network based is subdivided again to naïve

Bayesian network based, Bayesian believe network based, and dynamic Bayesian

network based. The spectral decomposition is subdivided to the principle component

analysis only. The comparison of various features of the WSN outlier detection

methods are also given in Table 1.

Janssens et al. [9] also compared some outlier detection method from Machine

Learning (ML) and Knowledge Discovery in Databases (KDD). The ML techniques

used are SVM and Parzen Windows, and the KDD techniques used are heuristic local-

density estimation methods such as LOF and LOCI. Janssens et al. used the one class

classification framework. He selected this framework to be able to use AUC (Area

under the Curve) which is a famous performance measurement tool. They found that

Support Vector Domain Description (SVDD) is one of the best performing

methods [9].

Now, let us discuss each category of outlier detection techniques for WSNs.

Page 22: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 2: Literature Review 14

Figure 7: Outlier Detection Technique for WSNs, adopted from ‎[32].

Page 23: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 2: Literature Review 15

Table 1: Comparison of Features for Multivariate Outlier Detection Techniques for WSNs, adopted from ‎[32].

Techniques Sensor data Outlier type

Correlation Local Global

Attribute Spatial Temporal Individual Collaboration Individual Aggregate Centralized

Subramaniam

et al. [47]

● ● ●

Rajasegarar et al. [48]

● ●

Rajasegarar et al. [49]

Janakiram et al. [50]]

● ● ● ●

Hill et al. [51] ● ● ●

Chatzigiannakis et al. [52]

● ● ●

Page 24: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 2: Literature Review 16

2.3.1. Statistical-based Techniques

Statistical based are the earliest method used to detect outliers and they are model

based. The two categories in this field is the parametric based approach and non-

parametric based approach. The parametric based approach assumes that the data has

a known distribution. In this method if the input data does not follow the assumed

distribution then it may cause some problems. The parametric approach has the

following sub categories Gaussian based and non-Gaussian based. Non parametric do

not assume any data distribution for the data. The non-parametric based is divided to

kernel based and histogram based [32] the advantage of non-parametric is that they do

not require and assumption about the distribution of the data.

2.3.2. Nearest Neighbor-based Techniques

This approach makes use of the nearest neighbor values to find outliers and this

approach is one of the most commonly used methods [32] however this technique

does not scale well when the number of the input data variables increase.

2.3.3. Clustering-based Techniques

Tao and Pi observed that in many applications outlier and clustering results are

needed at the same time [27]. In outlier detection techniques, the data are clustered

and hence the data that are outside the cluster are considered to be anomalies. One of

the latest novel examples in anomaly clustering in WSNs is the Data Capture

Anomaly Detection DCAD algorithm [24]. This algorithm use the hyper elliptical

boundary (cluster) to draw a normal model around the data and the data points that

fall outside this ellipsoidal boundary are considered to be anomalies [5].

Page 25: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 2: Literature Review 17

Figure 8: Advantage of Mahalonobis Distance.

The DCAD [24] and IDCAD [19] exploited the advantage of using Mahalonobis

distance for clustering the data. If the Euclidian distance is used instead of the

Mahalonobis distance then the distance from p2 to its nearest neighbor is greater than

the distance from p1 to its nearest neighbor however when using the Mahalonobis

distance as in Figure 8 then the two distances are the same. (The Mahalonobis

distance is a descriptive statistic that provides a relative measure of a data point's

distance or residual from a common point Error! Reference source not found..)

ence this feature is incorporated in DCAD and iterative DCAD (IDCAD) to find an

ellipsoidal model that best fit the data in order to detect the outliers. The IDCAD use

the same concept as DCAD however it detects outliers online in contrast to DCAD

which uses the batch mode.

2.3.4. Classification-based Techniques

Classification approach is well known in data mining where the classification

algorithm takes labeled input as training data and draws a model from this training

Page 26: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 2: Literature Review 18

data then it accepts a new data (testing data) and labels them according to the built

model. In this section two main types of classifiers are discussed. The support vector

machine (SVM) and the Naïve Bayes. The Naïve Bayes classifier is subdivided

in [32] to three more sub categories; Naïve Bayesian network-based, Bayesian belief

network-based, and dynamic Bayesian network-based. The SVM classifier was

explored in the field of WSNs in [32] and [25], and it shows very promising results.

Bahrepour et al.[54] explored many techniques in his paper and found that the

Quarter-Sphere SVM is one of the out performers in terms of computational cost and

detection accuracy. The results in the paper are reproduced in the Table 2.

Table 2: Comparing Different Approaches on Outlier Data, adopted from [54].

Technique Accuracy

On artificial

Data

Accuracy

On Real Data

Standard SVM 98.12% 97.64% Quarter-Sphere SVM 98.53% 98.05% FFNN 96.95% 96.04% The Fusion-based Approach (Naïve Bayes) 84.90% 91.00% The Fusion-based Approach (FFNN) 85.95% 98.21% Naïve Bayes 94.84% 75.07 %

The Quarter-Sphere SVM outperforms other methods in terms of the accuracy.

However this is in a centralized mode but it is has the disadvantage of computational

cost in the distributed anomaly detection method where each node has to do its

anomaly detection locally onboard. In [24] the authors stated that the SVM has an

issue in terms of its computational complexity. This is due to the kernel matrix

computation and the linear optimization calculations.

On the other hand the Dynamic Bayesian network model has the advantage of

being able to operate on several data streams at once [32]. However, the Bayesian

Page 27: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 2: Literature Review 19

networks algorithms in general are facing challenges whenever the numbers of the

input data variables become large in WSNs [32].

2.3.5. Comparison of WSN Outlier Detection Techniques

Tables.1 and Tables.2 shown above respectively compare the features and the

performances of various WSN outlier detection techniques. The important ones that

we can see from the literatures and papers [32] are the SVM, Dynamic Bayesian

Networks and clustering. However since the SVM is computationally complex for

distributed WSN’s systems and the Dynamic Bayesian Networks do not scale well

with the number of the variables the obvious choice from table1 is the clustering. The

clustering technique shown in this table is proposed by the Rajasegarar et al.[19]. This

technique has the following advantage points from Table1. It works will with the

multivariate variables; it takes care of dealing with temporal correlations through a

time window. It also experimented to compute the centralized and distributed

anomaly detection approach with promising results and less communication overhead.

2.3.6. Recent Trends

It is observed from one of the recent works [19] that the best direction in WSNs

domain is to use unsupervised learning and mainly the clustering which already

investigated in some recent works like the one class SVM and the IDCAD. However

the SVM is still to be refined more because of its computational complexity.

Nevertheless the best choice till now is the IDCAD because it is unsupervised, simple,

has low computational complexity and it is implemented in the distributed

environment and had shown good results. Moreover IDCAD is implemented in an

Page 28: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 2: Literature Review 20

online environment which is a practical approach to deal with the streaming nature of

the WSNs data for an overview of latest works see Figure 9.

Figure 9: Recent Developments in Outlier Detection in WSN

2.4. Shortcomings of Outlier Detection Techniques

Zhang et al. listed the shortcomings of the existing outlier detection techniques as

follows [32].

Most of the techniques ignore the multivariate nature of the WSN and assume

univariate variables where anomaly can be formed by a combination of more

than one variable.

Many techniques do not consider the correlations between the variables.

Questions need to be answered what is the appropriate sliding window size for

temporal data and what is the appropriate choice of the neighboring nodes?

The work on distinguishing between the types of outlier is not sufficient, and

still many techniques do not distinguish between the errors and outliers and

that may lead to the loss of some important events (outliers).

Page 29: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 2: Literature Review 21

The use of user defined threshold in order to determine outliers are vulnerable

to the dynamic nature of the WSN data.

Many techniques do not consider the mobility of some WSNs and assume the

static condition of WSNs.

2.5. Requirements for Outlier Detection in WSNs

Zhang et al. also enumerated the requirements outlier detection techniques as

follows [32].

Since there are many shortcomings in the outlier detection techniques in the

field of WSNs, these shortcomings motivate the development of dedicated

outlier detection techniques for WSNs. The following are some important

WSNs outlier detection requirements.

Detection needs to be distributed to reduce communication overhead.

Detection needs to be online to handle the streaming nature of WSNs data.

It is better to use unsupervised methods since the labeled data in WSNs is not

easy to get.

The detection rate must be high and the false alarm rate should be as low as

possible.

The technique must be not complicated or computationally complex to suite

the nature of the restricted resources on WSNs.

The relation between the data must be considered. Also the time and neighbors

locations are important to be taken into account.

The technique must discriminate between the errors and the measurements in

an effective manner.

Page 30: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 3: Proposed Method 22

CHAPTER 3

3 Proposed Method

3.1. Data Capture Anomaly Detection (DCAD)

Figure 10: DCAD Illustration.

Firstly, a review of the Data Capture Anomaly Detection (DCAD)[19] is presented as

this is the basis for the proposed TLDCAD algorithm. The DCAD is used mainly for

outlier detection in WSNs. It works by first constructing an ellipse which captures a

given percentage (normally 98%) of the data. Hence, data points which fall outside

this ellipse are classified as outliers while points falling inside the ellipse are

classified as normal. However DCAD sends parameters only (mean and covariance

Page 31: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 3: Proposed Method 23

matrix only) and if it is used for data preprocessing (sampling) it will send 98% of

normal and 2% of anomalies. That motivate us to add another layer to be able to

sample the data (preprocess) to use them in a classification process with the Support

vector machine. TLDCAD in addition of providing outlier detection it also sends an

additional effective sample of normal data for the classification purposes.

DCAD and TLDCAD are based on the assumption that the data is normally

distributed; then we need to refresh our minds by starting with the two important

parameters in the bivariate Gaussian distribution, the mean and the covariance matrix

which are given in the following equation (1) and equation (2).

Let X = { } are data samples at time points { }, where each

sample (1 ) is d-dimensional vector in . That is, the vector is a data

instance related to time point j and is composed of d attributes (features).

∑( )( )

Where and are the sample mean and sample covariance of

respectively.

The hyper ellipsoid of effective radius t centered at with covariance matrix is

defined as:

( ) |

}

Where is the characteristic matrix of , and ’ ‘is the effective radius of .

The following quantity (4) represents the Mahalonobis distance (Mahalonobis

distance could be seen as Euclidean distance divided by the covariance matrix) from

to and is the characteristic matrix of .

Page 32: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 3: Proposed Method 24

The boundary surface of the ellipsoid is given by equation (5).

(

)

}

Figure 11: Effective Radius.

Definition 1: Any point that is outside is considered anomaly. And that is known

by computing the Mahalonobis distance of the point from the center of the data:

x is anomalous for ⇔

Using ( )

with results in an ellipsoidal boundary that covers at

least 98% of the data under the assumption that the data was drawn from a normal

distribution. [19]. At this point: is the effective radius of the ellipse. ( )

is the

inverse of the chi squared statistic with d-degrees of freedom and probability .

Page 33: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 3: Proposed Method 25

In other words, 98% of data points that are normal with respect to is defined as a

set of data points that lie between the corresponding values of by setting

and .

3.2. From DCAD to TLDCAD

Since we are looking mainly for an effective preprocessing tool in addition to outlier

detection provided by DCAD, then we motivated to extend the above DCAD method

into TLDCAD (Two-layered DCAD) by generating a new additional inner ellipse in

such a way the band between the original ellipse generated in DCAD and the new

inner ellipse covers either 2% or 4% of the outermost normal data points.

For the (2%) normal data points with respect to , we take a set of data points that lie

between the corresponding values of by setting and .

For the (4%) normal data points with respect to , we take a set of data points that

lie between the corresponding values of by setting and .

Note that the purpose of the DCAD algorithm in our experiments I is to be used as a

data preprocessing tool to partition the data and then send the all the data for further

exploration, visualization, and classification purposes. In contrary, the TLDCAD

algorithm is used to partition the data and then send only the outlier data plus a small

subset of the normal data (either 2% or 4%) for further processing with the Support

Vector Machine for example. The TLDCAD has the advantage of providing more

balanced data (2% Normal vs. 2% Anomalies) or (4% Normal vs. 2% Anomalies) in

comparison to the DCAD that provides (98% Normal vs. 2% Anomalies).

Page 34: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 3: Proposed Method 26

Figure 12: TLDCAD.

Note that For the TLDCAD the rationale between choosing the outermost 4% as

representatives of the "normal" data because these data points are very close in

Mahalonobis distance to the decision boundary and the rest are likely to be redundant

since they are far away in Mahalonobis distance from the decision boundary and can

hence be removed without much consequence.

Page 35: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 3: Proposed Method 27

3.3. Use case ( scenario)

Figure 13: WSN, adopted from [53].

Fig.13 shows the mode of operation of the standard DCAD algorithm. Data that

has been classified as anomalous is transmitted via the WSN to the central computing

facility via a gateway.

Note that the original aim of DCAD was anomaly detection, however, there could

be other potential applications. For example, it would be useful to be able to return an

efficient but representative subset of the data, which would be useful for training

machine learning and other decision support algorithms. To achieve this would

require an extension to the basic DCAD algorithm. What is required is a method for

selecting a critical subset of the normal data.

TLDCAD provides a simple but effective way of achieving this. This will help the

WSN to greatly reduce energy consumption with communication 4% or 6% of

effective data points instead of communicating the 100% of the data points in our

preprocessing experiments. The sampled data will reduce the running time of the

algorithm used at the central node of the WSN or at the external processing computer

for the huge data collected from a huge data stream collected from a huge number of

sensor nodes in WSN.

Page 36: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 3: Proposed Method 28

Moreover, the DCAD originally just send parameters and outliers which are of no

much use for the machine learning and further classification purposes.

Page 37: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 4: Experimental Setups and Results 29

CHAPTER 4

4 Experimental Setups and Results

4.1. Datasets

4.1.1. Synthetic Datasets

Synthetic data was generated by sampling from a bivariate Gaussian distribution

as was done in [19]. The parameters of the distribution are simply the mean and

covariance matrix:

(

)

The synthetic data are two dimensional only for visualization purposes.

We generated 7 datasets:

1. 500

2. 1,000

3. 2,000

4. 3,000

5. 4,000

6. 5,000

7. 50,000

Note that we found out by experiment our method works better with smaller data

sets so we focused on small data from 500 to 5,000. In addition, we did not extend our

experiment up to 50,000 or more due to the research time limit.

Page 38: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 4: Experimental Setups and Results 30

4.1.2. Grand Saint Bernard (GSB) Dataset

The GSB dataset was gathered in the year 2007 from 23 sensors that deployed at

the Grand-St-Bernard pass between Italy and Switzerland [44]. We extracted the data

Gathered during October by station 10. Also for visualization purposes we only

extracted two features:

Column 9: Ambient Temperature.

Column 12: Relative Humidity.

The size of the extracted GSB data is 17, 302 data points of 2 dimensions.

Figure 14: GSB Data Scatter Plot.

Page 39: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 4: Experimental Setups and Results 31

Note that for GSB data set it seems to be there is a sudden fault in the humidity

sensor of the node (which is depicted by the U-shaped graphed at the bottom of the

GSB scatter plot in Fig. (14)[19].

4.1.3. Wind Tower Dataset

One of the obvious and important places to gather data is the Masdar Wind tower

project. The Wind Data are collected from the Innovative Masdar Wind Tower project

at the bottom of the wind tower cooling opening. The Wind Tower is a modern

implementation of the traditional Arabic wind tower that has been used to provide

cooling for the traditional Arabic houses [45], [46].

The data are collected from 10:15:15, 02-10-2013 up until 11:01:40, 08-10-2013.

The data size = 8,082 of two dimensional data points.

The two collected attributes are:-

Column 1 = Relative Humidity.

Column 2 = Temperature.

Let us highlight that; the data gathered in reality are two dimensions, where each

dimension represents one attribute of the sensed data, and we focused on 2

dimensions to be able to visualize the data in our experiments. However, we could go

for more dimensions in the future works. Table 3 provides an example of 2-

dimensional vectors containing the two attributes: temperature and humidity.

Table 3: Example of Wind Tower Data.

Time Measurements

Humidity Temperature

43 32.4

75 31.4

99.9 28

Page 40: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 4: Experimental Setups and Results 32

25 37.4

Note that for table. 3 the data shown are not consecutives data points, they are just

for showing various wind tower data values at different times.

Figure 15: Wind tower Data Scatter Plot.

Note that for the Wind tower data set it seems to be there is 100% saturation of the

relative humidity sensor of the Arduino node (see the vertical line structure at the

right of the Wind Data scatter plot in Fig. 15.

Note also that we were only concerned with 2-diminonal data because the scale of

the WSN nodes in the future could be huge and lots of data will be gathered and that

Page 41: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 4: Experimental Setups and Results 33

is enough not to make the project complicated for now; since some sensors can get

faulty or malfunction due to the harsh climate in the summer and the dusty wither and

hence decided to not work for more than 2 dominions. On the other hand we did not

need for now more than relative humidity and temperature analysis for our project

also it easier to use 2 dominions for continues data collection and future online data

monitoring and control.

In addition, At present we did not run experiments with more than two attributes

to visualize the data, and for the possibility of huge data collection of tens or hundreds

of sensor deployments, on the other hand the required data collection from the wind

tower for analyzes is only for temperature and relative humidity for the current

situation, so that going for more than 2 dominions for now is not essentially relevant.

4.2. Outlier Detection Performance Measurements

Detection rate and false alarm rate also known as false positive rate (FPR) and

receiver operating characteristic curves are usually used to show the tradeoff between

the detection rate and the false alarm rate in WSN [32], nevertheless; Intrusion

detection is an important aspect of outlier detection and the metrics used commonly in

this field are ROC analysis, precision, recall, F-manures and confusion matrix [5].

Table 4: Confusion Matrix

Confusion matrix

Predicted labels

Normal Anomalies

Actual Labels

Normal

True Negative

(TN)

False Positive

(FP)

Anomalies

False Negative

(FN)

True Positives

(TP)

Page 42: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 4: Experimental Setups and Results 34

(Correctly classified)

From Table 4 the following equations can be defined to calculate the precision,

recall, and F-value.

Figure 16: Precision (P) and Recall (R), adopted from ‎[35].

In Figure 16 the relevant items are to the left of the straight line while the

retrieved items are within the oval. The red regions represent errors. On the left these

are the relevant items not retrieved (false negatives), while on the right they are the

retrieved items that are not relevant (false positives) [35].

We decided to select the Precision and Recall and F1 as our metrics because they

are well known from the data mining prospective, also the F1 [5] could be seen as a

balanced measure that contain booth precision and recall.

Page 43: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 4: Experimental Setups and Results 35

4.3. Algorithms Explored

The conducted testing on sampled data from the ellipses (using both DCAD and

TLDCAD) are evaluated on two main types of classifiers: Support Vector Machine

(SVM) and Artificial Neural Network (ANN).

4.4. Experiment I: Preprocessing Approach

The main methods used in the preprocessing approach are adding an additional

layer to DCAD to get the TLDCAD and then label the output data with two labels

(classes): “anomalous” and “normal”. Then, we send the data to a classifier Support

Vector Machine (SVM) or Artificial Neural Networks (ANN) and then compare the

output data from the DCAD and TLDCAD to draw the final conclusion. The

flowchart in Fig. 17 summarize the methodology used for the preprocessing approach.

Table 5 shows of how the data are generated and how they are preprocessed using

the ellipses for the SVM classifier. (That is virtually the same for the ANN classifier.)

Table 5: Preprocessing Experiment Flow of TLDCAD vs. DCAD as a Preprocessor.

Step 1 Synthetic data generation: scatter plot of 5,000 normally distributed synthetic data

Step 2

TLDCAD’s‎output:

2% normal data

(between the two ellipses) and 2%

anomalous data (outside the outer

ellipse)

TLDCAD’s‎output:

4% normal data

(between the two ellipses) and 2%

anomalous data (outside the outer

ellipse)

DCAD’s‎output:

98% normal data

(within the ellipse)

and 2% anomalous data (outside

the ellipse)

Step 3 Scatterd plot for Scatterd plot for Scatterd plot for

Page 44: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 4: Experimental Setups and Results 36

2% normal data vs. 2%

anomalous data

4% normal data vs. 2% anomalous

data

98% normal data vs. 2%

anomalous data

Step 4

SVM output for

2% normal data vs. 2%

anomalous data

SVM output for

4% normal data vs. 2% anomalous

data

SVM output for 98% normal data

vs. 2% anomalous data

Table 6: DCAD Preprocessing Process Summary.

DCAD (98% Normal vs. 2% Outliers) processes summary

Table 7: TLDCAD Preprocessing Process Summary.

TLDCAD (4% Normal vs. 2% Outliers) processes summary

Page 45: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 4: Experimental Setups and Results 37

Figure 17: Flowchart of DCAD as a Preprocessor vs. TLDCAD.

Page 46: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 4: Experimental Setups and Results 38

Figure 18: One of the Ten Folds: Illustration of TLDCAD vs. DCAD as Preprocessor.

Page 47: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 4: Experimental Setups and Results 39

4.4.1. Experiment I Results

Note for both approaches, we are focusing on the “2% vs. 4%” TLDCAD since it

has better results than “2% vs. 2%”.

Tables 8 and 9 show some promising and even competitive results from the

preprocessing approach. Note that how the F1 measure value increases with the

increase of the normal layered data.

For example the average value for the SVM cross validation was 0.963 for 2%

TLDCAD and increased to 0.989 which is very competitive to the DCAD 98%

normal output which has the value for F1 measure of 0.998. That is by using

TLDCAD with 4% normal data we can obtain very competitive accuracy measures

and using only 6% of the data instead of using 100% of the data in order to have just

1% of accuracy increase.

The main advantage of TLDCAD over DCAD in the context of pre-processing is

in terms of TLDCAD's runtime, which is much shorter than that of DCAD. This is

especially important in WSNs as it will help to greatly reduce power consumption.

The much reduced running times of TLDCAD over DCAD on a standard PC for

various experimental setups can be observed in Tables 8 and 9. On the other hand we

can observe from the 5,000 data points results that when we increase the normal data

sample from 2% to 4% we can achieve approximately the same accuracy of that

obtained from 98% normal data point’s classification!. Not even for the 5000 but also

for the 50,000 the performance of using 4% normal vs. 98% is almost the same, while

the 4% TLDCAD is around 7 times faster than the DCAD Preprocessor.

Page 48: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 4: Experimental Setups and Results 40

Table 8: Average Results for 5,000 Synthetic Data Points in Experiment I-A.

TLDCAD: 2% normal

vs. 2% anomalous

TLDCAD: 4% normal

vs. 2% anomalous

DCAD: 98% normal

vs. 2% anomalous

SVM with SMO

Precision 0.958763 0.994845 0.998775

Recall 0.968750 0.984694 0.999591

F1 0.963731 0.989744 0.999183

Time (sec)

3.645893 3.780674 13.269996

ANN using 10 hidden layers

Precision 0.900000 0.969697 1.000000

Recall 0.750000 0.941176 0.989276

F1 0.818182 0.955224 0.994609

Time (sec)

2.000225 2.985272 5.224756

Table 9: Average Results for 50,000 Synthetic Data Points in Experiment I-B.

TLDCAD: 2% normal

vs. 2% anomalous

TLDCAD: 4% normal

vs. 2% anomalous

DCAD: 98% normal

vs. 2% anomalous

SVM with SMO

Precision 0.905149 0.999484 1.000000

Recall 1.000000 0.961787 0.992368

F1 0.950213 0.980273 0.996169

Time (sec)

9.808853 12.681529 85.694093

ANN using 10 hidden layers

Precision 0.993243 0.996656 0.999728

Recall 0.967105 0.980263 0.999728

F1 0.980000 0.988391 0.999728

Time sec 7.051017 11.293773 248.290314

Page 49: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 4: Experimental Setups and Results 41

4.5. Experiment II: Classification Approach

Figure 19: Flowchart of DCAD as a Classifier vs. TLDCAD.

The methods used in this Classification approach are explained in Fig. 19 where

we have to feed a labeled data to our classification process (TLDCAD and SVM) and

feed the same data to the DCAD used as classifier, and then we compare the output to

draw the final conclusion. In Figure 20 the 10-fold cross validation is depicted for the

classification approach.

Page 50: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 4: Experimental Setups and Results 42

Figure 20: One of the Ten Folds: Illustration of TLDCAD & SVM vs DCAD as a Classifier.

Page 51: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 4: Experimental Setups and Results 43

4.5.1. Experiment II Results

Table 10 show some promising results and the ability to our approach to

outperform the DCAD for all the synthetic datasets except for the 50,000 data set, on

the other hand our approach was equally in f1 measures with the GSB real dataset and

the Masdar Institute’s Wind Tower dataset.

Note that in Experiment II for the classification approach we have to randomize

the data many rounds to get the data structure that matches best for Experiment II.

However the average of these random rounds is not reported since the aim for us now

is to show the ability for the TLDCAD to outperform the DCAD in terms of F1

measure.

From Table.10 we can also see how the (SVM & TLDCAD) outperforms the

DCAD in the classifying and detecting anomalies by 7% and that is depicted by the

(SVM & TLDCAD) when they hit the 1.00 vs. DCAD of just 0.93 on the 500

synthetic dataset.

Table 10: Best Achieved results for TLDCAD with SVM vs. DCAD as a Classifier in

Experiment II.

F1 Measures

# Datasets SVM & TLDCAD DCAD

2% vs. 2% 2% vs. 4%

2% vs. 98% (Parameters Only)

mean and covariance matrix.

1 500 synthetic 0.81 1.00 0.93

2 1,000 ~ 0.80 0.94 0.90

3 2000 ~ 0.90 0.92 0.89

4 3000 ~ 0.95 0.97 0.94

5 4000 ~ 0.95 0.98 0.97

6 5000 ~ 0.98 0.99 0.98

7 50,000~ 0.96 0.98 0.99

8 GSB downloaded 0.43 1.00 1.00

9 Wind Tower

gathered Data

0.10 1.00 1.00

Page 52: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 5: Conclusion and Future Work 44

CHAPTER 5

6

5 Conclusion and Future Work

5.1. Conclusion

In this research work, we have two different approaches; the first approach is the

DCAD algorithm is used in this thesis to partition the data and then sends all the data

for classification purposes. Building on this work we proposed a TLDCAD algorithm

to send reduced amount of data than the data obtained by the DCAD. The output of

the two algorithms is compared; the results obtained show promising results for

TLDCAD. The current work is conducted using synthetic datasets. In addition we

moved further for a second approach, that is using DCAD as a classifier and compare

it to our algorithm TLDCAD joined with SVM. The results obtained for this approach

where also promising and open a wide door for outlier detection and preprocessing in

energy constrained devices..

Summary points are provided below with the illustration of our TLDCAD

algorithm:

It is based on the DCAD Algorithm.

It is faster than DCAD as data preprocessor (sampling method).

It is able to provide more accurate classification results on small and medium

data sets ( 5000 data points).

It is good for security and privacy proposes because a subset but not all the

data are communicated.

Page 53: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 5: Conclusion and Future Work 45

5.2. Future Work

The technique would scale with the number of attributes in terms of running time

since the matrix multiplication is involved. As a future work for TLDCAD code, we

are considering to generalize the algorithm to work with more than 2-dimensional

data. The matrix contribution for now is just a constant. However by going to higher

diminutions, our technique should scale with the number of attributes in terms of

running time. That is because the matrix multiplication will become the main

dominant of the TLDCAD algorithm’s running time since it has three nested loops

and the code does not have any recursion involved. This will give us an , and at

that time more efficient running time algorithms should be implemented to reduce the

running time. However if we count the data samples then we are having a loop over

the number of samples k which will give us a matrix multiplication k times. The final

running time could then be expressed as .

On the other hand, we can work with multiple clusters. Or try to exploit the

following formulas of the iterative DCAD to make our algorithm work online instead

of the current batch mode.

The below images are showing how online ellipses detecting anomalies with the

time dimension. If we proceed in the time dimension then we need to exploit the

following formulas (12) and (13) to add our own modification to make our algorithm

work online perfectly.

Page 54: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

CHAPTER 5: Conclusion and Future Work 46

Figure 21: FFIDCAD with effective n ‎[19].

[

( – )( – )

( – )

( – )

]

The equations shown above are for the Forgetting factor Iterative DCAD (FFIDCAD)

with effective n. thee FFIDCAD add the forgetting factor λ to the older samples which

gives more weight to the new sample over the old ones in order to forget the old

samples. In order to limit the growth of k in FFIDCAD (with effective n) it has to use

a constant in Equation 8. This constant is used instead of k when k ≥ . [19].

Page 55: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

Abbreviations

47

APPENDIX A

A Abbreviations

ANN: Artificial Neural Network

DCAD: Data Capture Anomaly Detection

FIDCAD: Forgetting Factor Iterative Data Capture Anomaly Detection

IDCAD: Iterative Data Capture Anomaly Detection

ML: Machine Learning

SVM: Support Vector Machine

TLDCAD: Two-layered Data Capture Anomaly Detection

WSN: Wireless Sensor Network

Page 56: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

Masdar Wind Tower

48

APPENDIX B

B Masdar Wind Tower

Table 11: Masdar Wind Tower Photographs and Images.

Figure 22: Wind Tower air flow diagram

(Photographed by Ibrahim Khamis)

Figure 23: Wind Tower Image

(Photographed by Ibrahim Khamis)

Figure 24: Wind Tower Bank of 75

high-pressure nozzles while

introducing mist to the Wind tower

Ventilation tube from inside

(Photographed by Ibrahim Khamis)

Page 57: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

Masdar Wind Tower

49

Figure 25: Wind Tower Background (Photographed by Ibrahim Khamis)

Figure 26: Wind Tower how it works (Photographed by Ibrahim Khamis)

Page 58: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

Masdar Wind Tower

50

Figure 27: Wind Tower Thermal Comfort (Photographed by Ibrahim Khamis)

Page 59: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

Bibliography

51

Bibliography

[1] S. Alam, G. Dobbie, P. Riddle, M. A. Naeem, “A swarm intelligence based

clustering approach for outlier detection,” in Proceedings of the 2010 IEEE

Congress on Evolutionary Computation (CEC), 2010, pp.1-7.

[2] M. A. Azim, Z. Aung, W. Xiao, V. Khadkikar, and A. Jamalipour,

“Localization in wireless sensor networks by constrained simultaneous

perturbation stochastic approximation technique,” in Proceedings of the 6th

IEEE International Conference on Signal Processing and Communication

Systems (ICSPCS), 2012, pp. 1-9.

[3] M. A. Azim, F. M. Kiaie, and M. H. Ahmed, “Environmental forest

monitoring using wireless sensor networks,” in Wireless Sensor Networks:

Current Status and Future Trends, CRC Press, 2012, pp. 61-78.

[4] A. Challagalla, S. S. S. Dhiraj, D. V. L. N. Somayajulu, T. S. Mathew, S.

Tiwari, and S. S. Ahmad, “Privacy preserving outlier detection using

hierarchical clustering methods,” in Proceedings of the 34th IEEE Annual

Computer Software and Applications Conference Workshops (COMPSACW),

2010, pp. 152-157.

[5] P. Dokas, L. Ertoz, V. Kumar, A. Lazarevic, J. Srivastava, and P. N. Tan,

“Data mining for network intrusion detection,” in Proceedings of the 2002

NSF Workshop on Next Generation Data Mining (NGDM), 2002, pp. 21-30.

[6] M. A. Faisal, Z. Aung, J. Williams, and A. Sanchez, “Securing advanced

metering infrastructure using intrusion detection system with data stream

Page 60: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

Bibliography

52

mining,” in Proceedings of the 2012 Pacific Asia Workshop on Intelligence

and Security Informatics (PAISI), 2012, pp. 96-111.

[7] S. Ganapathy, N. Jaisankar, P. Yogesh, and A. Kannan, “An intelligent system

for intrusion detection using outlier detection,” in Proceedings of the 2011

International Conference on Recent Trends in Information Technology

(ICRTIT), 2011, pp. 119-123.

[8] D. M. Hawkins. Identification of Outliers, Chapman and Hall, London, 1980.

[9] J. H. M. Janssens, I. Flesch, and Eric O. Postma, “Outlier detection with one-

class classifiers from ML and KDD,” in Proceedings of the 2009 International

Conference on Machine Learning and Applications (ICMLA), 2009, pp. 147-

153.

[10] Q. Ji-lin, Q. Wen, S. Ying, and F. Yu-mei, “A nonparametric outlier detection

method for financial data,” in Proceedings of the 2009 International

Conference on Management Science and Engineering (ICMSE), pp. 1442-

1447.

[11] S.-Y. Jiang and Q.-b. An, “Clustering-based outlier detection method,” in

Proceedings of the 5th International Conference on Fuzzy Systems and

Knowledge Discovery (FSKD) Volume 2, 2008, pp. 429-433.

[12] S.-Y. Jiang and A.-M. Yang, “Framework of clustering-based outlier

detection,” in Proceedings of the 6th international conference on Fuzzy

Systems and Knowledge Discovery (FSKD) Volume 1, 2009, pp. 475-479.

[13] I. Khamis and Z. Aung, “Outlier preprocessing in wireless sensor networks: A

two-layered ellipse approach,” in Proceedings of the 6th IEEE International

Conference on Developments in eSystems Engineering (DeSE), 2013, pp. 1-6.

Page 61: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

Bibliography

53

[14] J. G. Lee, J. Han, and X. Li, “Trajectory outlier detection: A partition-and-

detect framework,” in Proceedings of the 24th IEEE International Conference

on Data Engineering (ICDE), 2008, pp. 140-149.

[15] S. H. Lee, S. Lee, H. Song, and H.-S. Lee, "Wireless sensor network design

for tactical military applications: Remote large-scale environments,” in

Proceedings of the 2009 IEEE Conference on Military Communications

(MILCOM), 2009, pp. 1-7.

[16] D. Li, Z. Aung, S. Sampalli, J. Williams, and A. Sanchez, “Privacy

preservation scheme for multicast communications in smart buildings of the

smart grid,” Smart Grid and Renewable Energy, vol. 4, no. 4, 2013, pp. 313-

324.

[17] Y. Li and H. Kitagawa, “Db-outlier detection by example in high dimensional

datasets,” in Proceedings of the 2007 IEEE International Workshop on

Databases for Next Generation Researchers (SWOD), 2007, pp. 73-78.

[18] P. H. Menold, R. K. Pearson, and F. Allgower, “Online outlier detection and

removal,” in Proceedings of the 7th Mediterranean Conference on Control

and Automation (MED), 1999, pp. 1110-1133.

[19] M. Moshtaghi, C. Leckie, S. Karunasekera, J. C. Bezdek, S. Rajasegarar, and

M. Palaniswami, “Incremental elliptical boundary estimation for anomaly

detection in wireless sensor networks,” in Proceedings of the 11th IEEE

International Conference on Data Mining (ICDM), 2011, pp. 467-476.

[20] K. Narita and H. Kitagawa, “Outlier detection for transaction databases using

association rules,” in Proceedings of the 9th International Conference on

Web-Age Information Management (WAIM), 2008, pp. 373-380.

Page 62: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

Bibliography

54

[21] J. H. Oh, J. Gao, and K. Rosenblatt, “Biological data outlier detection based

on Kullback-Leibler divergence,” Proceedings of the 2008 IEEE International

Conference on Bioinformatics and Biomedicine (BIBM), 2008, pp. 249-254.

[22] K. Prakobphol and J. Zhan, “A novel outlier detection scheme for network

intrusion detection systems,” in Proceedings of the 2008 International

Conference on Information Security and Assurance (ISA), 2008, pp. 555-560.

[23] J. Qu, “Outlier detection based on Voronoi diagram,” in Proceedings of the

4th International Conference on Advanced Data Mining and Applications

(ADMA), 2008, pp. 516-523.

[24] S. Rajasegarar, J. C. Bezdek, C. Leckie, and M. Palaniswami, “Elliptical

anomalies in wireless sensor networks,” ACM Transactions on Sensor

Networks, vol. 6, no. 1, 2009, pp. 1-28.

[25] S. Rajasegarar, C. Leckie, J. C. Bezdek, and M. Palaniswami, “Centered

hyperspherical and hyperellipsoidal one-class support vector machines for

anomaly detection in sensor networks,” IEEE Transactions on Information

Forensics and Security, vol. 5, no. 3, 2010, pp. 518-533.

[26] I. Silva, L. A. Guedes, P. Portugal, and F. Vasques, “Reliability and

availability evaluation of wireless sensor networks for industrial applications,”

Sensors, vol. 12, no. 1, 2012, pp. 806-838.

[27] Y. Tao and D. Pi, “Unifying density-based clustering and outlier detection,” in

Proceedings of the 2nd International Workshop on Knowledge Discovery and

Data Mining (WKDD), 2009, pp. 644-647.

[28] J. Xi, “Outlier detection algorithms in data mining,” in Proceedings of the 2nd

International Symposium on Intelligent Information Technology Application

(IITA), 2008, pp. 94-97.

Page 63: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

Bibliography

55

[29] Z. Yang, N. Meratnia, and P. Havinga, “An online outlier detection technique

for wireless sensor networks using unsupervised quarter-sphere support vector

machine,” in Proceedings of the International Conference on Intelligent

Sensors, Sensor Networks and Information Processing (ISSNIP), 2008, pp.

151–156.

[30] Y. Zhang, N. Meratnia, and P. Havinga, “Adaptive and online one-class

support vector machine-based outlier detection techniques for wireless sensor

networks,” in Proceedings of the 2009 International Conference on Advanced

Information Networking and Applications Workshops (WAINA), 2009, pp.

990-995.

[31] Y. Zhang, N. Meratnia, and P. J. M. Havinga, “Ensuring high sensor data

quality through use of online outlier detection techniques,” International

Journal of Sensor Networks, vol. 7, no. 3, 2010, pp. 141-151.

[32] Y. Zhang, N. Meratnia, and P. Havinga, “Outlier detection techniques for

wireless sensor networks: A survey,” IEEE Communications Surveys and

Tutorials, vol. 12, no. 2, 2010, pp. 159-170.

[33] http://db.csail.mit.edu/labdata/labdata.html

[34] http://mathpax.com/images/statistics.pdf

[35] http://en.wikipedia.org/wiki/Precision_and_recall

[36] https://portal.masdar.ac.ae/Pages/NewsDetail.aspx?NID=401

[37] http://www.libelium.com/products/meshlium/wireless-sensor-networks

[38] http://www.libelium.com/products/waspmote

[39] http://www.libelium.com/development/developers/

[40] http://www.techopedia.com/definition/25651/wireless-sensor-network-wsn

[41] http://www.seeedstudio.com/depot/grove-rtc-p-758.html?cPath=25_30

Page 64: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

Bibliography

56

[42] http://www.seeedstudio.com/depot/grove-temperaturehumidity-sensor-pro-p-

838.html

[43] http://www.seeedstudio.com/depot/sd-card-shield-p-492.html?cPath=132_134

[44] http://lcav.epfl.ch/cms/lang/en/pid/86035

[45] http://www.masdar.ac.ae/campus-community/the-campus/windtower

[46] http://masdarcity.ae/en/110/frequently-asked-questions/

[47] S. Subramaniam, T. Palpanas, D. Papadopoulos, V. Kalogerakiand, and D.

Gunopulos, Online Outlier Detection in Sensor Data using Nonparametric

Models, J. Very Large Data Bases, VLDB 2006.

[48] S. Rajasegarar, C. Leckie, M. Palaniswami, and J.C. Bezdek, Distributed

Anomaly Detection in Wireless Sensor Networks, Proc. IEEE ICCS, 2006.

[49] S. Rajasegarar, C. Leckie, M. Palaniswami, and J. C. Bezdek, Quarter Sphere

Based Distributed Anomaly Detection in Wireless Sensor Networks, Proc.

IEEE International Conference on Communications, pp. 3864-3869, 2007.

[50] D. Janakiram, A. Mallikarjuna, V. Reddy, and P. Kumar, Outlier Detection in

Wireless Sensor Networks using Bayesian Belief Networks, Proc. IEEE

Comsware, 2006.

[51] D.J. Hill, B.S. Minsker, and E. Amir, Real-Time Bayesian Anomaly Detection

for Environmental Sensor Data, Proc. 32nd Congress of the International

Association of Hydraulic Engineering and Research, 2007.

[52] V. Chatzigiannakis, S. Papavassiliou, M. Grammatikou, and B.Maglariset,

Hierarchical Anomaly Detection in Distributed Large-Scale Sensor Networks,

Proc. ISCC, 2006.

[53] http://en.wikipedia.org/wiki/Wireless_sensor_network

Page 65: Anomaly Detection and Preprocessing - aungz.com · Anomaly Detection and Preprocessing By Ibrahim Khamis A Thesis Presented to the ... Figure 22: Wind Tower air flow diagram (Photographed

Bibliography

57

[54] M. Bahrepour,Y. Zhang, N. Meratnia, and P. Havinga, “Use of Event Detection

Approaches for Outlier Detection in Wireless Sensor Networks,” IEEE

Communications Surveys and Tutorials, 2009, pp. 439-444.