

Finding Anomalies in Building

Management Data using Clustering

Algorithms

Jasper van Enk

6150519

Bachelor thesis

Credits: 18 EC

Bachelor Opleiding Kunstmatige Intelligentie

University of Amsterdam

Faculty of Science

Science Park 904

1098 XH Amsterdam

Supervisor

Prof. dr. M. van Someren

Informatics Institute

Faculty of Science

University of Amsterdam

Science Park 904

1098 XH Amsterdam

June 28th, 2013


Summary

How well can a computer system learn to find anomalies in building management data and circumstantial data? Circumstantial data include sensor measurements and room schedules. Difficulties in representing time make it impossible to cluster a small data set in a meaningful way, which leaves the option of assuming that a single cluster exists. The results show that using multiple clusters yields meaningless or uninterpretable clusters; the best results are obtained by assuming there is one cluster. A Gaussian distribution is fitted to this cluster to assign a probability to each example, and a suitable threshold then separates the normal examples from the anomalies. This standard machine learning approach is a sufficient method for determining whether an example is anomalous.


Contents

Summary
Acknowledgement
Introduction
Related work
Research Method
    What is an Anomaly?
    Data
    Algorithm
        Feature selection
        K-means versus manual clustering
        Gaussian distribution
Results
    Clustering outcomes
    Gaussian distribution outcomes
    Determining an anomaly threshold
    Detected Anomalies
Conclusion
Discussion
Bibliography
Appendix A
Appendix B
Appendix C


Acknowledgement

I would like to thank my partner, Nadia, for the love, kindness and support she has shown during the past three months it has taken me to finalize this thesis. I would also like to thank my parents for their love and support, and Dr. van Someren for his enthusiastic assistance and guidance with this paper. Last but not least I would like to thank Muhammed for the very useful sparring sessions we had.


Introduction

Many papers have been written about machine learning applications in various contexts, and their results demonstrate the variable quality of machine learning performance. In this research we endeavor to provide a machine learning method and an evaluation of its performance in the experimental context of building management.

Building management includes all attempts to manually or automatically manipulate the climate in a building. Constant automated monitoring of the various contributors to the climate in a room (e.g. heating systems, cooling systems and human presence) could reveal patterns to which these contributors conform. These patterns are used to recognize the "normal states" of a room, and the K-means clustering algorithm is used to cluster these normal states by similarity. Anomalous circumstances can then be identified as states whose contributors to the climate do not resemble any of the clustered "normal states", yielding a system that automatically detects unusual states in a room. This can potentially lead to automatic detection of energy waste in maintaining a constant climate. Such a system is needed because it is difficult to draw information from building management data manually.

Building management involves a huge amount of data, since each room or set of rooms must be controlled separately in order to maintain a constant climate in each room. These data sets are difficult to survey, which makes them far from transparent. Moreover, it is difficult to keep up manually with the rate at which the data are produced, since each (set of) room(s) is measured at ten-minute intervals. Finally, it is very time-consuming to search large amounts of data manually for anomalous occurrences, while computers can analyze the same data in far less time.

This paper therefore describes a standard approach to unsupervised learning applied in a new context: a clustering step followed by fitting a Gaussian distribution to each individual cluster. The role of the clustering and its behavior on the building management data will be presented, as well as the anomaly detection capability added by the Gaussian distribution. The aim is to transform building management data into building management information, thus contributing to more efficient use of energy. This is important because reducing energy waste has great economic and environmental value.

The results of numerous studies suggest that the environment suffers significantly from the huge amount of energy consumed every day to keep a building's climate constant. Although no direct correlation is demonstrated in this paper, the proposed application of machine learning in building management has the potential to contribute significantly to reducing energy consumption and therefore to reducing environmental hazards in the future. Furthermore, reduced energy consumption implies reduced costs and can therefore contribute to a solution for the extreme budget cuts that businesses and governments have to implement today.


Related work

The major part of the research question concerns the performance of a learning system. A less obvious part of the question, however, is what framework this learning system requires in order to perform. The purpose of this research is to apply a learning system to a data set, but also to provide a clear and solid theoretical framework for the application of learning systems in building management, making it more efficient and effective.

Anomaly detection has been proposed in machine learning as an unsupervised learning problem in several contexts, with methods such as the least squares solution (Mal'kov & Tunitskii, 2008) and the support vector machine (Raghuvanshi, Tripathi, & Tiwari, 2011) presented as solutions. Although these solutions have proven effective in their own contexts, they will presumably not suit building management: support vector machines pursue maximization of margins between classes, while least squares solutions become inaccurate when processing repeated-measures data (Ugrinowitsch, Fellingham, & Ricard), which is undesirable since building management data contain fairly constant measurements over long periods of time.

The k-means clustering algorithm (Alsabti, Ranka, & Singh, 1997; Ng, 2013) provides an unsupervised learning model for determining (dynamic) boundaries between undefined classes and can be used in combination with Gaussian hypotheses. A probability distribution model with multiple Gaussian hypotheses, introduced by Jensfelt (2000) for determining the location of a robot, provides the required basis for determining the probability that an example falls outside the predetermined boundaries, in other words the probability that an example is anomalous. The features must be normalized in order to compare them in a meaningful way.

An overview of developments in modelling data as transient data streams exposes the limitations of current machine learning and its handling of large data sets (Gama, 2012). Possibilities for enhancing machine learning performance by means of data preprocessing have also been explored (Davis & Clark, 2011). The motivation for examining this aspect of developing machine learning systems is the large impact that data preprocessing has on the accuracy and capability of anomaly detection systems.


Research Method

Anomaly detection is a common application of learning algorithms. Building management data are produced at a frequent and constant rate by sensor measurements of different kinds. The machine learning method consists of a K-means algorithm combined with Gaussian distributions.

What is an Anomaly?

For the learning algorithm to be able to differentiate between anomaly and normality, definitions of anomaly and normality must first be provided in the building management context. Concerning buildings and their rooms with respect to climate and climate change, one might consider normality (normal conditions) to include properties like 'room temperature', 'bright (reading) light' and 'enough oxygen to stay focused'. An anomaly should therefore be regarded as a state in which the absolute values of one or more of these properties are rare, or a state in which the combination of property values is rare (without the absolute values necessarily being rare).

In concrete terms, the machine learning algorithm should be able to detect anomalies of two kinds. One kind involves anomalous measurements of a particular feature compared to other measurements of that same feature (e.g. normally it is 20 °C in a room, but now it is 35 °C); the other involves anomalous combinations of feature measurements (e.g. the heating system is on while in the same room the cooling system is on).

Data

The data set for this research is produced by sensors distributed in one room: a lecture room with 220 seats, used only between 9:00 and 17:00 for lectures. Sensor measurements were collected over one month, yielding a data set of 4509 examples with thirteen features each. Three extra features ("Students", "docNum", "RoosterNum") from the same time span were added, yielding a data set of 4509 examples with sixteen features each. All features and their units are presented in Table 1.

Feature          Unit
Day              Day of the week (numbers 1-7)
RuimteTemp       degrees Celsius
Month            Month of the year (numbers 1-12)
Year             12 or 13
setInblaasTemp   degrees Celsius
Klep             Rate at which it is opened
aanRadTemp       degrees Celsius
radVraagTemp     degrees Celsius
retRadTemp       degrees Celsius
setRadTemp       degrees Celsius
Ruimte CO2       parts per million
setRuimteTemp    degrees Celsius
Time             24-hour clock
Students         Present = 1, not present = 0
docNum           Number of students present according to lecturer
RoosterNum       Number of students scheduled

Table 1

Algorithm

The standard and most widely used algorithm for unsupervised machine learning is the K-means algorithm. Since the focus of this paper is on presenting the potential of today's machine learning in the building management context, this standard algorithm is applied. The purpose of the clustering algorithm is to identify groups of states that have similar features under similar circumstances. Concretely, the clusters are expected to contain sets of examples that, for instance, were all produced on weekends versus weekdays, or during days versus nights. Gaussian distributions are then placed on the cluster centroids, yielding a system that flags anomalous examples that have a low probability for every cluster (e.g. a low probability of being a 'day' example and of being a 'night' example).

Feature selection

The selection of features to be analyzed by the machine learning algorithm was carried out manually. The algorithm was run on data sets with different feature compositions, yielding results that reveal the interdependence of features. An example of this interdependence is the CO2 concentration in a room, which depends on how many people are present in that room, so one of those features can be excluded from the data set. Earlier research has shown a correlation between CO2 concentrations and the presence of students in a room. The purpose of this analysis is to exclude features whose values depend on other features.

The data available for this research consist of the features displayed in Table 1. A selection of the features is made based on their contribution to the result, their interdependence, and the significance of the nature of the features (the expected significance of room temperature is higher than that of 'Klep'). The excluded features in Table 2, which represents the selection made, do not satisfy these requirements and are therefore not included in the machine learning process.

A special case is the representation of time. Since the development of features over time plays a role in finding anomalous examples, time requires some representation in the model. This is difficult because the values needed to distinguish examples over time should not play a role in the clustering process, as these values by themselves carry no information required for anomaly detection. The feature 'Time' is therefore not included. As will be explained in more detail later, the clusters represent time implicitly because they separate 'day' and 'night' examples.

Although the measurements were taken in December 2012 and January 2013, the features 'Year' and 'Month' do not add significant information, since the focus is on room climate and its change. Although weather conditions do affect the room climate, the weather can be considered more or less the same over December and January, so there is no need to distinguish between months and years. The 'Day' feature is significant, however, since weather may vary somewhat from day to day, and it is crucial to distinguish days of the week because weekend days may have different properties than normal weekdays.

Since there is a correlation between CO2 concentration and human presence, there is no need for the features 'Students', 'docNum' or 'RoosterNum'. The CO2 feature is the more accurate indicator of human presence, since schedules are subject to change, students may not show up for numerous reasons, and lecturers' counts can be inaccurate.
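This kind of interdependence check can be sketched with a correlation coefficient; the data below are hypothetical stand-ins for the "Ruimte CO2" and "Students" columns, chosen only to illustrate the mechanism:

```python
import numpy as np

# Hypothetical illustration of the interdependence check: a strongly
# correlated feature pair is redundant, so one of the two can be dropped.
rng = np.random.default_rng(0)
students = rng.integers(0, 2, size=200).astype(float)      # presence flag (0/1)
co2 = 400 + 300 * students + rng.normal(0, 20, size=200)   # ppm, rises with presence

r = np.corrcoef(co2, students)[0, 1]
print(f"correlation(CO2, Students) = {r:.2f}")
```

A correlation close to 1 confirms that one of the two features can be excluded without losing information.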


Feature          Unit
Day              Day of the week (numbers 1-7)
RuimteTemp       degrees Celsius
setInblaasTemp   degrees Celsius
aanRadTemp       degrees Celsius
radVraagTemp     degrees Celsius
retRadTemp       degrees Celsius
setRadTemp       degrees Celsius
Ruimte CO2       parts per million
setRuimteTemp    degrees Celsius
Month            Month of the year (numbers 1-12)    (excluded)
Year             12 or 13                            (excluded)
Klep             Rate at which it is opened          (excluded)
Time             24-hour clock                       (excluded)
Students         Present = 1, not present = 0        (excluded)
docNum           Number of students present according to lecturer    (excluded)
RoosterNum       Number of students scheduled        (excluded)

Table 2

Feature normalization

Once the features have been selected, they must be normalized so that they can be compared in a meaningful way. Normalization is done by subtracting the mean of the feature from each feature value and then dividing by the standard deviation of the feature.
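A minimal sketch of this normalization step, assuming the data set is held as a NumPy array with one row per example:

```python
import numpy as np

def normalize(X):
    """Z-score normalization: subtract each feature's mean and divide by
    its standard deviation, so features become directly comparable."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)   # guard against constant features
    return (X - mu) / sigma, mu, sigma

# Toy example: two features (temperature in degrees Celsius, CO2 in ppm)
X = np.array([[20.0, 450.0], [22.0, 800.0], [21.0, 600.0]])
Xn, mu, sigma = normalize(X)
print(Xn.mean(axis=0), Xn.std(axis=0))   # means ~0, standard deviations ~1
```

Keeping `mu` and `sigma` allows new measurements to be normalized with the same parameters later.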

K-means versus manual clustering

The K-means algorithm is a clustering algorithm that minimizes a cost function J, as explained in (Ng, 2013). This cost function is a least squares measure of the distance from each data point to the center (centroid) of the cluster it is assigned to, and it is used to compare the results of the K-means algorithm for different numbers of clusters (different values of K). The centroids need a pre-defined starting point from which to move in the first iteration of the algorithm. These starting points are randomly picked examples from the data set: initializing each centroid to a random example decreases the probability of a centroid ending up at the 'edge' of the field, containing zero examples. A known property of the K-means algorithm is that its outcome depends somewhat on the starting positions of the centroids. To overcome this, multiple runs of the algorithm are recorded, each ending when the cost no longer decreases with further centroid movement.
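The procedure described above can be sketched as follows; this is a simplified stand-in (random-example initialization, best of several runs), not the author's exact implementation:

```python
import numpy as np

def kmeans(X, k, n_runs=10, max_iter=100, seed=0):
    """K-means with centroids initialized to randomly picked examples;
    the run with the lowest mean cost J is kept."""
    rng = np.random.default_rng(seed)
    best_cost, best_centroids, best_labels = np.inf, None, None
    for _ in range(n_runs):
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iter):
            # assign each example to its nearest centroid
            dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # move each centroid to the mean of its assigned examples
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centroids[j] for j in range(k)])
            if np.allclose(new, centroids):    # cost no longer decreases
                break
            centroids = new
        cost = np.mean(np.min(dists, axis=1) ** 2)   # mean squared distance J
        if cost < best_cost:
            best_cost, best_centroids, best_labels = cost, centroids, labels
    return best_centroids, best_labels, best_cost

# Usage: two well-separated synthetic groups are recovered as two clusters
X = np.vstack([np.random.default_rng(1).normal(0, 0.3, (50, 2)),
               np.random.default_rng(2).normal(5, 0.3, (50, 2))])
centroids, labels, cost = kmeans(X, k=2)
```

Keeping the best of several runs mitigates the sensitivity to the random starting positions noted above.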

Choosing the number of clusters in K-means

To draw conclusions from the output of the machine learning algorithm, its proper working must be validated. In the K-means algorithm the number of clusters over which the data are divided has to be defined in advance. Choosing the number of clusters is usually done manually, although some methods of analysis exist to assist in choosing the 'right' number. This is an important choice because too few clusters can lead to underfitting, while too many clusters can lead to overfitting the data.


The elbow method is one such method of analysis: it suggests choosing the number of clusters that forms the 'elbow' in a graph plotting the number of clusters against the mean cost. As the number of clusters increases, the cost function J in equation 1 decreases. Over the initial clusters the cost decreases rapidly, but there may come a point at which the decrease slows down significantly; from that point on, increasing the number of clusters only causes the cost to decrease slightly.

J = (1/m) Σᵢ₌₁..m ‖x⁽ⁱ⁾ − µ_c⁽ⁱ⁾‖²    (Equation 1)

where m is the number of examples, x⁽ⁱ⁾ is the i-th example and µ_c⁽ⁱ⁾ is the centroid of the cluster to which x⁽ⁱ⁾ is assigned.

In figure 1 a slight 'elbow angle' is observable at four clusters. This provides sufficient motivation to run the K-means algorithm with K = 4, since it suggests that four is the smallest number of clusters needed to significantly reduce the mean cost.

Figure 1: Cost vs Cluster number

Although the use of K-means seems preferable, the results will show that it does not handle the required representation of time well. Moreover, the K-means algorithm yields clusters whose origin is not interpretable. For this reason clusters are also composed manually, by separating examples into time frames of night and day: night covers the range 23:00 - 5:00 and day covers the range 6:00 - 23:00.
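The manual split can be sketched directly; the boundary hours follow the text, with the assumption that 23:00 itself falls in the night frame:

```python
def day_night_label(hour):
    """Assign an example to the 'night' (23:00-5:00) or 'day' (6:00-23:00)
    time frame based on its hour of measurement."""
    return "night" if hour >= 23 or hour <= 5 else "day"

hours = [0, 4, 9, 13, 18, 23]
print([day_night_label(h) for h in hours])
# -> ['night', 'night', 'day', 'day', 'day', 'night']
```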

Analyzing the probability distributions of the different clusters, one must conclude that no meaningful clusters can be found in this data set. The model must therefore assume that only one cluster is present, over which a Gaussian distribution is determined.



Gaussian distribution

When the number of clusters over which the data are divided has been determined, either by K-means or manually, a Gaussian distribution is computed in order to determine the probability of an example belonging to the cluster it is assigned to.

The formula for computing the mean vector µ, displayed in equation 2, is therefore incorporated in the K-means algorithm. In this case µ and σ are 9-dimensional vectors, since a Gaussian distribution is required for each individual feature. In equations 2, 3 and 4, xⱼ⁽ⁱ⁾ denotes the j-th feature value of the i-th example, with corresponding Gaussian parameters µⱼ and σⱼ.

µⱼ = (1/m) Σᵢ₌₁..m xⱼ⁽ⁱ⁾    (Equation 2)

Sigma squared is the variance vector corresponding to the µ vector. When the centroids of the clusters have been determined, σ² is computed using the formula in equation 4.

σⱼ² = (1/m) Σᵢ₌₁..m (xⱼ⁽ⁱ⁾ − µⱼ)²    (Equation 4)

The probability of an example belonging to the cluster it is assigned to is computed from µ and σ. It is the product of the probabilities of each individual feature of the example belonging to the cluster, as shown in equation 3.

P(x) = Πⱼ p(xⱼ; µⱼ, σⱼ²)    (Equation 3)
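A sketch of this step (per-feature mean and variance, then a product of univariate Gaussian densities); the cluster data here are synthetic, chosen only to illustrate the mechanism:

```python
import numpy as np

def fit_gaussian(X):
    """Per-feature mean and variance of a cluster (the mu and sigma-squared
    vectors from the equations above)."""
    return X.mean(axis=0), X.var(axis=0)

def probability(x, mu, var):
    """Product of the per-feature Gaussian densities for one example."""
    densities = np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    return float(np.prod(densities))

# Toy cluster: room temperature (degrees Celsius) and CO2 (ppm)
rng = np.random.default_rng(1)
X = rng.normal([21.0, 550.0], [0.5, 40.0], size=(500, 2))
mu, var = fit_gaussian(X)

p_typical = probability(np.array([21.0, 550.0]), mu, var)
p_extreme = probability(np.array([35.0, 550.0]), mu, var)   # anomalous temperature
print(p_typical > p_extreme)   # -> True
```

An example far from the cluster mean in any feature receives a probability that approaches zero, which is what the anomaly threshold later exploits.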


Results

The results obtained from various experiments demonstrate the method's validity and its capability of detecting anomalies in the data.

Clustering outcomes

Using four clusters, the examples are divided over the clusters as displayed in figure 2. As can be observed from figure 2, each cluster contains a set of feature combinations that occur together. The clusters that result from applying the K(4)-means algorithm are difficult to interpret; no clear basis for the clustering of the examples can be distinguished. In figure 2 the cluster line represents the cluster to which each example is assigned. While the cluster line jumps between values in the range 1 - 4, the 'day' line shows the value of the feature 'day' for each example, which makes it possible to analyze how days (in this case small parts of days) are clustered. Observing the data set, it remains unclear why these four clusters are formed the way they are.

Figure 2: Example distribution over clusters (cluster/day number plotted against example number)

As mentioned in the method section, K-means cannot produce a sufficient representation of time, while this is required for successfully identifying anomalies in the data. Manual clustering is therefore the next option, since it provides the opportunity to pick a sensible basis for clustering. Manual clustering is attempted in order to obtain clusters that separate examples into night and day examples, as described in the method section.



Gaussian distribution outcomes

Gaussian distributions are computed for each feature of each cluster, and accordingly the probability of each example belonging to each cluster is computed (e.g. the probability of example X belonging to the 'night' cluster and the probability of example X belonging to the 'day' cluster).

Figure 3: P(day) per example

Figure 3 displays each example's probability of belonging to the 'day' cluster. One would expect to clearly observe which examples do not resemble 'day' examples (e.g. examples measured at night), but this is not the case. Comparing figures 3 and 4, one must conclude that all examples are more likely to be 'day' examples than 'night' examples. Since not all examples are actual 'day' examples, the properties of 'day' and 'night' examples must be too similar to yield proper 'day' and 'night' clusters.

Figure 4: P(night) per example

This means that the building management system does not change between a 'day' and a 'night' schedule, which is remarkable: to maintain an efficient energy consumption rate, the effort spent keeping a room at room temperature should decrease as the day ends and increase when the day starts.



Since the results show that clustering the data leads to meaningless clusters, through K-means as well as manually, the best-fitting model for anomaly detection through clustering is the application of a single cluster, whose probability distribution is shown in figure 5.

Figure 5: Probability distribution per example (single cluster)

Determining an anomaly threshold

Since no annotated data exist, the threshold for deciding whether an example is anomalous is determined manually. Various thresholds were tried, each flagging a different set of examples as anomalous, and these sets were analyzed to determine which threshold value produces the set containing the least noise. Once determined, the threshold value remains fixed. Experiments then continue in order to identify the range of anomalies the system is able to detect; anomalies of different kinds that can be expected in practice have been planted across the data set.
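The thresholding step itself is simple; in this sketch the probability values and the threshold are hypothetical, chosen only to illustrate the mechanism:

```python
import numpy as np

def flag_anomalies(probs, epsilon):
    """Flag every example whose probability falls below threshold epsilon."""
    return probs < epsilon

# Hypothetical probabilities; the last two examples are planted anomalies.
probs = np.array([0.21, 0.18, 0.25, 0.19, 1e-9, 3e-8])
epsilon = 1e-4   # kept fixed once the flagged set with the least noise is found
flags = flag_anomalies(probs, epsilon)
print(flags.tolist())   # -> [False, False, False, False, True, True]
```

In practice several candidate values of epsilon would be tried and the resulting flagged sets inspected by hand, as described above.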

Detected Anomalies

Anomalous examples are easily identified using the determined threshold. Examples that were altered to check whether they would be identified as anomalous are spotted immediately; these anomalies, of varying nature, are assigned probabilities approaching zero. Anomalous examples and some normal examples are included in Appendix B (normal) and Appendix C (anomalies) for comparison.



Conclusion

Individual anomalous features are identified by a standard Gaussian distribution. The results show that the use of clusters does not significantly improve the intelligence of the system: both automated clustering (K-means) and manual clustering result in clusters whose examples resemble the examples in other clusters too much whenever more than one cluster is formed.

A Gaussian distribution applied to one cluster is an effective method for identifying anomalous feature values in this data, which has clear practical value, for example in identifying failing sensors.

Discussion

Clustering building management data in a meaningful way proves to be difficult. Time representation remains a problem, although larger data sets covering bigger time spans (e.g. a year) and more rooms could help solve it: over larger time spans it is easier to create clusters that represent those spans, and anomalies that occur regularly in one room can be identified by comparing multiple rooms. Larger time spans are also more likely to vary (e.g. summer measurements differ from winter measurements due to the weather). Moreover, schedules of room occupation vary, which is another reason a bigger time span is required to make sure the anomalies found by the system match the ones that should be found in practice (e.g. the current data set might be too characteristic of a period in which rooms are not used intensively).


Bibliography

Alsabti, K., Ranka, S., & Singh, V. (1997). An efficient k-means clustering algorithm. Electrical Engineering and Computer Science, 43.

Davis, J. J., & Clark, A. (2011). Data preprocessing for anomaly based network intrusion detection: A review.

Dowe, Y. A. (2003). Unsupervised Learning of Correlated Multivariate Gaussian Mixture Models Using MML. Computer Science & Software Eng.

Gama, J. (2012). A survey on learning from data streams: current and future trends. Prog Artif Intell, 1, 45-55.

Jensfelt, D. J. (2000). Using Multiple Gaussian Hypotheses to Represent Probability Distributions for Mobile Robot Localization. International Conference on Robotics & Automation.

Mal'kov, K., & Tunitskii, D. (2008). On One Extremal Problem of Adaptive Machine. Automation and Remote Control, 69, 942-952.

Ng, A. (2013, 6 23). Machine Learning. Retrieved from Coursera, Stanford: https://www.coursera.org/course/ml?from_restricted_preview=1&course_id=16&r=https%3A%2F%2Fclass.coursera.org%2Fml%2Fauth%2Fauth_redirector%3Ftype%3Dlogin%26subtype%3Dnormal%26visiting%3Dhttps%253A%252F%252Fclass.coursera.org%252Fml%252Flecture%252Findex

Raghuvanshi, A., Tripathi, R., & Tiwari, S. (2011). Machine learning approach for anomaly detection in wireless sensor data. International Journal of Advances in Engineering & Technology, Sept.

Ugrinowitsch, C., Fellingham, G., & Ricard, M. (n.d.). Limitations of ordinary least squares models in analyzing repeated measures data. Medicine & Science in Sports & Exercise, 2144-2148.


Appendix A

K-means pseudo code

READ examples (list of arrays) <- data
INIT centroids X, Y, ..., k <- random examples
CHECK centroids are not equal
For all examples
    DETERMINE cost to each centroid
    IF cost to centroid X is smallest
        ASSIGN example <- tag X
END For
While minimizing mean cost
    For all centroids
        For all examples
            CHECK example tag X
            ADD examples with tag X
        END For
        centroid X <- mean of examples with tag X
    END For
    For all examples
        DETERMINE cost to each centroid X
        IF cost to centroid X is smallest
            ASSIGN example <- tag X
    END For
    COMPUTE mean cost of examples to corresponding centroid X
END While
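The pseudo code above can be sketched as a small Python implementation. This is an illustrative sketch, not the code used in the experiments; the function and variable names (`kmeans`, `dist2`, `iters`) are chosen here for clarity, and a fixed iteration count stands in for the convergence check on the mean cost.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance, the cost measure from the pseudo code."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(examples, k, iters=20):
    """Cluster feature vectors with k-means.

    random.sample picks k distinct examples, which satisfies the
    "centroids are not equal" check when the examples are distinct.
    """
    centroids = random.sample(examples, k)
    tags = [0] * len(examples)
    for _ in range(iters):
        # Assignment step: tag each example with its cheapest centroid.
        tags = [min(range(k), key=lambda c: dist2(x, centroids[c]))
                for x in examples]
        # Update step: move each centroid to the mean of its tagged examples.
        for c in range(k):
            members = [x for x, t in zip(examples, tags) if t == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids, tags
```

On two well-separated groups of points the assignment converges to the intuitive clustering within a few iterations, regardless of which examples seed the centroids.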

Gaussian distribution pseudo code

COMPUTE mean µj for each feature j of each example in cluster X
For each example Y from cluster X
    For each feature j
        σj² = σj² + (Yj − µj)²
    END For
END For
For each feature j
    σj² = (1 / number of examples) · σj²
END For
For each feature j of example Y
    P(Y) = P(Y) · P(Yj ; µj, σj²)
END For
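A minimal Python sketch of the same procedure, assuming independent features as the pseudo code does: fit a per-feature mean and variance on the cluster, multiply the univariate Gaussian densities to score an example, and flag it as anomalous when the score falls below a threshold. The names (`fit_gaussian`, `density`, `epsilon`) are illustrative; in practice the threshold would be chosen on validation data.

```python
import math

def fit_gaussian(cluster):
    """Per-feature mean mu_j and variance sigma_j^2 over the cluster."""
    n = len(cluster)
    mu = [sum(col) / n for col in zip(*cluster)]
    var = [sum((x[j] - mu[j]) ** 2 for x in cluster) / n
           for j in range(len(mu))]
    return mu, var

def density(example, mu, var):
    """P(Y): product of univariate Gaussian densities, one per feature."""
    p = 1.0
    for yj, mj, vj in zip(example, mu, var):
        p *= math.exp(-(yj - mj) ** 2 / (2 * vj)) / math.sqrt(2 * math.pi * vj)
    return p

def is_anomaly(example, mu, var, epsilon):
    """An example is anomalous when its probability is below the threshold."""
    return density(example, mu, var) < epsilon
```

An example far from the cluster mean receives a vanishingly small density, so even a loose threshold separates it from the normal examples.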


Appendix B

One example per row; each row lists the nine feature values of that example.

205478600000000.00 273018700000000.00 280000000000000.00 233851400000000.00 748944100000000.00 332126200000000.00 0.00 312142900000000.00 426388900000000.00
205478600000000.00 273018700000000.00 280000000000000.00 233851400000000.00 736122400000000.00 332126200000000.00 0.00 313205400000000.00 426154400000000.00
205478600000000.00 277678700000000.00 280000000000000.00 233851400000000.00 722435500000000.00 332126200000000.00 0.00 312961500000000.00 425190600000000.00
205478600000000.00 272987000000000.00 280000000000000.00 233851400000000.00 709802800000000.00 331102300000000.00 0.00 316406300000000.00 428616600000000.00
205478600000000.00 272987000000000.00 280000000000000.00 233851400000000.00 692881200000000.00 331102300000000.00 0.00 312411400000000.00 430805600000000.00
205478600000000.00 277741800000000.00 280000000000000.00 233851400000000.00 672081000000000.00 330052000000000.00 0.00 315936700000000.00 430759000000000.00
205478600000000.00 278567000000000.00 280000000000000.00 233851400000000.00 650455900000000.00 329035600000000.00 0.00 327452300000000.00 430957000000000.00
205478600000000.00 269962200000000.00 280000000000000.00 233851400000000.00 633025400000000.00 329006700000000.00 0.00 325958000000000.00 432291100000000.00
205478600000000.00 275422200000000.00 280000000000000.00 233851400000000.00 620125900000000.00 330022000000000.00 0.00 326775600000000.00 433298100000000.00
205478600000000.00 275303100000000.00 280000000000000.00 233851400000000.00 608625600000000.00 329027900000000.00 0.00 326996300000000.00 433263000000000.00
205478600000000.00 274543700000000.00 280000000000000.00 233851400000000.00 606180800000000.00 330040400000000.00 0.00 327944300000000.00 437629100000000.00
205478600000000.00 274914100000000.00 280000000000000.00 233851400000000.00 595261800000000.00 332037100000000.00 0.00 331251900000000.00 442304300000000.00
205478600000000.00 279318100000000.00 280000000000000.00 233851400000000.00 590711400000000.00 333040200000000.00 0.00 328288000000000.00 442107300000000.00
205478600000000.00 274593000000000.00 280000000000000.00 233851400000000.00 594244900000000.00 333040200000000.00 0.00 332857700000000.00 441513900000000.00
205478600000000.00 270190500000000.00 280000000000000.00 233851400000000.00 603197800000000.00 334047900000000.00 0.00 330891200000000.00 440389300000000.00


Appendix C

One example per row; each row lists the nine feature values of that example.

200807100000000.00 239845200000000.00 280000000000000.00 233851400000000.00 829555000000000.00 362405700000000.00 0.00 356748800000000.00 518023100000000.00
296246500000000.00 237955300000000.00 280000000000000.00 233851400000000.00 530223400000000.00 365449800000000.00 0.00 343981000000000.00 529554700000000.00
199290400000000.00 959610500000000.00 280000000000000.00 233851400000000.00 856821100000000.00 356356600000000.00 0.00 317542900000000.00 416345300000000.00
353537700000000.00 277424100000000.00 280000000000000.00 233851400000000.00 619899000000000.00 0.00 0.00 0.00 0.00
193537700000000.00 273254400000000.00 280000000000000.00 233851400000000.00 569114600000000.00 0.00 0.00 0.00 0.00
193537700000000.00 273956700000000.00 280000000000000.00 233851400000000.00 542474900000000.00 0.00 0.00 0.00 0.00
193537700000000.00 273956700000000.00 280000000000000.00 233851400000000.00 579711000000000.00 0.00 0.00 0.00 0.00
193537700000000.00 269794600000000.00 280000000000000.00 233851400000000.00 607864500000000.00 0.00 0.00 0.00 0.00
193537700000000.00 269794600000000.00 280000000000000.00 233851400000000.00 587268900000000.00 0.00 0.00 0.00 0.00
193537700000000.00 269243400000000.00 280000000000000.00 233851400000000.00 595976200000000.00 0.00 0.00 0.00 0.00
193537700000000.00 872942000000000.00 280000000000000.00 233851400000000.00 624424400000000.00 0.00 0.00 0.00 0.00