anomaly detection, part 1

Anomaly detection – part 1. David Khosid, Jan. 14, 2015

Upload: david-khosid

Post on 18-Aug-2015


1

Anomaly detection – part 1

David Khosid, Jan. 14, 2015

2

Agenda

Part 1: Anomaly detection, a taste of theory and code. Statistical techniques

Part 2: Tools

Part 3: Clustering

High-level message: IoE and every Cloud solution produce Big Data. A sustained focus on using this Big Data allows new features and even new products to be developed. With expertise in hand, we can choose between adopting, collaborating, buying or developing.

3

Use Case: a computer fan in one of your servers is not working

Features to help: 1) CPU load 2) Temperature sensor

Motivation Example: detect failing servers on a network.

[Figure: scatter plot of x1 (CPU load) against x2 (Temp, °C). A combination of features helps reveal the anomaly.]

4

Manual process:
1. Ask an expert and define the rule: if (cpuLoad < thr1 && Tempsensor > thr2) -> Anomaly
2. Implementation: requires a rules language. Or let's just hardcode it for now!

Fundamental problems:
- Not scalable: in use cases, in rules, in features, in hardware
- Very static, not adaptable. Example: false positives if we decide to optimize the energy efficiency of our Data Center
- A posteriori knowledge, with delays of months or years
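A minimal sketch of the hardcoded rule, to make its rigidity concrete. The threshold values, the units, and the function name are assumptions for illustration; the slides leave thr1 and thr2 unspecified.

```python
# Hypothetical thresholds; in practice an expert would supply them.
THR_CPU_LOAD = 0.2   # assumed: CPU load as a fraction between 0 and 1
THR_TEMP_C = 80.0    # assumed: temperature in degrees Celsius


def is_anomaly(cpu_load: float, temp_c: float) -> bool:
    """Hardcoded expert rule: low CPU load combined with high temperature."""
    return cpu_load < THR_CPU_LOAD and temp_c > THR_TEMP_C


# Three made-up readings: (CPU load, temperature).
readings = [(0.10, 95.0), (0.70, 85.0), (0.15, 45.0)]
flags = [is_anomaly(cpu, temp) for cpu, temp in readings]
```

Every new use case, feature, or change in operating conditions means editing the constants or the condition by hand, which is the scalability problem listed above.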

Motivation Example: detect failing servers on a network.

[Figure: the same scatter plot of x1 (CPU load) against x2 (Temp, °C), with the manual rule marked and the annotation "Temp sensor?".]

5

Ideal Anomaly Detection for your domain

Vision (does not exist yet): universal, scalable, real-time/offline, pluggable, …

In the next slides: a mathematical intro to universal, scalable solutions, along with their limitations.

Why now? The switch to Big Data/Cloud brings new challenges. The benefits are easy to see: many others (Google, FB…) use Anomaly Detection. It enables new features for our products.

6

Anomaly detection

Machine Learning

Theory (adapted from Prof. Andrew Ng and Coursera)

7

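Gaussian (Normal) distribution

Dataset: $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$. Approach: given the unlabeled training set, build a model for $p(x)$.

Say $x \in \mathbb{R}$. If $x$ is distributed Gaussian with mean $\mu$ and variance $\sigma^2$, we write $x \sim \mathcal{N}(\mu, \sigma^2)$.

$$p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

[Figure: "Gauss Distribution", $p(x)$ plotted against $x$ for $\mu = 1$, $\sigma = 1$.]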
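The density of a single Gaussian feature can be evaluated directly from its mean and variance. The sketch below uses only the standard library; the function name `gaussian_pdf` is an assumption for illustration.

```python
import math


def gaussian_pdf(x: float, mu: float, sigma2: float) -> float:
    """Density of N(mu, sigma2) at x; sigma2 is the variance, not the standard deviation."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


# Density at the mean for mu = 1, sigma = 1 (the peak of the bell curve).
peak = gaussian_pdf(1.0, mu=1.0, sigma2=1.0)
```

For mu = 1 and sigma = 1 the peak is 1 / sqrt(2π), roughly 0.4, and the curve is symmetric around the mean.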

8

Gaussian distribution example

9

Parameter estimation

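Dataset: $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$

[Figure: plot of a fitted density p(.) over axes x and y.]

$$\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}, \qquad \sigma^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x^{(i)} - \mu\right)^2$$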
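Parameter estimation for one feature amounts to a sample mean and a sample variance with a 1/m factor (the maximum-likelihood form, rather than the 1/(m-1) form). A minimal sketch on made-up numbers; the function name is an assumption.

```python
def estimate_gaussian(samples):
    """Return (mu, sigma2) for a list of numbers, using the 1/m variance estimate."""
    m = len(samples)
    mu = sum(samples) / m
    sigma2 = sum((x - mu) ** 2 for x in samples) / m
    return mu, sigma2


# Illustrative data, not taken from the slides.
mu, sigma2 = estimate_gaussian([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

The standard library's `statistics.pvariance` computes the same 1/m variance, if a ready-made function is preferred.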

10

Anomaly detection

Algorithm


11

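Density estimation

Training set: $\{x^{(1)}, \dots, x^{(m)}\}$. Each example is $x \in \mathbb{R}^n$.

$$x_1 \sim \mathcal{N}(\mu_1, \sigma_1^2), \quad x_2 \sim \mathcal{N}(\mu_2, \sigma_2^2), \quad \dots, \quad x_n \sim \mathcal{N}(\mu_n, \sigma_n^2)$$

$$p(x) = p(x_1; \mu_1, \sigma_1^2)\, p(x_2; \mu_2, \sigma_2^2) \cdots p(x_n; \mu_n, \sigma_n^2) = \prod_{i=1}^{n} p(x_i; \mu_i, \sigma_i^2)$$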

12

Anomaly detection algorithm

1. Choose features that you think might be indicative of anomalous examples.

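2. Fit parameters $\mu_1, \dots, \mu_n, \sigma_1^2, \dots, \sigma_n^2$ on the training set.

3. Given new example $x$, compute $p(x)$:

$$p(x) = \prod_{i=1}^{n} p(x_i; \mu_i, \sigma_i^2)$$

Anomaly if $p(x) < \varepsilon$.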
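The three steps can be sketched end to end as follows. The training rows (CPU load, temperature), the threshold `EPSILON`, and all function names are assumptions chosen for illustration, not values from the slides.

```python
import math


def fit(train):
    """Step 2: per-feature mean and variance (1/m estimate) from rows of feature values."""
    m = len(train)
    columns = list(zip(*train))
    mus = [sum(col) / m for col in columns]
    sigma2s = [sum((v - mu) ** 2 for v in col) / m for col, mu in zip(columns, mus)]
    return mus, sigma2s


def density(x, mus, sigma2s):
    """Step 3: p(x) as the product of independent univariate Gaussian densities."""
    p = 1.0
    for v, mu, s2 in zip(x, mus, sigma2s):
        p *= math.exp(-((v - mu) ** 2) / (2.0 * s2)) / math.sqrt(2.0 * math.pi * s2)
    return p


# Step 1: chosen features are (CPU load, temperature in °C); rows are made up.
train = [(0.30, 50.0), (0.50, 60.0), (0.70, 70.0), (0.40, 55.0), (0.60, 65.0)]
mus, sigma2s = fit(train)

EPSILON = 1e-3  # hypothetical threshold; normally tuned on labeled data


def is_anomalous(x):
    return density(x, mus, sigma2s) < EPSILON
```

With these numbers, a typical reading such as (0.5, 60.0) has a density far above the threshold, while a low-load, high-temperature reading such as (0.1, 95.0) falls far below it.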

13

Anomaly detection example

14

Anomaly detection

Developing and evaluating an anomaly detection system


15

The importance of real-number evaluation

When developing a learning algorithm (choosing features, etc.), making decisions is much easier if we have a way of evaluating our learning algorithm with a single number.

Assume we have some labeled data: anomalous examples (0-50) and non-anomalous examples (~100-10,000), with $y = 0$ if normal and $y = 1$ if anomalous.

Training set: $x^{(1)}, x^{(2)}, \dots, x^{(m)}$ (assumed normal, not anomalous), 60% of the normal examples

Cross validation set: 20% of the normal examples + 50% of the anomalies

Test set: 20% of the normal examples + 50% of the anomalies
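One way to carve up the labeled data along these lines is sketched below. It assumes the two input lists are already shuffled, and the function name and placeholder data are hypothetical.

```python
def split_dataset(normal, anomalous):
    """60/20/20 split of normal examples; anomalies split half and half between CV and test."""
    n_train = len(normal) * 6 // 10
    n_cv = len(normal) * 2 // 10
    half = len(anomalous) // 2

    train = list(normal[:n_train])  # unlabeled: assumed normal
    cv = [(x, 0) for x in normal[n_train:n_train + n_cv]] + [(x, 1) for x in anomalous[:half]]
    test = [(x, 0) for x in normal[n_train + n_cv:]] + [(x, 1) for x in anomalous[half:]]
    return train, cv, test


# Placeholder examples: integers stand in for feature vectors.
normal = list(range(100))
anomalous = list(range(1000, 1010))
train, cv, test = split_dataset(normal, anomalous)
```

The training set stays free of known anomalies, so the fitted density describes normal behavior only; the few labeled anomalies are reserved for evaluation.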

16

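Algorithm evaluation

Fit model $p(x)$ on the training set.
On a cross validation/test example $x$, predict

$$y = \begin{cases} 1 & \text{if } p(x) < \varepsilon \text{ (anomaly)} \\ 0 & \text{if } p(x) \ge \varepsilon \text{ (normal)} \end{cases}$$

Possible evaluation metrics:
- True positive, false positive, false negative, true negative
- Precision/Recall
- F1-score

$$prec = \frac{tp}{tp + fp}; \quad rec = \frac{tp}{tp + fn}; \quad F_1 = \frac{2 \cdot prec \cdot rec}{prec + rec}$$

Can also use the cross validation set to choose the parameter $\varepsilon$.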
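A sketch of how the F1-score could drive the choice of the threshold on the cross validation set. The density values, labels, and candidate thresholds below are invented for illustration, and the function names are assumptions.

```python
def f1_score(y_true, y_pred):
    """F1 from true/false positives and false negatives; 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    return 2.0 * prec * rec / (prec + rec)


def choose_epsilon(p_values, y_true, candidates):
    """Return the candidate threshold with the highest F1 on the given labeled set."""
    best_eps, best_f1 = None, -1.0
    for eps in candidates:
        y_pred = [1 if p < eps else 0 for p in p_values]
        score = f1_score(y_true, y_pred)
        if score > best_f1:
            best_eps, best_f1 = eps, score
    return best_eps, best_f1


# Made-up densities p(x) for seven cross validation examples and their labels.
p_cv = [0.20, 0.15, 0.18, 0.001, 0.12, 0.0005, 0.03]
y_cv = [0, 0, 0, 1, 0, 1, 0]
best_eps, best_f1 = choose_epsilon(p_cv, y_cv, [0.0001, 0.01, 0.05, 0.5])
```

F1 is preferred over plain accuracy here because anomalies are rare: a detector that never flags anything would still score high accuracy.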

17

Anomaly detection

Choosing what features to use


18

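Non-gaussian features

If a feature does not look Gaussian, transform it before fitting $p(x_i; \mu_i, \sigma_i^2)$, for example:

$$x \to \log(x + \text{const})$$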
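A small sketch of the log transform applied to a heavily right-skewed feature. The constant of 1.0 and the sample values are assumptions; in practice the constant is chosen by inspecting the histogram.

```python
import math


def log_transform(values, const=1.0):
    """Map each value v to log(v + const); requires v + const > 0."""
    return [math.log(v + const) for v in values]


# Made-up skewed feature spanning three orders of magnitude.
raw = [0.0, 1.0, 9.0, 99.0, 999.0]
transformed = log_transform(raw)
```

After the transform the values are evenly spaced on a much narrower range, which is closer to what a Gaussian fit expects.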

19

Monitoring computers in data center

Choose features that might take on unusually large or small values in the event of an anomaly.

$x_1$ = memory use of computer
$x_2$ = number of disk accesses/sec
$x_3$ = CPU load
$x_4$ = network traffic

$$x_5 = \frac{\text{CPU load}}{\text{network traffic}}, \qquad x_6 = \frac{\text{temperature}}{\text{CPU load}}$$
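The two ratio features are easy to derive from raw measurements. This sketch assumes strictly positive CPU load and network traffic (no division-by-zero handling) and uses made-up readings and units.

```python
def derived_features(cpu_load, network_traffic, temperature):
    """Return (x5, x6): CPU load per unit of traffic, and temperature per unit of CPU load."""
    x5 = cpu_load / network_traffic
    x6 = temperature / cpu_load
    return x5, x6


# Made-up readings: a healthy server and one whose fan has failed.
healthy = derived_features(cpu_load=0.5, network_traffic=100.0, temperature=60.0)
failed_fan = derived_features(cpu_load=0.1, network_traffic=20.0, temperature=95.0)
```

In the failed-fan reading neither CPU load nor traffic is extreme on its own, but the temperature-to-load ratio is several times larger than for the healthy server.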

20

Anomaly detection

Multivariate Gaussian distribution

21

Motivating example: Monitoring machines in a data center

[Figure: plots over $x_1$ (CPU Load) and $x_2$ (Memory Use).]

22

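Multivariate Gaussian (Normal) distribution

$x \in \mathbb{R}^n$. Don't model $p(x_1), p(x_2), \dots$ etc. separately.

Model $p(x)$ all in one go.

Parameters: $\mu \in \mathbb{R}^n$, $\Sigma \in \mathbb{R}^{n \times n}$ (covariance matrix)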

23

Multivariate Gaussian (Normal) examples

[Slides 23-28 contain figures only.]

29

Anomaly detection

Anomaly detection using the multivariate Gaussian distribution

30

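Multivariate Gaussian (Normal) distribution

Parameters: $\mu$, $\Sigma$

$$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}\, |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$

Parameter fitting: given training set $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$,

$$\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}, \qquad \Sigma = \frac{1}{m}\sum_{i=1}^{m}\left(x^{(i)} - \mu\right)\left(x^{(i)} - \mu\right)^T$$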
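Fitting the multivariate model means computing a mean vector and a covariance matrix with a 1/m factor. A dependency-free sketch on four made-up two-feature examples; the function name is an assumption.

```python
def fit_multivariate(train):
    """Return (mu, sigma): the mean vector and the n x n covariance matrix (1/m estimate)."""
    m = len(train)
    n = len(train[0])
    mu = [sum(row[j] for row in train) / m for j in range(n)]
    sigma = [
        [sum((row[a] - mu[a]) * (row[b] - mu[b]) for row in train) / m for b in range(n)]
        for a in range(n)
    ]
    return mu, sigma


# Illustrative training rows, not from the slides.
train = [(1.0, 1.0), (2.0, 3.0), (3.0, 2.0), (4.0, 4.0)]
mu, sigma = fit_multivariate(train)
```

The off-diagonal entries of `sigma` carry the correlation between the two features, which the per-feature model ignores.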

31

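Anomaly detection with the multivariate Gaussian

1. Fit model $p(x)$ by setting

$$\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}, \qquad \Sigma = \frac{1}{m}\sum_{i=1}^{m}\left(x^{(i)} - \mu\right)\left(x^{(i)} - \mu\right)^T$$

2. Given a new example $x$, compute

$$p(x) = \frac{1}{(2\pi)^{n/2}\, |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$

Flag an anomaly if $p(x) < \varepsilon$.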
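A sketch of the scoring step, restricted to n = 2 features so that the determinant and inverse can be written in closed form without a linear algebra library. The values of `mu`, `sigma` and `EPSILON` are hypothetical stand-ins for already fitted parameters and a tuned threshold.

```python
import math


def density_2d(x, mu, sigma):
    """Multivariate Gaussian density for n = 2, with sigma given as a 2x2 nested list."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    dx0, dx1 = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu) using the closed-form 2x2 inverse.
    q = (d * dx0 * dx0 - (b + c) * dx0 * dx1 + a * dx1 * dx1) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


mu = [2.5, 2.5]                       # assumed fitted mean
sigma = [[1.25, 1.0], [1.0, 1.25]]    # assumed fitted covariance (positively correlated)
EPSILON = 0.02                        # hypothetical threshold


def is_anomalous(x):
    return density_2d(x, mu, sigma) < EPSILON


along = is_anomalous((3.5, 3.5))      # both features high together
against = is_anomalous((3.5, 1.5))    # one high, one low
```

Both test points are equally far from the mean in each individual feature, yet only the one that breaks the correlation is flagged.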

32

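Relationship to original model

Original model: $p(x) = p(x_1; \mu_1, \sigma_1^2) \times p(x_2; \mu_2, \sigma_2^2) \times \cdots \times p(x_n; \mu_n, \sigma_n^2)$

Corresponds to multivariate Gaussian

$$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}\, |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$

where

$$\Sigma = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{bmatrix}$$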
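The equivalence can be checked numerically for n = 2: with a diagonal covariance matrix, the multivariate density should equal the product of the two univariate densities. All parameter values and function names below are arbitrary choices for the check.

```python
import math


def univariate(x, mu, s2):
    """Density of N(mu, s2) at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * s2)) / math.sqrt(2.0 * math.pi * s2)


def multivariate_diag_2d(x, mu, s2):
    """Multivariate Gaussian density for n = 2 with Sigma = diag(s2[0], s2[1])."""
    det = s2[0] * s2[1]
    # For a diagonal Sigma the quadratic form is a sum of scaled squared deviations.
    q = (x[0] - mu[0]) ** 2 / s2[0] + (x[1] - mu[1]) ** 2 / s2[1]
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


x = (0.3, 1.7)
mu = (0.0, 1.0)
s2 = (0.5, 2.0)

product_model = univariate(x[0], mu[0], s2[0]) * univariate(x[1], mu[1], s2[1])
diagonal_model = multivariate_diag_2d(x, mu, s2)
```

The two values agree up to floating-point rounding, so the original model is the special case of the multivariate Gaussian whose contours are axis-aligned.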

33

Original model vs. Multivariate Gaussian

Original model:
- Manually create features to capture anomalies where $x_1, x_2$ take unusual combinations of values.
- Computationally cheaper (alternatively, scales better to large $n$)
- OK even if $m$ (training set size) is small

Multivariate Gaussian:
- Automatically captures correlations between features
- Computationally more expensive
- Must have $m > n$, or else $\Sigma$ is non-invertible.
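The invertibility condition can be seen on a tiny case with n = 2 features: with only m = 2 training examples the estimated covariance matrix has zero determinant, while a third example makes it invertible. The data points and function names are made up for the demonstration.

```python
def covariance_2d(train):
    """2x2 covariance matrix (1/m estimate) for rows of two features."""
    m = len(train)
    mu0 = sum(r[0] for r in train) / m
    mu1 = sum(r[1] for r in train) / m
    s00 = sum((r[0] - mu0) ** 2 for r in train) / m
    s11 = sum((r[1] - mu1) ** 2 for r in train) / m
    s01 = sum((r[0] - mu0) * (r[1] - mu1) for r in train) / m
    return [[s00, s01], [s01, s11]]


def determinant_2d(sigma):
    return sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]


# m = n = 2: both deviations from the mean lie on one line, so Sigma has rank 1.
det_two = determinant_2d(covariance_2d([(1.0, 2.0), (3.0, 5.0)]))

# m = 3 > n = 2: a third, non-collinear example makes Sigma invertible.
det_three = determinant_2d(covariance_2d([(1.0, 2.0), (3.0, 5.0), (2.0, 2.0)]))
```

A zero determinant means the density formula cannot be evaluated, which is why the per-feature model is the safer choice when training data is scarce.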

34

Anomaly detection – taste of theory and code
- Statistical techniques
- Clustering: K-means algorithm
- PCA
- Neural Network
- Practical tips: missing values, SW libraries, …

Work with textual data, similarity techniques

Tools

Break

35

Cost?

36

Credits and Learning Materials

Prof. Andrew Ng, “Machine Learning”, Coursera