Anomaly Detection, Part 1
TRANSCRIPT
Agenda

Part 1: Anomaly detection – taste of theory and code. Statistical techniques
Part 2: Tools
Part 3: Clustering

High-level message: IoE and every Cloud solution produce Big Data. A permanent focus on utilizing this Big Data allows new features and even new products to be developed. With our expertise, we can choose between adopting, collaborating, buying, or developing.
Motivation example: detect failing servers on a network

Use case: a computer fan in one of your servers is not working.
Features to help: 1) CPU load, 2) temperature sensor.

[Figure: scatter plot of x1 (CPU load) vs x2 (temperature, °C); a combination of features helps reveal the anomaly]
Manual process:
1. Ask an expert and define the rule: if (cpuLoad < thr1 && tempSensor > thr2) -> anomaly
2. Implementation: requires a rules language. Or let's just hardcode it for now!

Fundamental problems:
- Not scalable: in use cases, in rules, in features, in hardware
- Very static, not adaptable. Example: false positives if we decide to optimize the energy efficiency of our data center
- A posteriori knowledge, with delays of months/years
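As a concrete illustration, the hardcoded variant of this rule might look like the following sketch (the threshold values are hypothetical, not tuned):

```python
# Hardcoded manual rule: low CPU load combined with high temperature
# suggests a failing fan. Threshold values below are illustrative only.
THR_CPU = 0.2    # hypothetical threshold for CPU load
THR_TEMP = 95.0  # hypothetical threshold for temperature (deg C)

def manual_rule(cpu_load, temp_sensor):
    """Return True if the manual rule flags this server as anomalous."""
    return cpu_load < THR_CPU and temp_sensor > THR_TEMP
```

Every new use case, feature, or hardware change means revisiting such thresholds by hand, which is exactly the scalability problem listed above.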
[Figure: scatter plot of x1 (CPU load) vs x2 (temperature, °C) with the manual rule's thresholds drawn in; annotation: "Temp sensor?"]
Ideal Anomaly Detection for your domain

Vision (doesn't exist yet): universal, scalable, real-time/offline, pluggable, …
In the next slides: a mathematical intro to universal, scalable solutions – with limitations.
Why now? The switch to Big Data/Cloud brings new challenges. The benefits are easy to see – many others (Google, FB, …) use anomaly detection. New features for our products.
Gaussian (Normal) distribution

Dataset: {x(1), x(2), …, x(m)}. Approach: given the unlabeled training set, build a model for p(x).
Say x ∈ ℝ. If x is Gaussian-distributed with mean μ and variance σ², we write x ~ N(μ, σ²):

    p(x; μ, σ²) = (1 / (√(2π) σ)) · exp(−(x − μ)² / (2σ²))

[Figure: plot of the Gaussian density with μ = 1, σ = 1]
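The density formula translates directly into code; a minimal sketch using only the standard library:

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """Univariate Gaussian density p(x; mu, sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
```

Note that 1/(√(2π)·σ) is written here as 1/√(2π·σ²), which is the same quantity.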
Parameter estimation

Dataset: {x(1), x(2), …, x(m)}

    μ = (1/m) · Σ_{i=1..m} x(i)
    σ² = (1/m) · Σ_{i=1..m} (x(i) − μ)²

[Figure: 3-D surface plot of the fitted density p(x) over features x and y]
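These two maximum-likelihood estimates can be sketched as:

```python
def fit_gaussian(xs):
    """Maximum-likelihood estimates of mu and sigma^2 for a 1-D sample.
    Note the 1/m normalization (not the unbiased 1/(m-1))."""
    m = len(xs)
    mu = sum(xs) / m
    sigma2 = sum((x - mu) ** 2 for x in xs) / m
    return mu, sigma2
```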
Density estimation

Training set: {x(1), …, x(m)}. Each example is x ∈ ℝⁿ.

    x₁ ~ N(μ₁, σ₁²)
    x₂ ~ N(μ₂, σ₂²)
    …
    xₙ ~ N(μₙ, σₙ²)

    p(x) = p(x₁; μ₁, σ₁²) · p(x₂; μ₂, σ₂²) · … · p(xₙ; μₙ, σₙ²) = Π_{j=1..n} p(xⱼ; μⱼ, σⱼ²)
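A sketch of the per-feature product, assuming the features are modeled as independent Gaussians:

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """Univariate Gaussian density p(x; mu, sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def density(x, mus, sigma2s):
    """p(x) as the product of per-feature Gaussian densities."""
    p = 1.0
    for xj, mu, s2 in zip(x, mus, sigma2s):
        p *= gaussian_pdf(xj, mu, s2)
    return p
```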
Anomaly detection algorithm

1. Choose features xⱼ that you think might be indicative of anomalous examples.
2. Fit parameters μ₁, …, μₙ, σ₁², …, σₙ².
3. Given a new example x, compute p(x) = Π_{j=1..n} p(xⱼ; μⱼ, σⱼ²).

Flag an anomaly if p(x) < ε.
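The three steps combine into a short end-to-end sketch; ε is left as an explicit argument, since later slides discuss choosing it on a cross-validation set:

```python
import math

def fit(X):
    """Step 2: fit per-feature Gaussian parameters on training matrix X
    (a list of equal-length rows, one row per example)."""
    m, n = len(X), len(X[0])
    mus = [sum(row[j] for row in X) / m for j in range(n)]
    sigma2s = [sum((row[j] - mus[j]) ** 2 for row in X) / m for j in range(n)]
    return mus, sigma2s

def p(x, mus, sigma2s):
    """Step 3: p(x) as the product of per-feature Gaussian densities."""
    prob = 1.0
    for xj, mu, s2 in zip(x, mus, sigma2s):
        prob *= math.exp(-(xj - mu) ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)
    return prob

def is_anomaly(x, mus, sigma2s, epsilon):
    """Flag x as an anomaly when its density falls below epsilon."""
    return p(x, mus, sigma2s) < epsilon
```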
The importance of real-number evaluation

When developing a learning algorithm (choosing features, etc.), making decisions is much easier if we have a way of evaluating it with a single number.
Assume we have some labeled data: anomalous examples (typically 0–50) and non-anomalous examples (~100–10,000); y = 1 if anomalous, y = 0 if normal.
Training set: ~60% of the data (assume all examples are normal/not anomalous)
Cross-validation set: 20% of the normal data + 50% of the anomalies
Test set: 20% of the normal data + 50% of the anomalies
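A minimal sketch of that split, assuming the raw data arrives as two Python lists; the function name is hypothetical, the fixed proportions mirror the slide, and shuffling is seeded only for reproducibility:

```python
import random

def split_data(normal, anomalous, seed=0):
    """60/20/20 split of normal examples; anomalies go 50/50 into CV and test."""
    rng = random.Random(seed)
    normal, anomalous = normal[:], anomalous[:]
    rng.shuffle(normal)
    rng.shuffle(anomalous)
    n, a = len(normal), len(anomalous)
    train = normal[: int(0.6 * n)]                                # normal only
    cv = normal[int(0.6 * n): int(0.8 * n)] + anomalous[: a // 2]
    test = normal[int(0.8 * n):] + anomalous[a // 2:]
    return train, cv, test
```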
Algorithm evaluation

Fit model p(x) on the training set. On a cross-validation/test example x, predict:

    y = 1 (anomaly) if p(x) < ε
    y = 0 (normal) if p(x) ≥ ε

Possible evaluation metrics:
- True positives, false positives, false negatives, true negatives
- Precision/recall:

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)

- F1-score:

    F1 = 2 · (precision · recall) / (precision + recall)

Can also use the cross-validation set to choose the parameter ε.
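The precision/recall/F1 metrics can be sketched as follows (zero denominators are guarded, a detail the slide leaves implicit):

```python
def evaluate(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = anomaly)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1
```

Sweeping ε over a range and keeping the value with the best F1 on the cross-validation set is the usual way to pick the threshold.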
Monitoring computers in a data center

Choose features that might take on unusually large or small values in the event of an anomaly:
x₁ = memory use of computer
x₂ = number of disk accesses/sec
x₃ = CPU load
x₄ = network traffic

Derived features:
x₅ = CPU load / network traffic
x₆ = CPU load / temperature
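A sketch of computing such derived features; the dictionary keys here are hypothetical field names, not anything prescribed by the slides:

```python
def derived_features(sample):
    """Ratio features that take unusual values when a machine misbehaves,
    e.g. high CPU load while network traffic stays low."""
    x5 = sample["cpu_load"] / sample["network_traffic"]
    x6 = sample["cpu_load"] / sample["temperature"]
    return x5, x6
```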
Motivating example: Monitoring machines in a data center

[Figure: per-feature histograms of CPU load and memory use, and a scatter plot of CPU load vs memory use]
Multivariate Gaussian (Normal) distribution

x ∈ ℝⁿ. Don't model p(x₁), p(x₂), etc. separately – model p(x) all in one go.
Parameters: μ ∈ ℝⁿ, Σ ∈ ℝⁿˣⁿ (covariance matrix)

    p(x; μ, Σ) = (1 / ((2π)^(n/2) |Σ|^(1/2))) · exp(−½ (x − μ)ᵀ Σ⁻¹ (x − μ))
Anomaly detection with the multivariate Gaussian

1. Fit the model by setting:

    μ = (1/m) · Σ_{i=1..m} x(i)
    Σ = (1/m) · Σ_{i=1..m} (x(i) − μ)(x(i) − μ)ᵀ

2. Given a new example x, compute p(x; μ, Σ).
Flag an anomaly if p(x) < ε.
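A sketch of both steps, assuming NumPy is available; the 1/m maximum-likelihood normalization matches the univariate estimates earlier:

```python
import numpy as np

def fit_multivariate(X):
    """Step 1: fit mean vector and covariance matrix (MLE, 1/m normalization)."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    diff = X - mu
    Sigma = diff.T @ diff / X.shape[0]
    return mu, Sigma

def multivariate_pdf(x, mu, Sigma):
    """Step 2: the multivariate Gaussian density p(x; mu, Sigma)."""
    diff = np.asarray(x, dtype=float) - mu
    n = mu.shape[0]
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
    return float(np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / norm)
```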
Relationship to original model

Original model: p(x) = p(x₁; μ₁, σ₁²) · p(x₂; μ₂, σ₂²) · … · p(xₙ; μₙ, σₙ²)
This corresponds to a multivariate Gaussian p(x; μ, Σ) where the covariance matrix is diagonal:

    Σ = diag(σ₁², σ₂², …, σₙ²)
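A quick numerical check of this equivalence, assuming NumPy; the specific numbers are arbitrary test values:

```python
import math
import numpy as np

# With a diagonal covariance matrix, the multivariate Gaussian factorizes
# into a product of per-feature univariate Gaussians (the original model).
mu = np.array([1.0, -2.0])
sigma2 = np.array([0.5, 2.0])    # per-feature variances
Sigma = np.diag(sigma2)          # diagonal covariance matrix
x = np.array([0.3, -1.0])        # arbitrary test point

# Multivariate density
d = x - mu
n = len(mu)
p_multi = math.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d) / (
    (2 * math.pi) ** (n / 2) * math.sqrt(np.linalg.det(Sigma)))

# Product of univariate densities
p_prod = 1.0
for xj, mj, s2 in zip(x, mu, sigma2):
    p_prod *= math.exp(-(xj - mj) ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)

assert abs(p_multi - p_prod) < 1e-12
```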
Original model vs. multivariate Gaussian

Original model:
- Manually create features to capture anomalies where x₁, x₂ take unusual combinations of values
- Computationally cheaper (alternatively, scales better to large n)
- OK even if m (training set size) is small

Multivariate Gaussian:
- Automatically captures correlations between features
- Computationally more expensive
- Must have m > n, or else Σ is non-invertible
Anomaly detection – taste of theory and code: statistical techniques; clustering (K-means algorithm, PCA); neural networks; practical tips (missing values, SW libraries, …)
Work with textual data, similarity techniques
Tools

Break
Credits and Learning Materials

Prof. Andrew Ng, "Machine Learning", Coursera