Anomaly Detection - New York Machine Learning

Upload: ted-dunning

Post on 27-Nov-2014

DESCRIPTION

Anomaly detection is the art of finding what you don't know how to ask for. In this talk, I walk through the why and how of building probabilistic models for a variety of problems including continuous signals and web traffic. This talk blends theory and practice in a highly approachable way.

TRANSCRIPT

Page 1: Anomaly Detection - New York Machine Learning

© 2014 MapR Technologies 1

© MapR Technologies, confidential

How to Find What You Didn’t Know to Look For

Anomaly Detection

October 14, 2014

Page 2: Anomaly Detection - New York Machine Learning

Anomaly Detection: How To Find What You Didn’t Know to Look For

Ted Dunning, Chief Applications Architect MapR Technologies

Twitter @Ted_Dunning

Ellen Friedman, Consultant and Commentator

Twitter @Ellen_Friedman

Page 3: Anomaly Detection - New York Machine Learning

e-book available courtesy of MapR

http://bit.ly/1jQ9QuL

A New Look at Anomaly Detection by Ted Dunning and Ellen Friedman © June 2014 (published by O’Reilly)

Page 4: Anomaly Detection - New York Machine Learning

Practical Machine Learning series (O’Reilly)

• Machine learning is becoming mainstream

• Need pragmatic approaches that take into account real world business settings:
– Time to value
– Limited resources
– Availability of data
– Expertise and cost of team to develop and to maintain system

• Look for approaches with big benefits for the effort expended

Page 5: Anomaly Detection - New York Machine Learning

Anomaly Detection

Page 6: Anomaly Detection - New York Machine Learning

Who Needs Anomaly Detection?

Utility providers using smart meters

Page 7: Anomaly Detection - New York Machine Learning

Who Needs Anomaly Detection?

Feedback from manufacturing assembly lines

Page 8: Anomaly Detection - New York Machine Learning

Who Needs Anomaly Detection?

Monitoring data traffic on communication networks

Page 9: Anomaly Detection - New York Machine Learning

What is Anomaly Detection?

• The goal is to discover rare events – especially those that shouldn’t have happened

• Find a problem before other people see it
– especially before it causes a problem for customers

• Why is this a challenge?
– I don’t know what an anomaly looks like (yet)

Page 10: Anomaly Detection - New York Machine Learning

Spot the Anomaly

Page 11: Anomaly Detection - New York Machine Learning

Spot the Anomaly

Looks pretty anomalous to me

Page 12: Anomaly Detection - New York Machine Learning

Spot the Anomaly

Will the real anomaly please stand up?

Page 13: Anomaly Detection - New York Machine Learning

Basic idea: Find “normal” first

Page 14: Anomaly Detection - New York Machine Learning

Steps in Anomaly Detection

• Build a model: Collect and process data for training a model

• Use the machine learning model to determine what is the normal pattern

• Decide how far away from this normal pattern you’ll consider to be anomalous

• Use the AD model to detect anomalies in new data
– Methods such as clustering for discovery can be helpful
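
The steps above can be sketched in a few lines of Python. This is only an illustration, not the method from the talk: it assumes "normal" can be summarized by a mean and standard deviation, and the function names and the cutoff of 3 standard deviations are hypothetical choices.

```python
import statistics

def fit_normal(training):
    """Build the model: summarize normal data as (mean, standard deviation)."""
    return statistics.fmean(training), statistics.pstdev(training)

def anomaly_score(x, model):
    """Distance of x from the normal pattern, in standard deviations."""
    mean, std = model
    return abs(x - mean) / std

def detect(new_data, model, max_score=3.0):
    """Flag points farther than max_score deviations from normal.

    max_score is the "how far is too far" decision; 3.0 is an assumed default.
    """
    return [x for x in new_data if anomaly_score(x, model) > max_score]

# Train on data believed to be normal, then score new data.
training = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.0]
model = fit_normal(training)
flagged = detect([10.1, 9.9, 14.5, 10.2], model)
```

Here only the reading of 14.5 is flagged. A fixed cutoff like this is the part that later slides replace with an adaptive threshold.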

Page 15: Anomaly Detection - New York Machine Learning

How hard is it to set an alert for anomalies?

Grey data is from normal events; x’s are anomalies. Where would you set the threshold?

Page 16: Anomaly Detection - New York Machine Learning

Basic idea: Set adaptive thresholds

Page 17: Anomaly Detection - New York Machine Learning

What Are We Really Doing

• We want action when something breaks (dies/falls over/otherwise gets in trouble)

• But action is expensive

• So we don’t want too many false alarms

• And we don’t want too many false negatives

• What’s the right threshold to set for alerts?
– We need to trade off costs
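
One way to make the cost trade-off concrete is to total the cost of false alarms and misses at each candidate threshold. The sketch below assumes a labeled history of scores is available, which often is not the case in anomaly detection; the function names and cost values are hypothetical.

```python
def total_cost(threshold, normal_scores, anomaly_scores,
               false_alarm_cost, miss_cost):
    """Cost of alerting whenever score > threshold on a labeled history."""
    false_alarms = sum(1 for s in normal_scores if s > threshold)
    misses = sum(1 for s in anomaly_scores if s <= threshold)
    return false_alarms * false_alarm_cost + misses * miss_cost

def best_threshold(normal_scores, anomaly_scores, false_alarm_cost, miss_cost):
    """Try every observed score as a threshold and keep the cheapest one."""
    candidates = sorted(set(normal_scores) | set(anomaly_scores))
    return min(
        candidates,
        key=lambda t: total_cost(t, normal_scores, anomaly_scores,
                                 false_alarm_cost, miss_cost),
    )

normal = [1, 2, 2, 3, 3, 4, 5, 6]
anomalies = [5.5, 8, 9]

# Misses are expensive: accept one false alarm to catch every anomaly.
cautious = best_threshold(normal, anomalies, false_alarm_cost=1, miss_cost=10)
# False alarms are expensive: accept one miss to avoid waking anyone up.
relaxed = best_threshold(normal, anomalies, false_alarm_cost=10, miss_cost=1)
```

With these made-up numbers the threshold moves from 5 to 6 as false alarms become more expensive than misses.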

Page 18: Anomaly Detection - New York Machine Learning

A Second Look

Page 19: Anomaly Detection - New York Machine Learning

A Second Look

99.9%-ile

Page 20: Anomaly Detection - New York Machine Learning

New algorithm: t-digest

Page 21: Anomaly Detection - New York Machine Learning

How Hard Can it Be?

[Diagram: each input x is fed to an online summarizer that tracks the 99.9%-ile as threshold t; if x > t, raise an alarm.]
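
The summarizer-plus-threshold loop can be sketched as follows. This is not t-digest: as a stand-in it keeps an exact sorted window of recent values, which costs far more memory than a real digest would. The class name, window size, and warm-up length are assumptions made for the example.

```python
import bisect
from collections import deque

class WindowQuantile:
    """Exact quantile over the most recent `size` values (stand-in for t-digest)."""

    def __init__(self, size=10_000):
        self.size = size
        self.recent = deque()   # values in arrival order
        self.ordered = []       # the same values, kept sorted

    def add(self, x):
        self.recent.append(x)
        bisect.insort(self.ordered, x)
        if len(self.recent) > self.size:
            old = self.recent.popleft()
            del self.ordered[bisect.bisect_left(self.ordered, old)]

    def quantile(self, q):
        if not self.ordered:
            return float("inf")
        index = min(len(self.ordered) - 1, int(q * len(self.ordered)))
        return self.ordered[index]

def alarms(stream, q=0.999, warmup=100):
    """Yield (index, x) whenever x exceeds the running q-quantile threshold t."""
    summary = WindowQuantile()
    for i, x in enumerate(stream):
        if i >= warmup and x > summary.quantile(q):
            yield i, x
        summary.add(x)

# A repeating 0..9 pattern followed by one large value.
stream = [float(i % 10) for i in range(1000)] + [50.0]
found = list(alarms(stream))
```

Each value is compared against the threshold learned from earlier data before it is added, so an extreme value cannot hide itself by raising the threshold first.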

Page 22: Anomaly Detection - New York Machine Learning

Detecting Anomalies in Sporadic Events

Page 23: Anomaly Detection - New York Machine Learning

Using t-Digest

• Apache Mahout uses t-digest as an on-line percentile estimator
– very high accuracy for extreme tails
– new in Mahout v0.9

• t-digest also available elsewhere
– in streamlib (open source library on GitHub)
– standalone (GitHub and Maven Central)

• What’s the big deal with anomaly detection?

• This looks like a solved problem

Page 24: Anomaly Detection - New York Machine Learning

Already Done? Etsy Skyline?

Page 25: Anomaly Detection - New York Machine Learning

What About This?

Page 26: Anomaly Detection - New York Machine Learning

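Model Delta Anomaly Detection

[Diagram: the model’s output is subtracted from the input to give the delta δ; an online summarizer tracks the 99.9%-ile of δ as threshold t; if δ > t, raise an alarm.]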

Page 27: Anomaly Detection - New York Machine Learning

The Real Inside Scoop

• The model-delta anomaly detector is really just a sum of random variables
– the model we know about already
– and a normally distributed error

• The output (delta) is (roughly) the log probability of the sum distribution (really δ²)
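
The link between δ² and log probability can be written out. As a sketch, assume the observed signal is x = m + ε, where m is the model's prediction and ε is normally distributed with mean 0 and variance σ² (the symbols x, m, ε and σ are introduced here for illustration). Then δ = x − m and:

```latex
p(\delta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{\delta^2}{2\sigma^2}\right)
\qquad\Longrightarrow\qquad
-\log p(\delta) = \frac{\delta^2}{2\sigma^2} + \frac{1}{2}\log\!\left(2\pi\sigma^2\right)
```

The second term is constant, so under this assumption thresholding δ² (or |δ|) is equivalent to thresholding the negative log probability.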

• Thinking about probability distributions is good

• But how do you handle AD in systems with sporadic events?

Page 28: Anomaly Detection - New York Machine Learning

Spot the Anomaly

Anomaly?

Page 29: Anomaly Detection - New York Machine Learning

Maybe not!

Page 30: Anomaly Detection - New York Machine Learning

Where’s Waldo?

This is the real anomaly

Page 31: Anomaly Detection - New York Machine Learning

Normal Isn’t Just Normal

• What we want is a model of what is normal

• What doesn’t fit the model is the anomaly

• For simple signals, the model can be simple …

• The real world is rarely so accommodating

Page 32: Anomaly Detection - New York Machine Learning

We Do Windows

Page 33: Anomaly Detection - New York Machine Learning

We Do Windows

Page 34: Anomaly Detection - New York Machine Learning

We Do Windows

Page 35: Anomaly Detection - New York Machine Learning

We Do Windows

Page 36: Anomaly Detection - New York Machine Learning

We Do Windows

Page 37: Anomaly Detection - New York Machine Learning

We Do Windows

Page 38: Anomaly Detection - New York Machine Learning

We Do Windows

Page 39: Anomaly Detection - New York Machine Learning

We Do Windows

Page 40: Anomaly Detection - New York Machine Learning

We Do Windows

Page 41: Anomaly Detection - New York Machine Learning

We Do Windows

Page 42: Anomaly Detection - New York Machine Learning

We Do Windows

Page 43: Anomaly Detection - New York Machine Learning

We Do Windows

Page 44: Anomaly Detection - New York Machine Learning

We Do Windows

Page 45: Anomaly Detection - New York Machine Learning

We Do Windows

Page 46: Anomaly Detection - New York Machine Learning

We Do Windows

Page 47: Anomaly Detection - New York Machine Learning

Windows on the World

• The set of windowed signals is a nice model of our original signal

• Clustering can find the prototypes
– Fancier techniques available using sparse coding

• The result is a dictionary of shapes

• New signals can be encoded by shifting, scaling and adding shapes from the dictionary

Page 48: Anomaly Detection - New York Machine Learning

Most Common Shapes (for EKG)

Page 49: Anomaly Detection - New York Machine Learning

Reconstructed signal

[Figure: original signal, reconstructed signal, and reconstruction error (< 1 bit / sample)]

Page 50: Anomaly Detection - New York Machine Learning

An Anomaly

Original technique for finding 1-d anomaly works against reconstruction error

Page 51: Anomaly Detection - New York Machine Learning

Close-up of anomaly

Not what you want your heart to do.

And not what the model expects it to do.

Page 52: Anomaly Detection - New York Machine Learning

A Different Kind of Anomaly

Page 53: Anomaly Detection - New York Machine Learning

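Model Delta Anomaly Detection

[Diagram: the model’s output is subtracted from the input to give the delta δ; an online summarizer tracks the 99.9%-ile of δ as threshold t; if δ > t, raise an alarm.]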

Page 54: Anomaly Detection - New York Machine Learning

The Real Inside Scoop

• The model-delta anomaly detector is really just a sum of random variables
– the model we know about already
– and a normally distributed error

• The output (delta) is (roughly) the log probability of the sum distribution (really δ²)

• Thinking about probability distributions is good

Page 55: Anomaly Detection - New York Machine Learning

Anomalies among sporadic events

Page 56: Anomaly Detection - New York Machine Learning

Sporadic Web Traffic to an e-Business Site

It’s important to know if traffic is stopped or delayed because of a problem…

But visits to site normally come at varying intervals.

How long after the last event should you begin to worry?

Page 57: Anomaly Detection - New York Machine Learning

Sporadic Web Traffic to an e-Business Site

It’s important to know if traffic is stopped or delayed because of a problem…

But visits to site normally come at varying intervals.

And how do you let your CEO sleep through the night?

Page 58: Anomaly Detection - New York Machine Learning

Basic idea: Time interval between events is how to convert to something useful you can measure

Page 59: Anomaly Detection - New York Machine Learning

Sporadic Events: Finding Normal and Anomalous Patterns

• Time between events is much more usable than absolute times

• Counts don’t link as directly to probability models

• Time interval is log ρ

• This is a big deal

Page 60: Anomaly Detection - New York Machine Learning

Event Stream (timing)

• Events of various types arrive at irregular intervals
– we can assume Poisson distribution

• The key question is whether frequency has changed relative to expected values
– This shows up as a change in interval

• Want alert as soon as possible

Page 61: Anomaly Detection - New York Machine Learning

Converting Event Times to Anomaly

99.9%-ile

99.99%-ile

Page 62: Anomaly Detection - New York Machine Learning

But in the real world, event rates often change

Page 63: Anomaly Detection - New York Machine Learning

Time Intervals Are Key to Modeling Sporadic Events

Page 64: Anomaly Detection - New York Machine Learning

Model-Scaled Intervals Solve the Problem
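
One reading of model-scaled intervals is to multiply each gap between events by the rate the model predicted for that time. If the prediction is right and arrivals are Poisson, the scaled gap follows a unit exponential distribution whatever the time of day, so a single threshold works. The sketch below assumes that reading; the rate model and all names are hypothetical.

```python
import math

def scaled_intervals(event_times, predicted_rate):
    """Yield (time, rate * gap) for each gap between consecutive events."""
    for previous, current in zip(event_times, event_times[1:]):
        yield current, predicted_rate(previous) * (current - previous)

def alarms(event_times, predicted_rate, q=0.999):
    """Times at which the scaled gap exceeds the q-quantile of Exponential(1)."""
    threshold = -math.log(1.0 - q)   # about 6.9 for q = 0.999
    return [t for t, s in scaled_intervals(event_times, predicted_rate)
            if s > threshold]

def hourly_rate(t):
    """Hypothetical rate model: busy for the first hour, quiet afterwards."""
    return 1.0 if t < 3600 else 0.01   # events per second

busy = [0.0, 1.0, 3.0, 33.0]        # a 30 s gap while 1 event/s is expected
quiet = [4000.0, 4100.0, 4400.0]    # a 300 s gap while 0.01 events/s is expected

busy_alarms = alarms(busy, hourly_rate)
quiet_alarms = alarms(quiet, hourly_rate)
```

The 30-second gap during the busy hour raises an alarm while the 300-second gap during the quiet period does not. This sketch only scores a gap once the next event arrives; a live monitor would also need to check the time elapsed since the last event.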

Page 65: Anomaly Detection - New York Machine Learning

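Model Delta Anomaly Detection

[Diagram: the model’s output is subtracted from the input to give the delta δ, expressed as log p; an online summarizer tracks the 99.9%-ile of δ as threshold t; if δ > t, raise an alarm.]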

Page 66: Anomaly Detection - New York Machine Learning

Detecting Anomalies in Sporadic Events

Page 67: Anomaly Detection - New York Machine Learning

Detecting Anomalies in Sporadic Events

Page 68: Anomaly Detection - New York Machine Learning

Slipped Week: Simple Rate Predictor

Page 69: Anomaly Detection - New York Machine Learning

Poisson Distribution

• Time between events is exponentially distributed

• This means that long delays are exponentially rare

• If we know λ we can select a good threshold
– or we can pick a threshold empirically
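
When λ is known, the threshold follows directly from the exponential distribution: the probability that a gap exceeds t is exp(-λt). A small sketch, with hypothetical function names:

```python
import math

def interval_threshold(rate, quantile=0.999):
    """Gap length exceeded with probability 1 - quantile for Exponential(rate) gaps."""
    return -math.log(1.0 - quantile) / rate

def surprise(gap, rate):
    """Negative log density of an Exponential(rate) gap: rate * gap - log(rate)."""
    return rate * gap - math.log(rate)

# With 2 events per second on average, a gap longer than about 3.45 s
# should occur only once in a thousand intervals.
t = interval_threshold(rate=2.0)
```

The surprise grows linearly with the gap, which is why long delays are exponentially rare and why the interval itself serves as a log-probability score.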

Page 70: Anomaly Detection - New York Machine Learning

Seasonality Poses a Challenge

Page 71: Anomaly Detection - New York Machine Learning

Something more is needed …

Page 72: Anomaly Detection - New York Machine Learning

We need a better rate predictor…

Page 73: Anomaly Detection - New York Machine Learning

A New Rate Predictor for Sporadic Events

Page 74: Anomaly Detection - New York Machine Learning

Improved Prediction with Adaptive Modeling

Page 75: Anomaly Detection - New York Machine Learning

Anomaly Detection + Classification Useful Pair

• Use the AD model to detect anomalies in new data
– Methods such as clustering for discovery can be helpful

• Once you have well-defined models in your system, you may also want to use classification to tag those

• Continue to use the AD model to find new anomalies

Page 76: Anomaly Detection - New York Machine Learning

Recap (out of order)

• Anomaly detection is best done with a probability model

• -log p is a good way to convert to anomaly measure

• Adaptive quantile estimation (t-digest) works for auto-setting thresholds

Page 77: Anomaly Detection - New York Machine Learning

Recap

• Different systems require different models

• Continuous time-series
– sparse coding to build signal model

• Events in time
– rate model based on variable rate Poisson
– segregated rate model

• Events with labels
– language modeling
– hidden Markov models

Page 78: Anomaly Detection - New York Machine Learning

Why Use Anomaly Detection?

Page 79: Anomaly Detection - New York Machine Learning

Keep in mind…

• Model normal, then find anomalies

• t-digest for adaptive threshold

• Probabilistic models for complex patterns

Page 80: Anomaly Detection - New York Machine Learning

Keep in mind…

• Time intervals are key for sporadic events

• Complex time shift to predict rate with seasonality

• Sequence of events reveals phishing attack

Page 81: Anomaly Detection - New York Machine Learning

e-book available courtesy of MapR

http://bit.ly/1jQ9QuL

A New Look at Anomaly Detection by Ted Dunning and Ellen Friedman © June 2014 (published by O’Reilly)

Page 82: Anomaly Detection - New York Machine Learning

Coming in October: Time Series Databases by Ted Dunning and Ellen Friedman © Oct 2014 (published by O’Reilly)

Page 83: Anomaly Detection - New York Machine Learning

Thank you for coming today!

Page 84: Anomaly Detection - New York Machine Learning

Page 85: Anomaly Detection - New York Machine Learning

Sandbox