Anomaly Detection in Deep Learning Adam Gibson - Skymind

Upload: adam-gibson

Post on 15-Apr-2017


Category: Data & Analytics



TRANSCRIPT

Page 1: Anomaly detection in deep learning (Updated) English

Anomaly Detection in Deep Learning

Adam Gibson - Skymind

Page 2: Anomaly detection in deep learning (Updated) English

Deep Learning book

Page 3: Anomaly detection in deep learning (Updated) English

Dl4j

Page 4: Anomaly detection in deep learning (Updated) English

Skymind

We take deep learning models to production on premise

Using Scala (think Python for production)

Java Virtual Machine stack connected to C++ (e.g., first-class access to big data systems) with native compute

We make SKIL (Skymind Intelligence Layer): a production system for building deep learning applications

Page 5: Anomaly detection in deep learning (Updated) English

What’s an “Anomaly”?

Abnormal patterns in data

Fraud detection: “bad” credit card transactions

Also fraud detection: detecting fake locations with call detail records

Network intrusion: abnormal activity in a network

Broken computers in a data center

Page 6: Anomaly detection in deep learning (Updated) English

Brief Case Studies (e.g., why am I up here?)

Telco: http://blogs.wsj.com/cio/2016/03/14/orange-tests-deep-learning-software-to-identify-fraud/

Network Infrastructure: https://insights.ubuntu.com/2016/04/25/making-deep-learning-accessible-on-openstack/

Page 7: Anomaly detection in deep learning (Updated) English

Network Infra - Save time and money by automatically migrating workloads before they break

Page 8: Anomaly detection in deep learning (Updated) English

Why Deep Learning?

Learns well from lots of data

Own feature representation: robust to noise and allows for learning cross-domain patterns

Already applied in ads: Google itself invests heavily in this same kind of pattern recognition (targeting/relevance)

Page 9: Anomaly detection in deep learning (Updated) English

Techniques

Unsupervised - Use autoencoder reconstruction error and moving averages with dropout over a set time window

Supervised - RNNs learn from a set of yes/no labels in a time series. RNNs can learn from a series of time steps and predict when an anomaly is about to occur.

Use streaming/minibatches (all neural nets can learn like this)
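
The streaming/minibatch point can be illustrated with a minimal Python sketch. It is not code from the talk: `minibatches` and `RunningMeanModel` are hypothetical names, and the running mean only stands in for a neural network to show the shape of incremental updates over a stream.

```python
from itertools import islice


def minibatches(stream, batch_size):
    """Yield lists of up to batch_size items from any iterable, including an unbounded one."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


class RunningMeanModel:
    """Hypothetical stand-in for a network: it is updated one minibatch at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def fit_batch(self, batch):
        # Incremental update: no need to keep earlier batches in memory.
        for value in batch:
            self.count += 1
            self.mean += (value - self.mean) / self.count


model = RunningMeanModel()
for batch in minibatches([1.0, 2.0, 3.0, 4.0, 5.0], batch_size=2):
    model.fit_batch(batch)
```

A real network would replace `fit_batch` with one gradient step per minibatch; the loop around it would look the same.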

Page 10: Anomaly detection in deep learning (Updated) English

AutoEncoder Anomaly Detection

Moving average anomaly with KL divergence

Autoencoder learns to reconstruct data (e.g., the input also serves as the label)
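
One way to combine reconstruction error with a moving average is sketched below in Python. This is an assumption-laden illustration, not the talk's implementation: the autoencoder itself is not shown, mean squared error is used instead of KL divergence, and the threshold rule (moving average plus `k` standard deviations), the `window` size, and the function names are all hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


def reconstruction_error(x, x_hat):
    """Mean squared error between an input vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def flag_anomalies(errors, window=5, k=3.0):
    """Flag errors far above the moving average of recent normal errors.

    Assumed rule: an error is anomalous when it exceeds the mean of the
    last `window` non-anomalous errors by more than k standard deviations.
    """
    history = deque(maxlen=window)
    flags = []
    for err in errors:
        if len(history) == window:
            mu = mean(history)
            sigma = max(pstdev(history), 1e-9)  # avoid a zero-width band
            is_anomaly = err > mu + k * sigma
        else:
            is_anomaly = False  # not enough history yet
        flags.append(is_anomaly)
        if not is_anomaly:
            history.append(err)  # keep anomalies out of the baseline
    return flags


errors = [0.10, 0.11, 0.09, 0.10, 0.12, 0.95, 0.10]
flags = flag_anomalies(errors)
```

Keeping flagged errors out of `history` is a design choice: it stops a single spike from inflating the baseline and masking the next one.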

Page 11: Anomaly detection in deep learning (Updated) English

Recurrent Net Anomalies

Learn a softmax over time series: given a fixed window, the goal is to predict a probability of an anomaly occurring given a sequence
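
The windowing and softmax steps can be sketched in Python as follows. The recurrent network is omitted; `sliding_windows` and the choice to label each window with the step that follows it are assumptions made for illustration, and the logits passed to `softmax` are made-up numbers standing in for a network's output.

```python
import math


def sliding_windows(series, labels, window):
    """Pair each fixed-length window with the label of the step right after it."""
    pairs = []
    for start in range(len(series) - window):
        end = start + window
        pairs.append((series[start:end], labels[end]))
    return pairs


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - largest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


pairs = sliding_windows([1, 2, 3, 4, 5], [0, 0, 0, 1, 0], window=3)

# Hypothetical network output for one window: [score_normal, score_anomaly].
probs = softmax([2.0, 0.0])
```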

Page 12: Anomaly detection in deep learning (Updated) English

Sequences: Time Series/Windows with RNNs

See: http://karpathy.github.io/2015/05/21/rnn-effectiveness/

Page 13: Anomaly detection in deep learning (Updated) English

Some definitions

Reconstruction Error: Autoencoders can learn from unsupervised pretraining and learn how to reconstruct data. Minimize KL divergence (a measure of how much one probability distribution differs from another)

RNN/Time Series: See http://deeplearning4j.org/usingrnns
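
For reference, the KL divergence between two discrete distributions can be computed as in this small Python sketch. The function name and the `eps` guard are assumptions; how the talk's models apply the divergence during training is not specified in the slides.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions.

    Terms with p_i == 0 contribute nothing; q_i is floored at eps to
    avoid division by zero.
    """
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


same = kl_divergence([0.5, 0.5], [0.5, 0.5])
forward = kl_divergence([0.5, 0.5], [0.9, 0.1])
backward = kl_divergence([0.9, 0.1], [0.5, 0.5])
```

The divergence is zero for identical distributions and is not symmetric, so `forward` and `backward` differ.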

Page 14: Anomaly detection in deep learning (Updated) English

Production

Kafka/Spark Streaming/Flink/Apex

Neural networks as consumers of streaming updates

Data? Mostly log ingestion, could be video

Page 15: Anomaly detection in deep learning (Updated) English

Demo!

Kibana

Kafka

Elasticsearch

Logstash

NiFi

Cassandra

Lagom

Dl4j Ecosystem (DataVec, Nd4j, Dl4j, Arbiter)

Page 16: Anomaly detection in deep learning (Updated) English

Reference Architecture for Anomaly Detection

External world

Ingest from external sources with NiFi

Send to Kafka

Make a prediction about the data

Index the prediction in Elasticsearch with Logstash

Render the data with Kibana

Store raw events in Cassandra
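
The data flow above can be mimicked in plain Python to make the stage order concrete. Nothing here talks to the real systems: the queue, the two lists, `run_pipeline`, the `score` callback, and the 0.5 threshold are hypothetical in-memory stand-ins for Kafka, Cassandra, Elasticsearch, and the model.

```python
from queue import Queue


def run_pipeline(events, score, threshold=0.5):
    """Push events through in-memory stand-ins for each pipeline stage."""
    topic = Queue()   # stands in for a Kafka topic
    raw_store = []    # stands in for Cassandra (raw events)
    index = []        # stands in for an Elasticsearch index (predictions)

    # Ingestion (the NiFi role): keep the raw event and publish it.
    for event in events:
        raw_store.append(event)
        topic.put(event)

    # Consumption: score each event and index the prediction.
    while not topic.empty():
        event = topic.get()
        value = score(event)
        index.append({"event": event, "score": value, "anomaly": value > threshold})

    return raw_store, index


events = [{"value": 1.0}, {"value": 9.0}]
raw_store, index = run_pipeline(events, score=lambda e: e["value"] / 10)
```

In a deployment each stand-in would be a separate service, and a dashboard such as Kibana would read from the prediction index.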

Page 17: Anomaly detection in deep learning (Updated) English

Summary

Real ML pipeline

Cassandra for storing raw data results

ELK (Elasticsearch, Logstash, Kibana) stack for alerting and visualization

Kafka for model ingestion

Lagom for serving model predictions

NiFi for designing data pipelines

Page 18: Anomaly detection in deep learning (Updated) English

Questions?

Email: [email protected]

Twitter: agibsonccc

Github: agibsonccc