Machine Learning - slides.yowconference.com


Copyright Cognomotiv 2016

Machine Learning

No: It Can’t Do That!

Hadi Nahari

[email protected]

hadinahari

“Friends, Romans, countrymen, lend me your ears;

I come to bury Caesar, not to praise him.

The evil that men do lives after them…”

Julius Caesar

Act 3, Scene II

Setup

• ML + NetSec

National Academy of Engineering: Grand Challenges for the 21st Century

“The best minds of my generation are thinking about how to make people click ads.”

Jeff Hammerbacher

Agenda

• Motivations

• Machine Learning 101

• ML & Network Security

• What Works, What Doesn’t

• Conclusion

MOTIVATIONS

ML Is NOT New

• This is the 5th round…

ML is HOT!!

• VCs fund ML-companies like crazy

• Amazing new fields have opened

– Autonomous driving, behavior analytics, etc.

• A ton of existing fields have been revived

– Search, personalization/customization, audio processing, image processing, etc.

• Mainly because…

Code Complexity

• Space Shuttle: ~400K LOC

• F-22 Raptor fighter: ~2M LOC

• Linux kernel 2.2: ~2.5M LOC

• Hubble telescope: ~3M LOC

• Android core: ~12M LOC

• Future Combat Systems: ~63M LOC

• Connected car: ~100M LOC

• Autonomous vehicle: ~300M LOC

• Large Hadron Collider: 60M LOC

50 M LOC

Use Case Complexity

Service provider

On average, each user has only five passwords for 40 online accounts

Where to store the tokens?

Data Procreation

• >2 billion GB of new data is created every day

– 2.3283006436538696 B GB to be exact

• Sparse data: mainly 0s

• In ’93, the information on the internet surpassed all information that humanity had created before it
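The “to be exact” figure reads like a unit conversion. As a rough sketch only: assuming the starting point is the widely quoted estimate of 2.5 × 10^18 bytes of new data per day (an assumption, the slide does not name its source) and that one GB here means 2^30 bytes, the arithmetic would be:

```python
# Assumption: 2.5e18 bytes/day is a widely quoted estimate; the slide does not cite it.
BYTES_PER_DAY = 25 * 10**17
# Assumption: "GB" is taken as 2**30 bytes (a gibibyte).
BYTES_PER_GB = 2**30


def gb_per_day(bytes_per_day: int = BYTES_PER_DAY) -> float:
    """Convert a daily byte count into (binary) gigabytes."""
    return bytes_per_day / BYTES_PER_GB


# Expressed in billions of GB per day.
billions_of_gb = gb_per_day() / 1e9
```

Under these assumptions the result is a little over 2.3 billion GB per day, in line with the slide's ">2 billion GB".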

Stack Proliferation

HW Architecture(s)

Applications

Algorithms

ML 101

Machine Learning (ML)

• Study of pattern recognition & computational learning theory in Artificial Intelligence (AI)

• Algorithms that learn from, and make predictions on, data

• As opposed to following strictly static program instructions

ML Models

• Supervised learning

• Unsupervised learning

• (Semi-supervised learning)

• Reinforcement learning

Supervised Learning

• {(labeled) Input} [map] {Expected Output}

• Find [map]
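As a minimal sketch of “find [map]”, the example below fits a straight line to labeled (input, expected output) pairs by least squares. The choice of a linear model and the toy data are assumptions made for illustration; the slides do not prescribe a particular algorithm.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b from labeled examples."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return a, b


def predict(model, x):
    """Apply the learned map to a new, unlabeled input."""
    a, b = model
    return a * x + b


# Toy labeled data (hypothetical), generated by y = 2x + 1.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]
model = fit_line(train_x, train_y)
```

Calling `predict(model, 10.0)` then applies the learned map to an input that was never in the training set.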

Supervised Learning Model

Unsupervised Learning

• {(unlabeled) Input} [map] {Output}

• Find structure (patterns) in {Input}
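One hedged illustration of finding structure without labels is clustering. The sketch below runs a two-cluster k-means on one-dimensional data; the algorithm choice, the min/max initialization, and the sample values are all assumptions for illustration.

```python
def kmeans_1d(values, iterations=10):
    """Two-cluster k-means on 1-D data; centers start at the min and max."""
    centers = [min(values), max(values)]
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        # Assignment step: each value joins its nearest center.
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[nearest].append(v)
        # Update step: each center moves to the mean of its cluster.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Unlabeled toy data (hypothetical): two obvious groups.
data = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centers, clusters = kmeans_1d(data)
```

No labels are supplied; the two groups emerge from the distances between the values alone.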

Unsupervised Learning Model

Reinforcement Learning

• No correct {Input}/{Output}

• Action, environment, reward
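The action/environment/reward loop can be sketched with an epsilon-greedy agent on a two-armed bandit. This is only one simple instance of reinforcement learning; the payout probabilities, step count, and exploration rate are hypothetical values chosen for illustration.

```python
import random


def epsilon_greedy(payout_probs, steps=2000, epsilon=0.1, seed=0):
    """Learn per-arm reward estimates purely from trial and reward."""
    rng = random.Random(seed)
    counts = [0] * len(payout_probs)    # how often each action was taken
    values = [0.0] * len(payout_probs)  # running mean reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick a random action.
            arm = rng.randrange(len(payout_probs))
        else:
            # Exploit: pick the action with the best estimate so far.
            arm = max(range(len(values)), key=values.__getitem__)
        # The environment answers with a reward, never with the "correct" action.
        reward = 1.0 if rng.random() < payout_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values


counts, values = epsilon_greedy([0.2, 0.8])
```

The agent is never told which arm is better; it only sees rewards and adjusts its estimates accordingly.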

Reinforcement Learning Model

Main ML Approaches

• Decision Tree Learning, Association Rule Learning

• Inductive Logic Programming, Support Vector Machines, Clustering, Bayesian Networks

• Representation Learning, Genetic Algorithms

• Similarity and Metric Learning, Sparse Dictionary Learning

• Artificial Neural Networks (ANN), Deep Learning (DL)

Neural Network

• Interpret an Artificial Intelligence (AI) task as the evaluation of complex functions

– Facial Recognition: Map a bunch of pixels to a name

– Handwriting Recognition: Image to a character

• NN: Network of interconnected simple neurons

The Neuron

Feed-forward system, made up of two stages:

• Linear transformation of data

• Point-wise application of a non-linear function

Inputs $X_1, X_2, X_3$ with weights $W_1, W_2, W_3$:

$y = F\left(\sum_i W_i X_i\right)$

$F(x) = \max(0, x)$

(this F is the Rectified Linear Unit (ReLU); sigmoid, etc. are also used)
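The two stages can be written out directly. The sketch below is a plain-Python rendering of the formula on this slide; the function names and the sample weights are illustrative assumptions.

```python
import math


def relu(x):
    """Rectified Linear Unit: F(x) = max(0, x)."""
    return max(0.0, x)


def sigmoid(x):
    """An alternative non-linearity, squashing x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, activation=relu):
    """Stage 1: weighted sum of the inputs. Stage 2: point-wise non-linearity."""
    z = sum(w * x for w, x in zip(weights, inputs))
    return activation(z)


# Three inputs and three weights, as on the slide (values are hypothetical).
y = neuron([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```

Swapping `activation=sigmoid` changes only the second stage; the linear stage stays the same.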

Artificial Neural Network (ANN)

• Layers and layers of neurons, with many connections

Input:

Output:
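To see why stacking layers matters, here is a small sketch of a two-layer network with hand-picked weights that computes XOR, something a single neuron of the kind shown earlier cannot do. The weights are set by hand rather than learned, and the bias terms are an addition not shown on the slides; both are assumptions for illustration.

```python
def relu(x):
    return max(0.0, x)


def layer(inputs, weights, biases):
    """One layer: every neuron takes a weighted sum of all inputs, then ReLU."""
    return [
        relu(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def xor_net(x1, x2):
    """Hidden layer of two ReLU neurons, followed by a linear output."""
    hidden = layer([x1, x2], [[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0])
    return hidden[0] - 2.0 * hidden[1]


outputs = [xor_net(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

Real networks learn such weights from data instead of having them written in by hand.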

Deep Learning (DL)

• Branch of ML based on a set of algorithms that:

– Attempt to model high-level data abstractions

– Are based on learning representations of data

– Use complex architectures with multiple non-linear transformations

• Some representations make it easier to learn tasks from examples (e.g. AlphaGo)

DNN: Learning Feature Representation

Input Result

DNN: Feature Engineering

Anything humans can do in 0.1 sec, the right, big 10-layer network can do too

• Images/video: Image → Vision features → Detection

• Audio: Audio → Audio features → Speaker ID

• Text: Text → Text features → Text classification, machine translation, information retrieval, ...

ML/DL Improve With Scale

[Figure: performance versus data & compute. Curves: ML / DL; many previous methods. Time axis: past, present, future.]

ML & NETSEC

Intrusion & Intrusion Detection

“Intrusion is an attempt to compromise CIA (Confidentiality, Integrity, Availability), or to bypass the security mechanisms of a computer or network”

“Intrusion detection is the process of monitoring the events occurring in a computer system or network, and analyzing them for signs of an intrusion”

3 Main Detection Methodologies

• Signature-based Detection (SD)

– Signature: pattern corresponding to a known attack or threat

– SD: process of comparing patterns against captured events

– A.k.a. “Knowledge-based Detection”

• Anomaly-based Detection (AD)

– Anomaly: a deviation from “normal” behavior

– Profile of normal is derived from monitoring network traffic

– AD compares the normal profile with observed events

• Stateful Protocol Analysis (SPA)

– Vendor-developed generic profiles for specific protocols
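The difference between the first two methodologies can be sketched in a few lines. The signature strings, the choice of "bytes per flow" as the profiled quantity, and the three-sigma cutoff below are all hypothetical, picked only to make the contrast concrete.

```python
from statistics import mean, stdev

# Hypothetical signatures: known-bad patterns, keyed by attack name.
SIGNATURES = {
    "sql-injection": "' OR 1=1",
    "path-traversal": "../../",
}


def signature_detect(payload):
    """SD: compare captured content against patterns of known attacks."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]


def build_profile(normal_values):
    """AD, step 1: derive a profile of 'normal' from monitored traffic."""
    return mean(normal_values), stdev(normal_values)


def anomaly_detect(profile, observed, k=3.0):
    """AD, step 2: flag observations that deviate too far from the profile."""
    mu, sigma = profile
    return abs(observed - mu) > k * sigma


# Hypothetical bytes-per-flow measurements taken during normal operation.
profile = build_profile([100, 110, 90, 105, 95])
```

Signature matching can only find what is already in `SIGNATURES`, while the anomaly check can flag something never seen before, at the price of depending entirely on how good the profile is.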

Cybersecurity System

• Attacks evolve, ergo building defense systems is nontrivial

• Thus, higher-level & adaptive methodologies are required

Adaptive Cybersecurity

• Data-capturing tools (libpcap, WinPcap, etc.) capture events from the audit trails of information sources (e.g. network)

• Data-preprocessing module filters out the attacks for which good signatures have already been learned

• A feature extractor derives basic features (sequence of syscalls, start time, NetFlow duration, src/dest IP/port, protocol, byte and packet counts)

• Analysis engine implements detection methods for infrastructure anomalies, which may or may not have appeared before
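The feature-extractor stage might look roughly like the sketch below. `FlowRecord` and its field names are hypothetical stand-ins for whatever the capture layer produces; no real libpcap or NetFlow API is used here.

```python
from dataclasses import dataclass


@dataclass
class FlowRecord:
    """Hypothetical flow record; field names are illustrative only."""
    start_time: float  # seconds since epoch
    end_time: float
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str      # e.g. "tcp", "udp"
    byte_count: int
    packet_count: int


def extract_features(flow: FlowRecord) -> dict:
    """Derive basic per-flow features for the analysis engine."""
    duration = max(flow.end_time - flow.start_time, 0.0)
    per_packet = flow.byte_count / flow.packet_count if flow.packet_count else 0.0
    return {
        "duration": duration,
        "protocol": flow.protocol,
        "dst_port": flow.dst_port,
        "bytes": flow.byte_count,
        "packets": flow.packet_count,
        "bytes_per_packet": per_packet,
    }


example = FlowRecord(1000.0, 1002.5, "10.0.0.5", "10.0.0.9", 51514, 443, "tcp", 12000, 10)
features = extract_features(example)
```

Even this toy record yields six features per flow, and real extractors produce far more.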

WHAT WORKS, WHAT DOESN’T

Curse of Dimensionality

• Data volume is massive

– min. ~100M events per day

• Much of the data is streaming data

– Requires inline, real-time analysis

• Feature space is high dimensional
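One well-known symptom of a high-dimensional feature space is that distances stop discriminating: the nearest and farthest points from a query end up almost equally far away. The sketch below measures this on uniformly random points, which is an assumption for illustration and not a model of real network data.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from a random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


low_dim = relative_contrast(2)
high_dim = relative_contrast(1000)
```

A contrast near zero in the high-dimensional case means "near" and "far" are hardly distinguishable, which undermines any detector built on distance to normal behavior.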

$/Detection Performance Abysmal

• Looking for “every anomaly” is cost prohibitive

– if at all [practically] possible

• Narrowing down the criteria too much

– results in false negatives

• Reference data is hard to obtain due to privacy concerns

– Simulated data is useless

• ML was supposed to be better than the signature era
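A back-of-the-envelope calculation shows why the cost per detection gets so bad. Only the ~100M events per day comes from the earlier slide; the attack fraction, true-positive rate, and false-positive rate below are hypothetical numbers chosen for illustration.

```python
def daily_alerts(events_per_day, attack_fraction, true_positive_rate, false_positive_rate):
    """Return (true alerts, false alerts, precision) for one day of traffic."""
    attacks = events_per_day * attack_fraction
    benign = events_per_day - attacks
    true_alerts = attacks * true_positive_rate
    false_alerts = benign * false_positive_rate
    precision = true_alerts / (true_alerts + false_alerts)
    return true_alerts, false_alerts, precision


# 100M events/day is from the slides; the three rates are assumptions.
true_alerts, false_alerts, precision = daily_alerts(
    events_per_day=100_000_000,
    attack_fraction=1e-6,
    true_positive_rate=0.99,
    false_positive_rate=0.001,
)
```

With these assumed rates, roughly a hundred real detections are buried under about a hundred thousand false alarms per day, so well under 1% of alerts are worth an analyst's time.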

Husky Recognition

• We built an effective snow recognition model…

Learned Features

Models: Simple Correlations

• Simple models are also (usually) wrong

Network Anomalies

• Malicious data packets come in few varieties (low type count) but occur at high frequency

– Current models are not good at detecting this type of anomaly

• Anomaly/outlier varies among application domains

• Labeled anomalies are not available for training/validation

Baselining

• Using ML to detect anomalies is easy when the baseline is well-defined and follows a simple mathematical model (e.g. Normal Distribution)

• Most real-world systems don’t render a simple baseline (i.e. their behavior is very complex)

• [!] Sanctity of baseline: “nearly 100% of networks are compromised”
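The “sanctity of baseline” point can be made concrete with a toy threshold detector. The sketch assumes a mean-plus-three-sigma rule and invented request rates; it is only meant to show what happens when the attacker is already present while the baseline is recorded.

```python
from statistics import mean, stdev


def alert_threshold(baseline, k=3.0):
    """Anything above mean + k * stdev of the baseline is flagged."""
    return mean(baseline) + k * stdev(baseline)


# Hypothetical requests-per-minute samples from a clean network.
clean_baseline = [100, 102, 98, 101, 99, 100, 103, 97]

# Same network, but the attacker was already active during baselining.
compromised_baseline = clean_baseline + [400, 410, 390]

attack_rate = 400
flagged_with_clean = attack_rate > alert_threshold(clean_baseline)
flagged_with_compromised = attack_rate > alert_threshold(compromised_baseline)
```

With the clean baseline the attack stands out clearly; with the compromised one, the same traffic has become part of "normal" and goes unflagged.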

Time Shifting

• “Window problem”: algorithms must be limited to ingesting data in chunks that can be processed

– What if the anomaly is seeded outside that window?

• Network traffic diversity: usage varies in every session and with new applications

– window should also be shifted for recurring training

• Serious impact on performance, real-time, and security
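The window problem can be sketched with a simple sliding-window counter. The window length, the threshold, and the event timestamps are hypothetical; the point is only that the same number of events is caught or missed depending on how it is spread over time.

```python
from collections import deque


def windowed_alerts(timestamps, window, threshold):
    """Alert whenever `threshold` or more events fall within the trailing window."""
    recent = deque()
    alerts = []
    for t in timestamps:
        recent.append(t)
        # Forget everything that has slid out of the window.
        while recent and recent[0] <= t - window:
            recent.popleft()
        if len(recent) >= threshold:
            alerts.append(t)
    return alerts


# Five probes within five seconds versus five probes spaced 100 seconds apart.
burst = [0, 1, 2, 3, 4]
slow = [0, 100, 200, 300, 400]

burst_alerts = windowed_alerts(burst, window=60, threshold=5)
slow_alerts = windowed_alerts(slow, window=60, threshold=5)
```

The patient attacker stays under the threshold because each earlier probe has been forgotten by the time the next one arrives. Enlarging the window catches it, but costs memory and latency.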

There’s More…

• How do you trust what the model predicts?

– i.e. how do we know the model works correctly (husky)?

• Designing sound evaluation schemes can be more difficult than the detector itself

• We really don’t know how ML works

• … or how to reason about ML models

• … or how to debug them

• For now it’s just magic & voodoo
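One modest way to start reasoning about a model is to perturb its inputs and watch the output move. The sketch below is not the method from “Why Should I Trust You?”; it is a much simpler, occlusion-style check, and the toy model that keys on snow is a hypothetical stand-in for the husky example.

```python
def wolf_score(features):
    """Hypothetical 'trained' model that has latched onto snow in the background."""
    return 0.9 * features["snow"] + 0.1 * features["pointy_ears"]


def feature_influence(model, features, baseline=0.0):
    """Blank out one feature at a time and record how much the score drops."""
    base_score = model(features)
    influence = {}
    for name in features:
        perturbed = dict(features)
        perturbed[name] = baseline
        influence[name] = base_score - model(perturbed)
    return influence


# Hypothetical feature values for a photo of a husky standing in snow.
husky_in_snow = {"snow": 1.0, "pointy_ears": 1.0, "fur_color": 0.3}
influence = feature_influence(wolf_score, husky_in_snow)
most_influential = max(influence, key=influence.get)
```

If the most influential feature is the background rather than the animal, the model is a snow detector, whatever its accuracy on the test set says.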

CONCLUSION

Summary

• ML is a great and necessary technology

• ML really shines for some classes of problems

• ML is NOT the best solution for every problem (e.g. NetSec)

• Obtaining (and training with) useful data remains a challenge

• ML is just one initial building block of Machine Cognition and Artificial Understanding: there are many more

• Still a long way before machines can replicate humans!

THANK YOU!

Hadi Nahari

[email protected]

hadinahari

Backup

References

• Prof. Karl Friston, seminal works (http://www.fil.ion.ucl.ac.uk/~karl/#_Free-energy_principle)

• “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier”, Carlos Guestrin, et al. (https://arxiv.org/pdf/1602.04938.pdf)

• “Using Machine Learning in Network Intrusion Detection Systems”, Omar Shaya (http://www.slideshare.net/OmarShaya/machine-learning-in-networks-intrusion-detection?next_slideshow=1)

• “Machine Learning Is Not The Answer To Better Network Security”, Matt Harrigan (https://techcrunch.com/2016/02/29/machine-learning-is-not-the-answer-to-better-network-security/)

• “Machine Learning Algorithm Cheat Sheet”, Laura Diane Hamilton (http://www.lauradhamilton.com/machine-learning-algorithm-cheat-sheet)

• “Anomaly Detection Approaches for Communicating Networks” (http://users.ece.gatech.edu/~jic/anomaly-book-chap-09.pdf)

• “A Survey on Machine Learning Techniques for Intrusion Detection Systems”, J. Singh, N.J. Nene (http://ijarcce.com/upload/2013/november/35-o-jayveer_singh-A_Survey_on_Machine.pdf)

• “Machine Learning Techniques for Anomaly Detection: An Overview”, S. Omar, et al. (http://research.ijcaonline.org/volume79/number2/pxc3891478.pdf)

• “Recent Advances in Predictive (Machine) Learning”, J.H. Friedman, et al. (http://statweb.stanford.edu/~jhf/ftp/machine)

• “Outside the Closed World: On Using Machine Learning For Network Intrusion Detection”, R. Sommer, V. Paxson (http://www.utdallas.edu/~muratk/courses/dmsec_files/oakland10-ml.pdf)

• http://xkcd.com

Are Humans Getting Smarter?

• IQ scores are rising

• Underlying biological “HW” declining

• “Intelligence” is in decline