machine learning - slides.yowconference.com · dnn: feature engineering anything humans can do in...

Copyright Cognomotiv 2016

Machine Learning

No: It Can’t Do That!

Hadi Nahari

[email protected]

hadinahari



“Friends, Romans, countrymen, lend me your ears;

I come to bury Caesar, not to praise him.

The evil that men do lives after them…”

Julius Caesar

Act 3, Scene II


Setup

• ML + NetSec


National Academy of Engineering: Grand Challenges for the 21st Century

“The best minds of my generation are thinking about how to make people click ads.” - Jeff Hammerbacher


Agenda

• Motivations

• Machine Learning 101

• ML & Network Security

• What Works, What Doesn’t

• Conclusion



MOTIVATIONS


ML Is NOT New

• This is the 5th round…


ML is HOT!!

• VCs fund ML-companies like crazy

• Amazing new fields have opened

– Autonomous driving, behavior analytics, etc.

• A ton of existing fields have been revived

– Search, personalization/customization, audio processing, image processing, etc., etc.


• Mainly because…


Code Complexity

• Space Shuttle: ~400K LOC

• F22 Raptor fighter: ~2M LOC

• Linux kernel 2.2: ~2.5M LOC

• Hubble telescope: ~3M LOC

• Android core: ~12M LOC

• Future Combat Sys.: ~63M LOC

• Connected car: ~100M LOC

• Autonomous vehicle: ~300M LOC


Large Hadron Collider: 60 M LOC

50 M LOC


Use-Case Complexity

service provider

on avg. only five passwords per 40 online accounts per user

where to store the tokens???


Data Procreation

• >2 billion GB of new data is created every day

– 2.3283006436538696 B GB to be exact

• Sparse data: mainly 0s

• In ‘93 the information on the internet surpassed all information that humanity had created before it


Stack Proliferation

HW Architecture(s)

Applications


Algorithms



ML 101


Machine Learning (ML)

• Study of pattern recognition & computational learning theory in Artificial Intelligence (AI)

• Algorithms to learn from, and make predictions on data

• As opposed to following strictly static program instructions


ML Models

• Supervised learning

• Unsupervised learning

• (Semi-supervised learning)

• Reinforcement learning


Supervised Learning

• {(labeled) Input} [map] {Expected Output}

• Find [map]
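A minimal sketch of "find [map]", assuming the map is a straight line y = a*x + b fitted by ordinary least squares. The data points and the names `fit_line` and `predict` are illustrative and not from the talk.

```python
def fit_line(xs, ys):
    """Fit y = a*x + b to labeled (input, expected output) pairs by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def predict(x, a, b):
    """Apply the learned [map] to a new, unseen input."""
    return a * x + b


# Hypothetical labeled data generated from y = 2x + 1.
inputs = [0.0, 1.0, 2.0, 3.0]
expected = [1.0, 3.0, 5.0, 7.0]

a, b = fit_line(inputs, expected)
print(a, b, predict(4.0, a, b))
```

Real supervised models use far richer function families than a line, but the shape of the problem is the same: labeled pairs in, a map out.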



Supervised Learning Model


Unsupervised Learning

• {(unlabeled) Input} [map] {Output}

• Find structure (patterns) in {Input}
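As one possible illustration of finding structure in unlabeled input, here is a tiny one-dimensional k-means sketch. The algorithm choice, the data, and the starting centers are assumptions made for the example; the slide does not name a specific method.

```python
def kmeans_1d(values, centers, iterations=10):
    """Group unlabeled values around centers; no expected outputs are used."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign each value to its nearest current center.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical unlabeled measurements that happen to form two groups.
values = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centers, clusters = kmeans_1d(values, centers=[0.0, 5.0])
print(centers, clusters)
```

The grouping emerges from the data alone, which is the point of the unsupervised setting.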


Unsupervised Learning Model


Reinforcement Learning

• No correct {Input}/{Output}

• Action, environment, reward
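The action/environment/reward loop can be sketched with an epsilon-greedy multi-armed bandit. This is an assumed, deliberately simple setting: the reward probabilities, the function name `run_bandit`, and all parameters are hypothetical.

```python
import random


def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Learn which action pays best purely from rewards, with no labeled examples."""
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)
    values = [0.0] * len(reward_probs)  # running estimate of each action's reward
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(reward_probs))  # explore
        else:
            action = max(range(len(values)), key=values.__getitem__)  # exploit
        # Environment: pays 1 with the action's hidden probability, else 0.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Incremental mean update of the estimate for the chosen action.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts


values, counts = run_bandit([0.2, 0.8])
print(values, counts)
```

The agent is never told which action is correct; it only sees the reward that follows each action.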


Reinforcement Learning Model



Main ML Approaches

• Decision Tree Learning, Association Rule Learning

• Inductive Logic Programming, Support Vector Machines, Clustering, Bayesian Networks

• Representation Learning, Genetic Algorithms

• Similarity and Metric Learning, Sparse Dictionary Learning

• Artificial Neural Networks (ANN), Deep Learning (DL)


Neural Network

• Interpret an Artificial Intelligence (AI) task as the evaluation of complex functions

– Facial Recognition: Map a bunch of pixels to a name

– Handwriting Recognition: Image to a character

• NN: Network of interconnected simple neurons


The Neuron

Feed-forward system, made up of two stages:

• Linear transformation of data

• Point-wise application of a non-linear function

Inputs X1, X2, X3 with weights W1, W2, W3:

$y = F\left(\sum_i W_i X_i\right)$

$F(x) = \max(0, x)$ (the Rectified Linear Unit, ReLU; sigmoid and other functions are also used)
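A minimal sketch of the two stages of a single neuron: a weighted sum followed by a point-wise non-linearity. The input and weight values are made up for illustration.

```python
import math


def relu(x):
    """Rectified Linear Unit: max(0, x)."""
    return max(0.0, x)


def sigmoid(x):
    """Logistic sigmoid, an alternative non-linearity."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, activation=relu):
    # Stage 1: linear transformation (weighted sum of the inputs).
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    # Stage 2: point-wise non-linear function.
    return activation(weighted_sum)


x = [1.0, 2.0, 3.0]  # hypothetical inputs X1..X3
print(neuron(x, [0.5, 0.5, 0.5]))             # positive sum passes through ReLU
print(neuron(x, [0.5, -1.0, 0.25]))           # negative sum is clipped to zero
print(neuron(x, [0.5, -1.0, 0.25], sigmoid))  # same sum through a sigmoid
```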


Artificial Neural Network (ANN)

• Layers and layers of neurons, with many connections

[Figure: network diagram from Input to Output]
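To show why layering matters, here is a sketch of a two-layer network with hand-picked weights that computes XOR, which no single neuron of the weighted-sum kind can represent. The weights and the bias terms are assumptions chosen for the example; a real network would learn them from data.

```python
def relu(x):
    return max(0.0, x)


def identity(x):
    return x


def layer(inputs, weights, biases, activation):
    """One layer: every neuron takes a weighted sum of all inputs, then a non-linearity."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def xor_network(x1, x2):
    # Hidden layer: two ReLU neurons connected to both inputs.
    hidden = layer([x1, x2], [[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0], relu)
    # Output layer: one linear neuron combining the hidden activations.
    return layer(hidden, [[1.0, -2.0]], [0.0], identity)[0]


for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        print(a, b, xor_network(a, b))
```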


Deep Learning (DL)

• Branch of ML based on a set of algorithms that:

– Attempt to model high-level data abstractions

– Are based on learning representations of data

– Use complex architectures with multiple non-linear transformations

• Some representations make it easier to learn tasks from examples (e.g. AlphaGo)


DNN: Learning Feature Representation

[Figure: Input → Result]


DNN: Feature Engineering

Anything humans can do in 0.1 sec, the right, big 10-layer network can do too

• Images/video: Image → Vision features → Detection

• Audio: Audio → Audio features → Speaker ID

• Text: Text → Text features → Text classification, machine translation, information retrieval, ...


ML/DL Improve With Scale

[Figure: Performance vs. Data & Compute; curves labeled “ML / DL” and “Many previous methods”; time markers Past, Present, Future]


ML & NETSEC


Intrusion & Intrusion Detection

“Intrusion is an attempt to compromise CIA (Confidentiality, Integrity, Availability), or to bypass the security mechanisms of a computer or network”

“Intrusion detection is the process of monitoring the events occurring in a computer system or network, and analyzing them for signs of an intrusion”


3 Main Detection Methodologies

• Signature-based Detection (SD)

– Signature: pattern corresponding to a known attack or threat

– SD: process of comparing patterns against captured events

– A.k.a. “Knowledge-based Detection”

• Anomaly-based Detection (AD)

– Anomaly: a deviation from “normal” behavior

– Profile of normal is derived from monitoring network traffic

– AD compares the normal profile with observed events

• Stateful Protocol Analysis (SPA)

– Vendor-developed generic profiles for specific protocols
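A minimal sketch of signature-based detection: known patterns are compared against each captured event. The two signatures and the sample events are hypothetical and far simpler than anything in a production rule set.

```python
import re

# Hypothetical signatures: patterns corresponding to known attacks.
SIGNATURES = {
    "sql_injection": re.compile(r"union\s+select", re.IGNORECASE),
    "path_traversal": re.compile(r"\.\./\.\./"),
}


def signature_detect(event):
    """Return the names of all known signatures that match a captured event."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(event)]


events = [
    "GET /item?id=1 UNION SELECT password FROM users",
    "GET /../../etc/passwd",
    "GET /index.html",
    "POST /login payload=some-previously-unseen-attack",
]
for event in events:
    print(signature_detect(event), event)
```

The last event shows the limitation: an attack with no matching signature looks exactly like benign traffic to this kind of detector.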


Cybersecurity System

• Attacks evolve, ergo building defense systems is nontrivial

• Thus, higher-level & adaptive methodologies are required


Adaptive Cybersecurity

• Data-capturing tools (Libpcap, Winpcap, etc.) capture events from the audit trails of information sources (e.g. network)

• Data-preprocessing module filters out the attacks for which good signatures have already been learned

• A feature extractor derives basic features (sequence of syscalls, start time, NetFlow duration, src/dest IP/port, protocol, byte and packet counts)

• Analysis engine implements detection methods for infrastructure anomalies, which may or may not have appeared before
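The stages above can be sketched as a small pipeline. This assumes flow records have already been captured (the capture step itself is omitted), and every name, field, port number, and threshold here is hypothetical; the analysis engine is a placeholder rule rather than a real detection method.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    src_ip: str
    dst_ip: str
    dst_port: int
    protocol: str
    start: float  # seconds
    end: float    # seconds
    byte_count: int
    packet_count: int


# Hypothetical stand-in for "attacks for which good signatures exist".
KNOWN_BAD_PORTS = {31337}


def preprocess(flows):
    """Filter out flows that an existing signature already covers."""
    return [f for f in flows if f.dst_port not in KNOWN_BAD_PORTS]


def extract_features(flow):
    """Derive basic per-flow features for the analysis engine."""
    return {
        "duration": flow.end - flow.start,
        "bytes_per_packet": flow.byte_count / max(flow.packet_count, 1),
        "dst_port": flow.dst_port,
        "protocol": flow.protocol,
    }


def analyze(features, max_bytes_per_packet=1500.0):
    """Placeholder analysis engine: flag unusually large packets."""
    return features["bytes_per_packet"] > max_bytes_per_packet


flows = [
    Flow("10.0.0.1", "10.0.0.9", 31337, "tcp", 0.0, 1.0, 500, 5),
    Flow("10.0.0.2", "10.0.0.9", 443, "tcp", 0.0, 2.0, 3000, 1),
    Flow("10.0.0.3", "10.0.0.9", 443, "tcp", 0.0, 4.0, 1000, 10),
]
remaining = preprocess(flows)
results = [analyze(extract_features(f)) for f in remaining]
print(results)
```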


WHAT WORKS, WHAT DOESN’T


Curse of Dimensionality

• Data volume is massive

– min. ~100M events per day

• Much of the data is streaming data

– Requires inline, real-time analysis

• Feature space is high dimensional
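One commonly cited symptom of high dimensionality is that distances lose contrast: the nearest and farthest points end up almost equally far away, which hurts distance-based anomaly detection. The sketch below measures that effect on uniformly random points; the function name, point count, and dimensions are assumptions for illustration.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from a query point to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    return (max(distances) - min(distances)) / min(distances)


for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

As the dimension grows the contrast shrinks, so "far from everything else" becomes a weak signal.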



$/Detection Performance Abysmal

• Looking for “every anomaly” is cost prohibitive

– if at all [practically] possible

• Narrowing down the criteria too much

– results in false negatives

• Reference data is hard to obtain due to privacy concerns

– Simulated data is useless

• ML was supposed to be better than the signature era
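A rough worked example of why detection cost gets out of hand at this scale. Only the 100M events per day figure comes from the deck; the attack rate, true-positive rate, and false-positive rate are hypothetical numbers picked to show the arithmetic.

```python
def daily_alert_load(events_per_day, attack_rate, true_positive_rate, false_positive_rate):
    """Return (true alerts, false alerts, precision) for one day of traffic."""
    attacks = events_per_day * attack_rate
    benign = events_per_day - attacks
    true_alerts = attacks * true_positive_rate
    false_alerts = benign * false_positive_rate
    precision = true_alerts / (true_alerts + false_alerts)
    return true_alerts, false_alerts, precision


true_alerts, false_alerts, precision = daily_alert_load(
    events_per_day=100_000_000,  # minimum volume quoted in the deck
    attack_rate=1e-6,            # assumption: about 100 malicious events per day
    true_positive_rate=0.99,     # assumption: a very good detector
    false_positive_rate=0.001,   # assumption: 0.1% of benign events flagged
)
print(round(true_alerts), round(false_alerts), round(precision, 5))
```

Even with a detector this good, analysts would face on the order of a hundred thousand false alerts for roughly a hundred real ones, so almost every alert is noise.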


Husky Recognition


• We built an effective snow recognition model…

Learned Features
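A toy sketch of how a model can end up recognizing snow instead of huskies. The two binary features, the training photos, and the one-feature "model" are all invented for illustration; they stand in for whatever a real network would latch onto.

```python
def best_single_feature(samples, labels):
    """Pick the feature whose value agrees with the label most often in training."""
    def accuracy(i):
        return sum(1 for s, y in zip(samples, labels) if s[i] == y) / len(labels)
    return max(range(len(samples[0])), key=accuracy)


# Hypothetical features per photo: (snow_in_background, husky_like_face); label 1 = husky.
train_samples = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 1), (0, 0)]
train_labels = [1, 1, 1, 0, 0, 0]  # every training husky was photographed in snow

chosen = best_single_feature(train_samples, train_labels)

# New photos: a husky indoors, and a different animal standing in snow.
test_samples = [(0, 1), (1, 0)]
test_labels = [1, 0]
predictions = [s[chosen] for s in test_samples]
test_accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
print(chosen, predictions, test_accuracy)
```

The snow feature is perfect on the training set and wrong on both new photos, which is why training accuracy alone says little about what a model has learned.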


Models: Simple Correlations

• Simple models are also (usually) wrong


Network Anomalies

• Malicious data packets have a small variety (low type-count), but occur at high frequency

– Current models are not good at detecting this type of anomaly

• Anomaly/outlier varies among application domains

• Labeled anomalies are not available for training/validation
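One way to see the first point: many outlier detectors treat rare events as suspicious. The sketch below uses a simple rarity score as a stand-in for that family (an assumption; the deck does not say which models it means) and applies it to hypothetical traffic dominated by a single malicious packet type.

```python
import math
from collections import Counter


def rarity_scores(packet_types):
    """Score each packet type by -log(relative frequency): rarer means more anomalous."""
    counts = Counter(packet_types)
    total = len(packet_types)
    return {ptype: -math.log(count / total) for ptype, count in counts.items()}


# Hypothetical capture: one malicious type at high frequency, benign types at low frequency.
traffic = ["syn_flood"] * 900 + ["http_get"] * 90 + ["dns_query"] * 10
scores = rarity_scores(traffic)
for ptype, score in sorted(scores.items(), key=lambda item: item[1]):
    print(ptype, round(score, 3))
```

The flood gets the lowest anomaly score precisely because it is so frequent, while the rare benign packets look the most suspicious.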



Baselining

• Using ML to detect anomalies is easy when the baseline is well defined and follows a simple mathematical model (e.g. Normal Distribution)

• Most real-world systems don’t render a simple baseline (i.e. their behavior is very complex)

• [!]Sanctity of baseline: “nearly 100% of networks are compromised”
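A sketch of the easy case and of what happens when the baseline is not clean. It assumes a roughly normal metric and a z-score threshold of 3; the sample values are hypothetical.

```python
import statistics


def fit_baseline(samples):
    """Summarize 'normal' as a mean and a sample standard deviation."""
    return statistics.mean(samples), statistics.stdev(samples)


def is_anomaly(value, baseline, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the baseline mean."""
    mean, std = baseline
    return abs(value - mean) / std > threshold


# Hypothetical clean baseline (e.g. requests per second): mean 100, stdev 2.
clean = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 103.0, 97.0]
clean_baseline = fit_baseline(clean)
print(is_anomaly(101.0, clean_baseline), is_anomaly(120.0, clean_baseline))

# Same metric, but the "normal" period already contained attack traffic.
compromised_baseline = fit_baseline(clean + [120.0] * 8)
print(is_anomaly(120.0, compromised_baseline))
```

With the clean baseline the value 120 is flagged; once the attack is part of the training period, the same value sits within one standard deviation of "normal" and goes unnoticed.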


Time Shifting

• “Window problem”: algorithms are limited to ingesting data in chunks (windows) that can be processed

– What if the anomaly is seeded outside that window?

• Network traffic diversity: usage varies in every session and with new applications

– window should also be shifted for recurring training

• Serious impact on performance, real-time, and security
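A minimal sketch of the window problem, assuming a detector that counts events inside a sliding time window. The window length, threshold, and event timings are hypothetical.

```python
from collections import deque


def windowed_alerts(event_times, window_seconds, threshold):
    """Alert whenever more than `threshold` events fall inside the sliding window."""
    window = deque()
    alerts = []
    for t in event_times:
        window.append(t)
        # Drop events that have aged out of the window.
        while window and t - window[0] > window_seconds:
            window.popleft()
        if len(window) > threshold:
            alerts.append(t)
    return alerts


burst = list(range(10))                     # 10 suspicious events within 10 seconds
slow = [i * 120 for i in range(10)]         # the same 10 events, one every 2 minutes

burst_alerts = windowed_alerts(burst, window_seconds=60, threshold=5)
slow_alerts = windowed_alerts(slow, window_seconds=60, threshold=5)
print(burst_alerts, slow_alerts)
```

The burst trips the detector, while the same activity paced slower than the window never does. Widening the window catches more but costs memory and latency, which is the performance and real-time impact the slide refers to.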


There’s More…

• How do you trust what the model predicts?

– i.e. how do we know the model works correctly (husky)?

• Designing sound evaluation schemes can be more difficult than the detector itself

• We really don’t know how ML works

• … or how to reason about ML models

• … or how to debug them

• For now it’s just magic & voodoo


CONCLUSION


Summary

• ML is a great and necessary technology

• ML really shines for some classes of problems

• ML is NOT the best solution for every problem (e.g. NetSec)

• Obtaining (and training with) useful data remains a challenge

• ML is just one initial building block of Machine Cognition and Artificial Understanding: there are many more

• Still a long way before machines can replicate humans!



THANK YOU!

Hadi Nahari

[email protected]

hadinahari



Backup


References

• Prof. Karl Friston, seminal works (http://www.fil.ion.ucl.ac.uk/~karl/#_Free-energy_principle)

• “Why Should I Trust You?”: Explaining the Predictions of Any Classifier, Carlos Guestrin, et al. (https://arxiv.org/pdf/1602.04938.pdf)

• “Using Machine Learning in Network Intrusion Detection Systems”, Omar Shaya (http://www.slideshare.net/OmarShaya/machine-learning-in-networks-intrusion-detection?next_slideshow=1)

• “Machine Learning Is Not The Answer To Better Network Security”, Matt Harrigan (https://techcrunch.com/2016/02/29/machine-learning-is-not-the-answer-to-better-network-security/)

• “Machine Learning Algorithm Cheat Sheet”, Laura Diane Hamilton (http://www.lauradhamilton.com/machine-learning-algorithm-cheat-sheet)

• “Anomaly Detection Approaches for Communicating Networks” (http://users.ece.gatech.edu/~jic/anomaly-book-chap-09.pdf)

• “A Survey on Machine Learning Techniques for Intrusion Detection Systems”, J. Singh, N.J. Nene (http://ijarcce.com/upload/2013/november/35-o-jayveer_singh-A_Survey_on_Machine.pdf)

• “Machine Learning Techniques for Anomaly Detection: An Overview”, S. Omar, et al. (http://research.ijcaonline.org/volume79/number2/pxc3891478.pdf)

• “Recent Advances in Predictive (Machine) Learning”, J.H. Friedman, et al. (http://statweb.stanford.edu/~jhf/ftp/machine)

• “Outside the Closed World: On Using Machine Learning For Network Intrusion Detection”, R. Sommer, V. Paxson (http://www.utdallas.edu/~muratk/courses/dmsec_files/oakland10-ml.pdf)

• http://xkcd.com



Are Humans Getting Smarter?

• IQ scores are rising

• Underlying biological “HW” declining

• “Intelligence” is in decline
