
Upload: omar-shaya

Post on 21-Apr-2017


Page 1: Using Machine Learning in Networks Intrusion Detection Systems

Using Machine Learning in Networks Intrusion Detection Systems

OMAR SHAYA

Georg-August-Universität Göttingen ––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––– 1

Page 2: Using Machine Learning in Networks Intrusion Detection Systems

Sections

✤ Introduction

✤ Intrusion Detection Methodologies

✤ A Machine Learning Based IDS (Intrusion Detection System)

✤ Challenges of Using Machine Learning in Intrusion Detection

✤ Summary

✤ References

✤ Appendix


Page 3: Using Machine Learning in Networks Intrusion Detection Systems

INTRODUCTION

IDS: Intrusion Detection System

Page 4: Using Machine Learning in Networks Intrusion Detection Systems

Increasing attacks on computer networks and the need for automated detection


• The Internet and computer systems have raised numerous security and privacy issues

• Network use has grown explosively for many reasons, e.g. the Internet, wireless networks, and cloud computing

• Consequently, malicious attacks on networks have increased year over year

• Systems that detect these attacks need to be automated
  • Detection based on known attacks
  • But what about attacks that have not been seen before?
  • Machine learning?


Page 5: Using Machine Learning in Networks Intrusion Detection Systems

Definition: intrusion & intrusion detection


“Intrusion is an attempt to compromise CIA (Confidentiality, Integrity, Availability), or to bypass the security mechanisms of a computer or network”

“Intrusion detection is the process of monitoring the events occurring in a computer system or network, and analyzing them for signs of intrusion”

Page 6: Using Machine Learning in Networks Intrusion Detection Systems

INTRUSION DETECTION METHODOLOGIES

IDS: Intrusion Detection System

Page 7: Using Machine Learning in Networks Intrusion Detection Systems

There are 3 main Detection Methodologies


• Signature-based Detection (SD)
  • A signature is a string or pattern that corresponds to a known attack or threat
  • SD is the process of comparing patterns against captured events to recognize possible intrusions
  • Uses the knowledge accumulated about specific attacks and system vulnerabilities
  • Also known as Knowledge-based Detection or Misuse Detection

• Anomaly-based Detection (AD)
  • An anomaly is a deviation from “normal” behavior
  • Profiles of normal behavior are derived from monitoring network traffic
  • AD compares normal profiles with observed events to recognize attacks

• Stateful Protocol Analysis (SPA)
  • SPA depends on vendor-developed generic profiles for specific protocols
  • These profiles are based on protocol standards from international standards organizations

• Hybrid IDSs use multiple methodologies
  • SD and AD are complementary methods: the former targets specific known attacks and the latter focuses on unknown attacks
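To make the SD side of this contrast concrete, here is a minimal sketch of signature-based detection, assuming that each captured event is a raw payload and that a signature is a plain byte substring. The signature names and patterns in `SIGNATURES` are hypothetical illustrations, not rules taken from any real IDS.

```python
# Hypothetical signature set: names and byte patterns are illustrative only.
SIGNATURES = {
    "path-traversal": b"../../",
    "shellshock-probe": b"() { :; };",
    "sql-injection": b"' OR '1'='1",
}


def match_signatures(payload: bytes) -> list[str]:
    """Compare one captured event (a payload) against every known signature
    and return the names of all signatures it contains."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]
```

For example, `match_signatures(b"GET /../../etc/passwd HTTP/1.1")` returns `["path-traversal"]`, while the URL-encoded variant `b"GET /..%2f..%2fetc/passwd HTTP/1.1"` matches nothing. That is the typical SD weakness: known attacks are caught, but variants of them and unseen attacks slip through.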


Page 8: Using Machine Learning in Networks Intrusion Detection Systems

There are 3 main Detection Methodologies


• Hybrid IDSs use multiple methodologies
  • E.g. SD and AD are complementary methods
  • SD targets specific known attacks, while AD focuses on unknown attacks


Signature-based Detection (SD)*
• SD is the process of comparing patterns against captured events to recognize possible intrusions
• A signature is a string or pattern that corresponds to a known attack or threat
• Uses the knowledge accumulated about specific attacks and system vulnerabilities

Anomaly-based Detection (AD)
• AD compares normal profiles with observed events to recognize attacks
• An anomaly is a deviation from “normal” behavior
• Profiles of normal behavior are derived from monitoring network traffic

Stateful Protocol Analysis (SPA)
• SPA depends on vendor-developed generic profiles for specific protocols
• “Stateful” in SPA indicates that the IDS can know and trace protocol states (e.g., pairing requests with replies)
• These profiles are based on protocol standards from international standards organizations

* Also known as Knowledge-based Detection or Misuse Detection
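The “stateful” part of SPA can be sketched as a detector that remembers outstanding requests and flags any reply that does not pair with one. This is a minimal sketch under a simplified, hypothetical protocol in which every message carries a `kind` and a transaction ID; it does not model the vendor-developed protocol profiles that real SPA relies on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    kind: str    # "request" or "reply" in this hypothetical protocol
    txn_id: int  # identifier shared by a request and its reply


def unexpected_replies(messages: list[Message]) -> list[Message]:
    """Trace request/reply state and return replies with no outstanding request."""
    pending: set[int] = set()  # transaction IDs of requests still awaiting a reply
    anomalies: list[Message] = []
    for msg in messages:
        if msg.kind == "request":
            pending.add(msg.txn_id)
        elif msg.kind == "reply":
            if msg.txn_id in pending:
                pending.remove(msg.txn_id)  # request/reply pair is complete
            else:
                anomalies.append(msg)  # unsolicited or duplicate reply
    return anomalies
```

For the trace request 1, reply 1, reply 7, reply 1, the function returns the reply with ID 7 (never requested) and the second reply with ID 1 (its request was already answered). A stateless pattern matcher cannot make this distinction, because each message looks harmless on its own.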

Page 9: Using Machine Learning in Networks Intrusion Detection Systems

Pros and cons of Intrusion Detection Methods


Table 1: Pros and cons of intrusion detection methodologies. Source [2]

Signature-based Detection (SD)
  PROS
  • Simplest method, and effective at detecting attacks
  • Detailed contextual analysis
  CONS
  • Ineffective against unknown attacks and variants of known attacks
  • Little understanding of states and protocols
  • Hard to keep signatures/patterns up to date
  • Time-consuming to maintain the knowledge

Anomaly-based Detection (AD)
  PROS
  • Effective at detecting new and unforeseen vulnerabilities
  • Less dependent on the OS
  • Facilitates detection of privilege abuse
  CONS
  • Weak profile accuracy due to observed events
  • Unavailable while behavior profiles are being rebuilt
  • Difficult to trigger alerts at the right time

Stateful Protocol Analysis (SPA)
  PROS
  • Knows and traces protocol states
  • Distinguishes unexpected sequences of commands
  CONS
  • Resource-consuming protocol state tracing and examination
  • Unable to inspect attacks that look like benign protocol behavior
  • Might be incompatible with dedicated OSs or APs

Page 10: Using Machine Learning in Networks Intrusion Detection Systems

A MACHINE LEARNING BASED IDS

IDS: Intrusion Detection System

Page 11: Using Machine Learning in Networks Intrusion Detection Systems

Machine learning in anomaly detection


• Anomaly-based Detection (AD)
  • Easy when what is normal in the data can be characterized by a simple mathematical model, e.g. a normal distribution
  • Most interesting real-world systems have complex behavior that does not follow such a distribution
  • Machine learning is useful for learning the characteristics of the system from observed data

• Feature selection is the process of selecting a subset of relevant features (variables, predictors) for use in model construction. Feature selection techniques are used for three reasons:
  • Simplification of models to make them easier to interpret
  • Shorter training times
  • Enhanced generalization by reducing overfitting

• Outlier detection: an outlier is an observation point that is distant from other observations
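The outlier definition above (an observation distant from the other observations) can be sketched with a simple distance-based rule: flag a value whose distance to its k-th nearest neighbor exceeds a threshold. This is a generic illustration, not the method used in [1]; the function name `knn_outliers` and the defaults for `k` and `threshold` are hypothetical choices.

```python
def knn_outliers(values: list[float], k: int = 2, threshold: float = 10.0) -> list[float]:
    """Flag values whose distance to their k-th nearest other value exceeds
    threshold. Assumes len(values) > k."""
    flagged = []
    for i, v in enumerate(values):
        # Distances from v to every other observation, nearest first
        dists = sorted(abs(v - w) for j, w in enumerate(values) if j != i)
        if dists[k - 1] > threshold:
            flagged.append(v)
    return flagged
```

With `[100, 102, 98, 101, 99, 250, 252]` the default call flags `[250, 252]`, while `k=1` flags nothing because the two anomalies are each other's nearest neighbor. Looking past the nearest neighbor keeps a small group of anomalies from vouching for each other.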


Page 12: Using Machine Learning in Networks Intrusion Detection Systems

Robust Feature Selection and Robust PCA for Internet Traffic Anomaly Detection


• Couples a feature selection algorithm with an outlier detection method

• Uses robust statistics tools in both procedures
  • Reliable results even in the presence of outliers
  • Feature selection based on a robust mutual information estimator

• MI (Mutual Information): an information-theoretic metric that captures both linear and non-linear dependencies

• Outlier detection based on robust PCA (Principal Component Analysis)
  • PCA is a mathematical procedure used to reduce the dimensionality of a problem
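The sketch below shows how PCA can serve as an outlier detector: fit the first principal component of 2-D data and flag points whose orthogonal distance to it exceeds a threshold. It uses classical (non-robust) PCA on two features only, with a closed-form 2×2 eigen-decomposition, so it illustrates the mechanics rather than the robust PCA of [1]; `pca_orthogonal_outliers` and its `threshold` are hypothetical.

```python
import math


def pca_orthogonal_outliers(points, threshold):
    """Classical PCA on 2-D points: keep the first principal component and flag
    points whose orthogonal distance to it exceeds threshold."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance matrix [[a, b], [b, c]] (non-robust estimate)
    a = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    c = sum((y - my) ** 2 for _, y in points) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form
    lam1 = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    # Matching eigenvector = direction of the first principal component
    if b != 0:
        vx, vy = lam1 - c, b
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    outliers = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Distance to the PC1 line = projection onto the orthogonal direction (-vy, vx)
        if abs(-vy * dx + vx * dy) > threshold:
            outliers.append((x, y))
    return outliers
```

With `[(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (4, 2)]` and a threshold of 0.5, only `(4, 2)` is flagged. Because the mean and covariance here are classical estimates, enough outliers could pull the component toward themselves; replacing them with robust estimates is what the robust variant is meant to address.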


Page 13: Using Machine Learning in Networks Intrusion Detection Systems

Robust Feature Selection and Robust PCA for Internet Traffic Anomaly Detection


• Feature selection
  • An important preprocessing step (filter)
  • Reduces the dimensionality of high-dimensional data
  • Removes irrelevant data
  • Increases learning accuracy
  • Gives significant performance gains
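A filter-style feature selection step can be sketched as ranking features by their mutual information with the class label and keeping the top k. This minimal version uses a plain plug-in MI estimate on discrete (already binned) features, not the robust MI estimator of [1]; the feature names in the usage note are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired samples of discrete values."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # Add p(x,y) * log2(p(x,y) / (p(x) * p(y)))
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def select_top_k(features, labels, k):
    """Filter-style selection: rank features by MI with the labels, keep the k best."""
    scores = {name: mutual_information(values, labels) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

For labels `[0, 0, 1, 1]` and features `{"dst_port_bin": [0, 0, 1, 1], "noise": [0, 1, 0, 1], "pkt_size_bin": [0, 1, 1, 1]}`, the scores are 1 bit, 0 bits, and about 0.31 bits, so `select_top_k(features, labels, 2)` keeps `["dst_port_bin", "pkt_size_bin"]`. MI also picks up non-linear dependence: for x in `[-1, 0, 1]` and y = x², the correlation is zero, yet `mutual_information([-1, 0, 1], [1, 0, 1])` is about 0.918 bits.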


Page 14: Using Machine Learning in Networks Intrusion Detection Systems

Robust Feature Selection and Robust PCA for Internet Traffic Anomaly Detection


• Robust statistics
  • Reliable results even in the presence of outliers

Example:
• In a normal distribution, the inner 95% of the data lie within “center ± 1.96 × spread”
  • Center: instead of the mean, take the median
  • Spread: instead of the SD (standard deviation), take the MAD (median absolute deviation), multiplied by 1.4826 so that it estimates the SD for normally distributed data

Source [1]
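The center/spread example can be sketched as follows, assuming 1-D measurements (for instance a per-interval traffic count) and the 1.4826 scaling of the MAD. The data in the usage note are made up to show masking: a few large values inflate the mean and SD so much that the classical rule flags nothing, while the median/MAD rule still flags them.

```python
import statistics

# Scaling that makes the MAD a consistent estimate of the SD for normal data
MAD_TO_SD = 1.4826


def flag_outliers(values, z=1.96, robust=True):
    """Flag values outside center ± z * spread.

    robust=True:  center = median, spread = 1.4826 * MAD
    robust=False: center = mean,   spread = sample standard deviation
    """
    if robust:
        center = statistics.median(values)
        spread = MAD_TO_SD * statistics.median([abs(v - center) for v in values])
    else:
        center = statistics.mean(values)
        spread = statistics.stdev(values)
    return [v for v in values if abs(v - center) > z * spread]
```

For `traffic = [100, 102, 98, 101, 99, 103, 97, 100, 250, 255, 260]`, `flag_outliers(traffic, robust=False)` returns `[]` (mean ≈ 142, SD ≈ 72, so the band reaches past 260), whereas `flag_outliers(traffic)` returns `[250, 255, 260]` (median 101, scaled MAD ≈ 3).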

Page 15: Using Machine Learning in Networks Intrusion Detection Systems

Dataset creation for training and testing (1/2)


• Dataset collected by mirroring the traffic passing through the switch of:
  • A private laboratory network of 17 interconnected PCs
    • 10 for users producing licit traffic
    • 1 for the server, 1 for measurements
    • 5 for attacks

• Licit traffic
  • File sharing (BitTorrent)
  • Video streaming (IPTV over TCP)
  • Web browsing (HTTP)

• Attacks
  • Botnets
    • Port-scans: identify other targets vulnerable to infection
    • Snapshots: a type of identity theft for stealing personal information
  • Other botnet attacks are not used, e.g. spyware, malware, denial of service, and email spam
    • They happen solely at the host level
    • They can be detected by e.g. anti-virus software, monitoring at routers/firewalls, or email scanning


Page 16: Using Machine Learning in Networks Intrusion Detection Systems

Dataset creation for training and testing (2/2)


• Customer usage profiles
  • (a) Soft browsing (HTTP only)
  • (b) File sharing machine (BitTorrent only)
  • (c) File sharing user (BitTorrent and HTTP)
  • (d) Heavy user (HTTP, BitTorrent, and streaming)

• Network scenarios
  • (B) Business user: 100% (a)
  • (R) Residential user: 30% (b), 40% (c), 30% (d)

• Attack intensities
  • (1) 6% (5% snapshot, 1% port-scan)
  • (2) 20% (15% snapshot, 5% port-scan)
  • (3) 35% (30% snapshot, 5% port-scan)


Table 2. Source [1]

Page 17: Using Machine Learning in Networks Intrusion Detection Systems

Results (1/3)


• 6 types of anomaly detectors, labeled A-B
  • A: feature selection method; B: outlier detection method
  • Each is R (robust), NR (non-robust), or ∅ (no method)

• Performance measures
  • Nr Ftrs: number of selected features
  • Recall: probability that an observation is classified as an anomaly when it is in fact an anomaly
  • False positive rate (FPR): probability that an observation is classified as an anomaly when it is in fact a regular observation
  • Precision: probability that an observation is anomalous given that it is classified as an anomaly

Table 3. Source [1]
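The three probabilities above can be estimated from confusion-matrix counts (true/false positives and negatives) as sketched below. This is a minimal sketch; the counts in the usage note are invented for illustration and are not taken from Table 3.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return (recall, FPR, precision) from confusion-matrix counts."""
    recall = tp / (tp + fn)     # P(classified anomaly | actually anomalous)
    fpr = fp / (fp + tn)        # P(classified anomaly | actually regular)
    precision = tp / (tp + fp)  # P(actually anomalous | classified anomaly)
    return recall, fpr, precision
```

For 10 anomalies and 90 regular observations, a detector that flags 8 of the anomalies and 2 regular observations gives `detection_metrics(tp=8, fp=2, tn=88, fn=2)`, i.e. recall 0.8, FPR ≈ 0.022, and precision 0.8.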

Page 18: Using Machine Learning in Networks Intrusion Detection Systems

Results (2/3)


• The R-R detector achieved the best results
  • Recall is always 1
  • In scenarios B1, B2, B3, and R3, performance is maximal
  • FPR and precision are close to their optimal values

• The improvement over the non-robust version is large
  • Low recall means that a large percentage of anomalies is not correctly identified
  • In B2, B3, and R3, recall improved from 0.167, 0.273, and 0.125, respectively, to 1

• Feature selection
  • Feature selection reduces Nr Ftrs and improves performance
  • In B3 and R3, using no feature selection is sometimes better than using non-robust feature selection


Table 3. Source [1]

Page 19: Using Machine Learning in Networks Intrusion Detection Systems

Results (3/3)


• Compare R-NR (top) and R-R (bottom)

• Any point with a score or distance larger than a threshold (the lines) is considered an anomaly

• In the R-NR case there is confusion around snapshots
  • Hence the poor recall value of 0.125
  • The similarity in behavior between snapshots and some HTTP and BitTorrent traffic fools the non-robust outlier detector
  • All of them consist of small file uploads

Fig. 2. Source [1]

Page 20: Using Machine Learning in Networks Intrusion Detection Systems

Discussion


• Using a feature selection step, and using robust statistics for both feature selection and outlier detection, is advantageous
  • The system achieves very high performance
  • The system’s anomaly detector adapts to different traffic conditions (licit traffic differs significantly between the two scenarios)

• However, the dataset used was obtained from a private lab with 17 PCs and is not necessarily representative of a real-world scenario
  • The effectiveness of the system still needs to be demonstrated on a larger-scale network traffic dataset


Page 21: Using Machine Learning in Networks Intrusion Detection Systems

CHALLENGES OF USING MACHINE LEARNING IN INTRUSION DETECTION


Page 22: Using Machine Learning in Networks Intrusion Detection Systems

Outliers, cost of error, semantics, and evaluation


• Outlier detection
  • Hard to define “normal” in network traffic, since usage varies in every session and with new applications (diversity of network traffic)

• High cost of errors
  • The cost of misclassification is extremely high
  • False positive: wastes expensive analyst time
  • False negative: can cause serious damage to an organization
  • Errors in other applications of ML are not as expensive, e.g. product recommendations, OCR, spam detection

• Semantic gap
  • Currently, results only assess the capability to identify deviations from the normal profile (which could be good or bad)
  • Results need to be interpreted from the operator’s point of view: what do they mean?

• Difficulties with evaluation
  • Designing sound evaluation schemes can be more difficult than building the detector itself
  • Lack of public datasets for assessing anomaly detection
    • Hard to obtain real datasets for many reasons, e.g. leakage of personal data
    • Simulated data is not accurate
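The cost asymmetry between false positives and false negatives can be made concrete by choosing a detection threshold that minimizes total error cost. This is a hypothetical sketch: the scores, labels, and per-error costs in the usage note are invented, and in practice the costs would have to come from the operator.

```python
import math


def total_cost(scores, labels, threshold, cost_fp, cost_fn):
    """Total cost of flagging every observation whose score exceeds threshold."""
    cost = 0.0
    for score, is_attack in zip(scores, labels):
        flagged = score > threshold
        if flagged and not is_attack:
            cost += cost_fp  # false positive: analyst time spent on a benign event
        elif not flagged and is_attack:
            cost += cost_fn  # false negative: an intrusion goes unnoticed
    return cost


def best_threshold(scores, labels, cost_fp, cost_fn):
    """Pick the candidate threshold with the lowest total cost (first one on ties)."""
    # -inf flags everything; each observed score is another candidate cut-off
    candidates = [-math.inf] + sorted(set(scores))
    return min(candidates, key=lambda t: total_cost(scores, labels, t, cost_fp, cost_fn))
```

With scores `[0.1, 0.2, 0.35, 0.4, 0.8, 0.9]` and labels `[False, False, True, False, True, True]`, making misses ten times as costly as false alarms selects the lower threshold 0.2, while making false alarms ten times as costly selects 0.4.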


Page 23: Using Machine Learning in Networks Intrusion Detection Systems

SUMMARY


Page 24: Using Machine Learning in Networks Intrusion Detection Systems

Summary


• Introduction
  • The need for automated intrusion detection systems
  • Definition of intrusion and intrusion detection

• Intrusion Detection Methodologies
  • Signature-based Detection (SD)
  • Anomaly-based Detection (AD)
  • Stateful Protocol Analysis (SPA)

• Machine Learning Based IDS
  • Using feature selection and robust statistics
  • Dataset creation
  • Results and evaluation
  • Discussion

• Challenges of Using Machine Learning in ID
  • Outlier detection, high cost of errors, semantic gap, and difficulties with evaluation


Page 25: Using Machine Learning in Networks Intrusion Detection Systems

OMAR SHAYA –––––––– [email protected]

Thanks!


Page 26: Using Machine Learning in Networks Intrusion Detection Systems


References

[1] C. Pascoal, M. Oliveira, R. Valadas, P. Filzmoser, P. Salvador and A. Pacheco. Robust Feature Selection and Robust PCA for Internet Traffic Anomaly Detection. In Proceedings of IEEE INFOCOM, pages 1755-1763, 2012.

[2] H. Liao, C. Lin, Y. Lin and K. Tung. Intrusion Detection System: A Comprehensive Review. Journal of Network and Computer Applications, pages 16-24, 2013.

[3] R. Sommer and V. Paxson. Outside the Closed World: On Using Machine Learning for Network Intrusion Detection. In IEEE Symposium on Security and Privacy, pages 305-316, 2010.

[4] Feature Selection. https://en.wikipedia.org/wiki/Feature_selection, accessed 6 August 2015.

[5] Outlier. https://en.wikipedia.org/wiki/Outlier, accessed 6 August 2015.

[6] Anomaly Detection – Using Machine Learning to Detect Abnormalities in Time Series Data. http://blogs.technet.com/b/machinelearning/archive/2014/11/05/anomaly-detection-using-machine-learning-to-detect-abnormalities-in-time-series-data.aspx, accessed 6 August 2015.


Page 27: Using Machine Learning in Networks Intrusion Detection Systems

Precision and Recall


APPENDIX

Source: Dr. Stephan Sigg’s slides from the Machine Learning and Pervasive Computing course, SoSe 2015