CISC 879 - Machine Learning for Solving Systems Problems John Cavazos Dept of Computer & Information Sciences University of Delaware www.cis.udel.edu/~cavazos/cisc879 Applying Support Vector Machines for Intrusion Detection on Virtual Machines Lecture 6


CISC 879 - Machine Learning for Solving Systems Problems

John Cavazos
Dept of Computer & Information Sciences
University of Delaware
www.cis.udel.edu/~cavazos/cisc879

Applying Support Vector Machines for Intrusion Detection on Virtual Machines

Lecture 6


Outline
► Background and Motivation
► Intrusion Detection Systems
► Support Vector Machines (SVMs)
► Dataset
► Results
► Conclusions

Slides adapted from presentation by Fatemeh Azmandian (http://www.ece.neu.edu/~fazmandi)


Background
► Virtual Machine:
    ► A software implementation of a machine (computer) that executes programs like a real machine
► Virtual Machine Monitor (VMM) or hypervisor:
    ► The software layer providing the virtualization
    ► Allows the multiplexing of the underlying physical machine between different virtual machines, each running its own operating system


Background (cont’d)
► Intrusion Detection:
    ► The process of monitoring the events occurring in a computer system or network and analyzing them for signs of intrusions
► Intrusion:
    ► An attempt to compromise:
        ► Confidentiality
        ► Integrity
        ► Availability
    ► An attempt to bypass the security mechanisms of a computer or network [1]


Background (cont’d)
► Intrusion Detection System (IDS):
    ► Software or hardware system that automates the process of monitoring the events occurring in a computer system or network and analyzing them for signs of security problems
► Why is it important?
    ► Every year, billions of dollars are lost due to virus attacks


Financial Impact of Virus Attacks


Intrusion Detection Approaches
► Misuse Detection
    ► Identifies intrusions based on known patterns for the malicious activity
    ► Known patterns are referred to as signatures
► Anomaly Detection
    ► Identifies intrusions based on deviations from established normal behavior
    ► Capable of identifying new (previously unseen) attacks
    ► New normal behavior may be misclassified as abnormal, producing false positives


Intrusion Detection Systems
► Host IDS (HIDS):
    ► Performs intrusion detection from within the host it is monitoring
    ► Advantages:
        ► Good visibility of the internal state of the host machine
        ► Difficult for malicious code (malware) to evade the HIDS
    ► Disadvantage:
        ► Susceptible to attacks by malware


Intrusion Detection Systems
► Network IDS (NIDS):
    ► Performs intrusion detection through network connections and outside the host machine
    ► Advantage:
        ► More resistant to attacks by malware
    ► Disadvantages:
        ► Poor visibility of the internal state of the host machine
        ► Easier for malware to evade the NIDS


Intrusion Detection Systems
► VMM-based IDS:
    ► Performs intrusion detection for a virtual machine through the Virtual Machine Monitor (VMM)
    ► Advantages:
        ► Better visibility of the internal state of the host machine, compared to an NIDS
        ► Harder for malware to evade the IDS
        ► Less susceptible to attacks by malware
► Our goal is to create a VMM-based IDS using machine learning techniques
    ► Support Vector Machines (SVMs)


VMM-IDS Overview


Support Vector Machines (SVMs)
► A machine learning technique for classifying data points into one of two classes
► Two-Class SVMs
    ► Training is done on data from two classes
► One-Class SVMs
    ► Training is done on data from only one class
    ► During the testing phase, the origin and data points close to it are considered part of the second class


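Linear Classifiers

f(x, w, b) = sign(w · x - b)

[Figure: scatter plot of two-class data; one marker denotes +1, the other denotes -1. The classifier f maps an input x to an estimated label y_est.]

How would you classify this data?

Slide Source: Andrew W. Moore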
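The decision rule f(x, w, b) = sign(w · x - b) can be sketched in a few lines of Python. This is only an illustration: the weight vector, offset, and sample points are made-up values, and treating sign(0) as +1 is an assumption, since the slide does not say how ties are resolved.

```python
def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def sign(value):
    """Return +1 for non-negative values, -1 otherwise (tie rule is an assumption)."""
    return 1 if value >= 0 else -1


def linear_classify(x, w, b):
    """Evaluate f(x, w, b) = sign(w . x - b)."""
    return sign(dot(w, x) - b)


# Hypothetical parameters and points, chosen only for illustration.
w = [1.0, 1.0]
b = 3.0
print(linear_classify([3.0, 2.0], w, b))  # w.x - b = 2.0, so the label is +1
print(linear_classify([0.5, 1.0], w, b))  # w.x - b = -1.5, so the label is -1
```

Every choice of w and b gives a different separating line, which is why the slide asks which of the many possible lines to prefer.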


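Classifier Margin

f(x, w, b) = sign(w · x - b)

[Figure: two-class scatter plot (markers denote +1 and -1) with a linear boundary and its margin; f maps an input x to an estimated label y_est.]

Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.

Slide Source: Andrew W. Moore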
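One way to make the margin definition concrete is to compute it for a given boundary. The sketch below assumes the boundary w · x - b = 0 is widened symmetrically on both sides, so the margin is twice the distance from the boundary to the nearest datapoint. The points and parameters are hypothetical.

```python
import math


def margin_width(points, w, b):
    """Width the boundary w.x - b = 0 can grow to before touching a point.

    Assumes symmetric widening, so the width is twice the smallest
    point-to-hyperplane distance |w.x - b| / ||w||.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    distances = [
        abs(sum(wi * xi for wi, xi in zip(w, x)) - b) / norm
        for x in points
    ]
    return 2.0 * min(distances)


# Hypothetical data: two points on each side of the line x1 + x2 = 3.
points = [[3.0, 2.0], [4.0, 4.0], [0.5, 1.0], [1.0, 0.0]]
print(margin_width(points, [1.0, 1.0], 3.0))
```

Here the nearest point is (0.5, 1.0), at distance 1.5 / sqrt(2) from the line, so the width is twice that value.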


Maximum Margin

f(x, w, b) = sign(w · x - b)

[Figure: two-class scatter plot (markers denote +1 and -1) with the maximum margin boundary; f maps an input x to an estimated label y_est.]

The maximum margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM, called a linear SVM (LSVM).

Support vectors are those datapoints that the margin pushes up against.

Slide Source: Andrew W. Moore
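For reference, the maximum margin classifier is commonly written as the following optimization problem. This is the standard hard-margin formulation rather than something stated on the slide; the labels y_i in {+1, -1} and the training-set size n are notation introduced here, and the sign convention matches f(x, w, b) = sign(w · x - b).

```latex
\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\quad\text{subject to}\quad
y_i\,(w \cdot x_i - b) \ge 1, \qquad i = 1, \dots, n
```

Under this scaling the margin width is 2 / ||w||, so minimizing ||w|| maximizes the margin. The support vectors are the points for which the constraint holds with equality.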


Suppose 1-dimension

What would SVMs do with this data?

[Figure: 1-D data points on a line, with x=0 marked]


Suppose 1-dimension

Not a big surprise

[Figure: the 1-D data with the positive “plane” and negative “plane” marked, and x=0 marked]
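In one dimension the maximum margin boundary is simply the midpoint between the closest pair of opposite-class points. The sketch below assumes, for illustration only, that all negative points lie to the left of all positive points. The data values are made up.

```python
def max_margin_threshold_1d(positives, negatives):
    """Return (threshold, margin, support_vectors) for separable 1-D data.

    Assumes every negative point is smaller than every positive point.
    """
    closest_pos = min(positives)
    closest_neg = max(negatives)
    if closest_neg >= closest_pos:
        raise ValueError("classes are not separable by a single threshold")
    threshold = (closest_pos + closest_neg) / 2.0
    margin = closest_pos - closest_neg
    return threshold, margin, (closest_neg, closest_pos)


# Hypothetical 1-D data.
threshold, margin, support_vectors = max_margin_threshold_1d(
    positives=[2.0, 3.0, 5.0], negatives=[-4.0, -3.0, -1.0]
)
print(threshold, margin, support_vectors)  # 0.5 3.0 (-1.0, 2.0)
```

Only the two closest points determine the boundary, which is the 1-D version of the support vector idea.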


Harder 1-dimensional dataset

What can be done about this?

[Figure: 1-D data points on a line, with x=0 marked]


Harder 1-dimensional dataset

Use a kernel function to project the data onto a higher dimensional space:

z_k = (x_k, x_k^2)

[Figure: the data plotted in the projected space, with x=0 marked]
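The effect of the mapping z_k = (x_k, x_k^2) can be checked on a small example. The sketch assumes, purely for illustration, a dataset where one class sits near x=0 and the other lies on both sides of it, so no single threshold on x separates them. The point values and the names `inner` and `outer` are hypothetical.

```python
def feature_map(x):
    """Map a 1-D point x to the 2-D point (x, x^2)."""
    return (x, x * x)


# Hypothetical data: 'inner' points near zero, 'outer' points on both sides.
inner = [-1.0, 0.0, 1.0]
outer = [-3.0, -2.0, 2.0, 3.0]

# A single threshold on x works only if one class lies entirely below the other.
separable_in_1d = max(inner) < min(outer) or max(outer) < min(inner)

# After the mapping, compare the second coordinate (x^2) of each class.
inner_z = [feature_map(x) for x in inner]
outer_z = [feature_map(x) for x in outer]
separable_in_2d = max(z[1] for z in inner_z) < min(z[1] for z in outer_z)

print(separable_in_1d, separable_in_2d)  # False True
```

In the projected space a horizontal line such as x^2 = 2.5 separates the two groups, which is a linear boundary in (x, x^2).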


Non-linear SVMs: Feature spaces

Φ: x → φ(x)

Input space Feature space


Non-linear SVMs: Feature spaces

► Kernel functions are used to transform data into a different, linearly separable feature space

[Figure: points mapped by φ(.) from the input space to the feature space]


Non-linear SVMs: Kernel Functions
► Popular Kernel Functions:
    ► Linear kernel
    ► Polynomial kernel
    ► Gaussian Radial Basis Function (RBF) kernel
    ► Sigmoid kernel
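The four kernels named above are often written in the forms sketched below. The transcript does not preserve the slide's formulas, so the parameterization here (gamma, coef0, degree, in the style commonly used by SVM libraries) and the default values are assumptions.

```python
import math


def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def linear_kernel(u, v):
    """K(u, v) = u . v"""
    return dot(u, v)


def polynomial_kernel(u, v, degree=2, gamma=1.0, coef0=1.0):
    """K(u, v) = (gamma * u . v + coef0) ** degree"""
    return (gamma * dot(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma=0.5):
    """K(u, v) = exp(-gamma * ||u - v||^2)"""
    squared_distance = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(u, v, gamma=0.1, coef0=0.0):
    """K(u, v) = tanh(gamma * u . v + coef0)"""
    return math.tanh(gamma * dot(u, v) + coef0)


u, v = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(u, v), polynomial_kernel(u, v), rbf_kernel(u, v), sigmoid_kernel(u, v))
```

Each function returns a similarity value for a pair of points, so an SVM can work in the feature space without computing φ(x) explicitly.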


Dataset
► Synthetic dataset based on SQL and AsteriskNow workloads
► Process-level features
    ► Rate-based features
    ► Correlation-based features
► Time-based windows of execution
    ► Current window size: 50 interrupt timers
► Three normal datasets per workload
► Two abnormal datasets per workload
    ► Each consists of both normal and abnormal data points
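As a rough illustration of rate-based features over time-based windows, the sketch below groups events into windows of 50 timer ticks and reports how often each event type occurs per tick. The slides do not list the actual features, so the event names, the `(tick, event_type)` record layout, and the rate definition are all assumptions.

```python
from collections import Counter, defaultdict

WINDOW_SIZE = 50  # timer interrupts per window, as stated on the slide


def rate_features(events, window_size=WINDOW_SIZE):
    """Compute per-window event rates from (tick, event_type) records.

    The record layout and the rate definition (count / window_size)
    are hypothetical, chosen only to illustrate windowed features.
    """
    counts = defaultdict(Counter)
    for tick, event_type in events:
        counts[tick // window_size][event_type] += 1
    return {
        window: {etype: n / window_size for etype, n in counter.items()}
        for window, counter in sorted(counts.items())
    }


# Hypothetical event trace.
events = [
    (3, "page_fault"), (10, "syscall"), (12, "syscall"),
    (60, "syscall"), (99, "page_fault"),
]
for window, rates in rate_features(events).items():
    print(window, rates)
```

Each window then yields one feature vector, which is the kind of data point a classifier such as an SVM can label as normal or abnormal.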


Constructing Features


Features


Two-Class SVM Results

Experiment             Workload   Train on Abn1,   Train on Abn2,
                                  Test on Abn2     Test on Abn1
Mixed Features         SQL        0.90             0.96
                       Asterisk   0.81             0.81
Rate Features          SQL        0.91             0.95
                       Asterisk   0.82             0.76
Correlation Features   SQL        0.91             0.95
                       Asterisk   0.85             0.73


SQL Train on Abn1 and Test on Abn2: Time Series Plot


SQL Train on Abn1 and Test on Abn2: ROC Curve


SQL Train on Abn2 and Test on Abn1: Time Series Plot


SQL Train on Abn2 and Test on Abn1: ROC Curve
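An ROC curve plots the true positive rate against the false positive rate as the decision threshold on the classifier's score is varied. The sketch below shows one way such a curve and its area could be computed. The labels and scores are made-up values, not data from these experiments, and it assumes that a higher score means "more abnormal" and that both classes are present.

```python
def roc_points(labels, scores):
    """Return ROC points (false positive rate, true positive rate).

    labels: 1 for abnormal, 0 for normal (assumed encoding).
    scores: higher means more abnormal (assumed convention).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            true_pos += 1
        else:
            false_pos += 1
        # Emit a point only after all samples sharing this score are counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((false_pos / negatives, true_pos / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) points sorted by x."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical labels and scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(area_under_curve(roc_points(labels, scores)))  # 8/9, about 0.889
```

An area of 1.0 corresponds to a perfect ranking of abnormal over normal points, and 0.5 to a random one.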


Conclusions
► Two-class SVMs can perform well in detecting intrusions in virtual machine environments
► The goal is to develop an accurate intrusion detection system for VMs based on machine learning techniques


References
[1] R. Bace and P. Mell. Intrusion Detection Systems. NIST Special Publication SP 800-31, November 2001.
[2] T. Garfinkel and M. Rosenblum. A Virtual Machine Introspection Based Architecture for Intrusion Detection. Proceedings of the Network and Distributed Systems Security Symposium, 2003.
[3] Andrew Moore’s slides on Support Vector Machines. http://www.cs.cmu.edu/~awm/tutorials
[4] Prasad’s slides on Support Vector Machines. www.cs.wright.edu/~tkprasad/courses/cs499/L18SVM.ppt
[5] 2005 Malware Report: Executive Summary. http://www.computereconomics.com/article.cfm?id=1090
[6] Virtual Machine. http://en.wikipedia.org/wiki/Virtual_machine