Machine Learning as Applied to Intrusion Detection

By Christine Fossaceca

Why is thwarting cyber attacks important?

“It’s only a matter of the ‘when,’ not the ‘if,’ that we are going to see something dramatic,” said Admiral Michael Rogers, director of the National Security Agency, testifying before Congress in late 2014 on the real possibility of a cyber attack on critical infrastructure in the United States (WSJ, November 2014).

What is Intrusion Detection?

*A “network intrusion” or “network attack” is an event in which a user attempts to exploit a system vulnerability in order to either 1) gain access to network resources or 2) disrupt the ability of the network to operate.

Intrusion detection systems are designed to help detect network attacks and alert network operators to the presence of such attacks so that they may take appropriate actions.

Motivation

*MARK-ELM: Application of a novel Multiple Kernel Learning framework for improving the robustness of Network Intrusion Detection

*I chose this paper because it was written by my dad for his PhD and I was familiar with his research. I want to go into the field of cyber security when I graduate, and I like reading about the different ways to enhance cyber security in the world today.

*This system combines machine learning techniques that we have learned about in class

The MARK-ELM

*The MARK-ELM is a new approach to intrusion detection

*MARK-ELM stands for Multiple Adaptive Reduced Kernel Extreme Learning Machine

*Instead of using a single algorithm, this approach combines the outputs of a variety of decisions to increase overall effectiveness and obtain a better decision than any of the individual classifiers

Problems Today

*High rate of false alarms

*Inability to identify multiple types of attacks

*Too much “human tuning” is required

The MARK-ELM addresses all of these issues with a low rate of false alarms, multiclass classification ability, and minimal human tuning.

KDD Cup 99 Dataset

*This dataset was used in the Third International Knowledge Discovery and Data Mining Tools Competition.

*“The competition task was to build a network intrusion detector, a predictive model capable of distinguishing between “bad” connections, called intrusions or attacks, and “good” normal connections. This database contains a standard set of data to be audited, which includes a wide variety of intrusions simulated in a military network environment.” –UCI Repository Description

*First used in 1999, it continues to be the most widely used intrusion detection benchmark dataset today

KDD Cup 99

*However, relative to actual network traffic, the dataset has a disproportionate amount of normal traffic and DDOS examples compared with the other attack types: Probing, User to Root (U2R), and Remote to Local (R2L)

Traffic Type    Training Examples    Percent of Total    Testing Examples    Percent of Total
Normal          43881                60.28%              43950               60.38%
DDOS            27307                37.51%              27265               37.46%
Probe           1080                 1.48%               1051                1.44%
R2L             490                  0.68%               505                 0.69%
U2R             30                   0.04%               22                  0.03%

Table 2.4: Typical Distribution for KDD Cup 99 Dataset (from MARK-ELM)

*Adapted from (Fossaceca, John M., 2014)
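The imbalance in Table 2.4 can be checked directly from the training counts. The short Python sketch below recomputes each class's share and the ratio of normal traffic to the rarest class. The variable names are illustrative only, and the recomputed percentages may differ from the table in the last digit because of rounding.

```python
# Training-set counts copied from Table 2.4.
train_counts = {
    "Normal": 43881,
    "DDOS": 27307,
    "Probe": 1080,
    "R2L": 490,
    "U2R": 30,
}

total = sum(train_counts.values())

# Share of each traffic type, as a percentage of all training examples.
train_percent = {name: 100.0 * n / total for name, n in train_counts.items()}

# How many normal connections there are for every U2R example.
normal_to_u2r = train_counts["Normal"] / train_counts["U2R"]

for name, pct in train_percent.items():
    print(f"{name:<7}{train_counts[name]:>7}{pct:>8.2f}%")
print(f"Normal-to-U2R ratio: about {normal_to_u2r:.0f} to 1")
```

Because Normal and DDOS together make up almost 98% of the examples, a classifier can score well overall while missing nearly every Probe, R2L, and U2R attack, which is why per-class detection rates matter here.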

Multiple

*Ensemble learning: combines the decisions of multiple extreme learning machines

*The outputs are combined using a weighting scheme, so that an algorithm that is good at classifying one type of attack but not another can contribute to that classification without degrading the results for the other types of attacks.

*One big problem with past algorithms was that they had a high rate of detection for only one type of attack, DDOS
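The idea of letting each classifier count more for the attack types it handles well can be sketched as a weighted vote. This is a generic illustration in Python with NumPy, not the MARK-ELM weighting scheme itself: the function name `weighted_vote`, the score values, and the weights are all hypothetical.

```python
import numpy as np

def weighted_vote(scores, weights):
    """Combine per-classifier class scores with per-classifier, per-class weights.

    scores:  array of shape (n_classifiers, n_samples, n_classes)
    weights: array of shape (n_classifiers, n_classes); a larger value means
             that classifier is trusted more for that class.
    Returns the winning class index for each sample.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Broadcast each classifier's class weights across its samples, then sum
    # the weighted scores over the classifier axis.
    combined = (scores * weights[:, None, :]).sum(axis=0)
    return combined.argmax(axis=1)

# Hypothetical example: two classifiers, one sample, three classes.
scores = [
    [[0.6, 0.3, 0.1]],   # classifier A leans towards class 0
    [[0.2, 0.3, 0.5]],   # classifier B leans towards class 2
]
weights = [
    [1.0, 1.0, 0.2],     # A is weak on class 2
    [0.5, 1.0, 2.0],     # B is strong on class 2
]
prediction = weighted_vote(scores, weights)
```

In this toy case classifier B's strength on class 2 outweighs classifier A's preference for class 0, so the combined decision is class 2.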

Adaptive

*Not only is this classifier able to identify different types of attacks, but it is also robust to noise and mislabeled data because of its AdaBoost voting scheme

*One of the most important algorithms used in this classifier is AdaBoost:

*“Iterative ensemble learning approach that ensures parameters are chosen using a technique that emphasizes errors and continually tries to optimize parameters.”

*MARK-ELM extends the MKBoost sampling algorithm, originally designed to handle only binary-class data, to handle input data with multiple classes.
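The slides do not give the update rule, so the sketch below uses SAMME, a well-known multiclass form of AdaBoost, as a stand-in to show what "emphasizes errors" means in practice. MARK-ELM's multiclass extension of MKBoost may differ in its details, and the function name `samme_round` and the toy labels are assumptions made for illustration.

```python
import numpy as np

def samme_round(sample_weights, y_true, y_pred, n_classes):
    """One boosting round: return (classifier_weight, updated_sample_weights)."""
    w = np.asarray(sample_weights, dtype=float)
    missed = np.asarray(y_true) != np.asarray(y_pred)
    # Weighted error of this round's classifier, clipped to avoid log(0).
    err = np.clip(np.sum(w * missed) / np.sum(w), 1e-12, 1 - 1e-12)
    # SAMME classifier weight; the log(n_classes - 1) term is the multiclass
    # correction and vanishes for two classes.
    alpha = np.log((1.0 - err) / err) + np.log(n_classes - 1.0)
    # Increase the weight of misclassified examples, then renormalise.
    new_w = w * np.exp(alpha * missed)
    return alpha, new_w / new_w.sum()

y_true = np.array([0, 1, 2, 1, 0])
y_pred = np.array([0, 1, 2, 1, 2])   # only the last example is wrong
alpha, new_weights = samme_round(np.full(5, 0.2), y_true, y_pred, n_classes=3)
```

After this round the single misclassified example carries far more weight than each correctly classified one, so the next classifier in the ensemble is pushed to get it right.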

Reduced

*The model is trained in a reduced-kernel manner, so the full kernel matrix is never calculated

*For extremely large datasets, the computation and storage of a full kernel is not practical in many cases.

*MARK-ELM computes a reduced kernel matrix for each round of learning using a small sample of the input data. The data is chosen randomly with the constraint that there must be at least one example of every class present in the selected sample set.
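The sampling constraint and the size saving can be sketched as follows. This is an illustration under stated assumptions rather than the MARK-ELM code: the helper names, the Gaussian (RBF) kernel, the sample size of 20, and the random toy data are all hypothetical choices.

```python
import numpy as np

def sample_covering_all_classes(y, n_select, rng):
    """Pick n_select random row indices with at least one row per class."""
    y = np.asarray(y)
    classes = np.unique(y)
    if n_select < len(classes):
        raise ValueError("n_select must be at least the number of classes")
    # First, one randomly chosen example from every class.
    chosen = [rng.choice(np.flatnonzero(y == c)) for c in classes]
    # Then fill the remaining slots at random from the rows not yet chosen.
    remaining = np.setdiff1d(np.arange(len(y)), chosen)
    extra = rng.choice(remaining, size=n_select - len(chosen), replace=False)
    return np.concatenate([chosen, extra]).astype(int)

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))            # 200 examples, 4 features (toy data)
y = rng.integers(0, 5, size=200)         # 5 classes
y[:5] = np.arange(5)                     # make sure every class occurs
idx = sample_covering_all_classes(y, n_select=20, rng=rng)
K_reduced = rbf_kernel(X, X[idx])        # shape (200, 20), not (200, 200)
```

The reduced matrix has one column per sampled example instead of one per training example, so its storage grows with the sample size rather than with the square of the dataset size.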

Kernel


$\Phi: x \to \varphi(x)$

General idea: the original input space can be mapped to some higher-dimensional feature space where the training set is separable

$K(x_i, x_j) = \varphi(x_i)^{T} \varphi(x_j)$

A kernel function is defined as a function that corresponds to a dot product of two feature vectors in some expanded feature space.

*Adapted from (Eck, D., 2006)

Kernel Trick
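A small worked example may help here. For two-dimensional inputs, the degree-2 polynomial kernel $K(x, z) = (x^{T} z)^2$ equals a dot product under the explicit feature map $\varphi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$, so the kernel gives the feature-space dot product without ever building the feature vectors. The Python sketch below checks this numerically; the function names and the two sample points are chosen only for illustration.

```python
import numpy as np

def phi(x):
    """Explicit feature map for the kernel K(x, z) = (x . z)^2 in two dimensions."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

def poly2_kernel(x, z):
    """The same quantity computed in the original input space."""
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = float(np.dot(phi(x), phi(z)))   # dot product in the feature space
implicit = poly2_kernel(x, z)              # kernel evaluated on the inputs
```

Both routes give 121 for these two points; the kernel route needs only one dot product in the original two-dimensional space.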

Fractional Polynomial Kernels

*The polynomial degree is a value between 0 and 1

*These kernels do not meet the standard definition of a kernel (Mercer’s condition), but were proven to still be usable in machine learning and have been shown to be effective

*First discussed in (Rossius, R., Zenker, G., Ittner, A., & Dilger, W., 1998) as a means of improving the performance of SVMs. The authors argue that “fractional degrees allow a more continuous range of concepts”.

*This approach had never been used before on intrusion detection data
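The slides do not state the exact kernel formula, so the sketch below assumes the usual polynomial form $(a^{T} b + c)^{d}$ with the degree $d$ restricted to values between 0 and 1. The function name, the offset `coef0`, and the requirement of non-negative features are assumptions made to keep the example well defined, since a negative base raised to a fractional power is not a real number.

```python
import numpy as np

def fractional_poly_kernel(A, B, degree=0.5, coef0=1.0):
    """K(a, b) = (a . b + coef0) ** degree with 0 < degree < 1.

    Assumes non-negative features (for example, scaled to [0, 1]) so that the
    base of the power is never negative.
    """
    if not 0.0 < degree < 1.0:
        raise ValueError("degree must lie strictly between 0 and 1")
    base = np.asarray(A, dtype=float) @ np.asarray(B, dtype=float).T + coef0
    if np.any(base < 0):
        raise ValueError("negative base; rescale the features or raise coef0")
    return base ** degree

X = np.array([[0.0, 1.0],
              [1.0, 1.0],
              [0.5, 0.0]])
K = fractional_poly_kernel(X, X, degree=0.5, coef0=1.0)
```

Varying `degree` continuously between 0 and 1 is one way to read the quoted remark that fractional degrees allow a more continuous range of concepts than integer degrees do.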

What is an Extreme Learning Machine?

*Stems from the support vector machine

*The extreme learning machine is solved with a matrix inverse rather than quadratic programming, so it trains much faster than a support vector machine

*Can handle multiclass classification directly. The support vector machine handles only binary classification unless it uses a special grouping such as one against one or one against all, which can take time on the order of O(n²)
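To make the "matrix inverse" point concrete, here is a minimal sketch of a single-kernel extreme learning machine in its commonly published form, where the output weights solve $(I/C + K)\,\beta = T$ and $T$ holds one-hot class targets. It is an illustration under assumptions, not the MARK-ELM implementation: the RBF kernel, the values of `C` and `gamma`, the function names, and the toy data are all hypothetical, and the multiple-kernel boosting is left out.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian (RBF) kernel between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist)

def train_kernel_elm(X, y, n_classes, C=100.0, gamma=0.5):
    """Solve (I / C + K) beta = T, where T holds one-hot class targets."""
    K = rbf_kernel(X, X, gamma)
    T = np.eye(n_classes)[y]                       # one column per class
    return np.linalg.solve(np.eye(len(X)) / C + K, T)

def predict_kernel_elm(X_train, beta, X_new, gamma=0.5):
    """Score every class for each new point and return the best one."""
    return (rbf_kernel(X_new, X_train, gamma) @ beta).argmax(axis=1)

# Toy data: three well-separated classes, two points each.
X = np.array([[0.0, 0.0], [0.1, 0.0],
              [5.0, 5.0], [5.1, 5.0],
              [0.0, 5.0], [0.1, 5.0]])
y = np.array([0, 0, 1, 1, 2, 2])

beta = train_kernel_elm(X, y, n_classes=3)
predicted = predict_kernel_elm(X, beta, np.array([[4.9, 5.0], [0.2, 0.1]]))
```

Training is a single linear solve, and all three classes are handled at once through the columns of `T`, with no pairwise grouping of binary classifiers.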

Big Differences

SUPPORT VECTOR MACHINE                            EXTREME LEARNING MACHINE
Trains slower than the ELM                        Trains faster than the SVM
Can only directly handle binary classification    Can directly handle multiclass classification
Quadratic Programming                             Matrix inverse
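One way to see the extra work a binary-only classifier needs is to count the binary problems each grouping creates. The Python sketch below does this for the five traffic types in the KDD Cup 99 data; reading the quadratic cost as the number of one-against-one pairs is an interpretation, not something the slides spell out.

```python
from itertools import combinations

# The five traffic types in the KDD Cup 99 table.
classes = ["Normal", "DDOS", "Probe", "R2L", "U2R"]

# One-against-one: one binary SVM for every unordered pair of classes.
one_vs_one = list(combinations(classes, 2))

# One-against-all: one binary SVM per class.
one_vs_all = [(c, "rest") for c in classes]

n = len(classes)
assert len(one_vs_one) == n * (n - 1) // 2   # grows quadratically with n
```

With five classes this means ten one-against-one classifiers or five one-against-all classifiers, compared with a single multiclass extreme learning machine.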

Some Math


Some Math (cont)


Results


Future Applications

*Unsupervised learning is a more real-world approach to machine learning in intrusion detection, and utilizing this algorithm to build an unsupervised machine would be very valuable to future intrusion detection applications

*This algorithm would probably train well on other data, such as data from important areas of bioinformatics, and should be researched further to discover the most effective combination methods.
