Masquerade Detection

Mark Stamp


Masquerade Detection

Masquerader: someone who makes unauthorized use of a computer
How to detect a masquerader? Here, we consider…
  Anomaly-based intrusion detection (IDS)
  Detection is based on UNIX commands
Lots and lots of prior work on this problem
We attempt to apply profile hidden Markov models (PHMMs)
For comparison, we also implement other techniques (HMM and N-gram)


Schonlau Data Set

Schonlau et al. collected large data set
  Contains UNIX commands for 50 users
  50 files, one for each user
  Each file has 15k commands: 5k from user plus 10k for masquerade test data
  Test data: 100 blocks, 100 commands each
Dataset includes map file
  100 rows (test blocks), 50 columns (users)
  0 if block is user data, 1 if masquerade data
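The layout described above can be sketched in a few lines of Python. This is only an illustration: it assumes each user file has already been read into a list of command strings and that the map file is whitespace-separated 0/1 values, one row per test block. The function names (split_user_file, parse_map, is_masquerade) are hypothetical and not part of the data set.

```python
TRAIN_LEN = 5000    # first 5k commands: genuine user data for training
BLOCK_LEN = 100     # each test block holds 100 commands
NUM_BLOCKS = 100    # 10k test commands give 100 blocks


def split_user_file(commands):
    """Split one user's 15k commands into training data and test blocks."""
    if len(commands) != TRAIN_LEN + BLOCK_LEN * NUM_BLOCKS:
        raise ValueError("expected 15000 commands per user")
    training = commands[:TRAIN_LEN]
    test = commands[TRAIN_LEN:]
    blocks = [test[i:i + BLOCK_LEN] for i in range(0, len(test), BLOCK_LEN)]
    return training, blocks


def parse_map(text):
    """Parse map text (assumed format): one row per block, one column per user."""
    return [[int(tok) for tok in line.split()]
            for line in text.splitlines() if line.strip()]


def is_masquerade(map_rows, block_index, user_index):
    """True if the map marks this test block of this user as masquerade data."""
    return map_rows[block_index][user_index] == 1
```

With the real data, map_rows would have 100 rows and 50 columns, and is_masquerade(map_rows, b, u) gives the ground-truth label for block b of user u.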


Schonlau Data Set

Map file structure

This data set used for many studies
  Approximately 50 published papers


Previous Work

Approaches to masquerade detection
  Information theoretic
  Text mining
  Hidden Markov models (HMM)
  Naïve Bayes
  Sequences and bioinformatics
  Support vector machines (SVM)
  Other approaches

We briefly look at each of these


Information Theoretic

Original work by Schonlau included a compression technique
  Based on theory (hope?) that legitimate commands compress more than attack
  Results were disappointing
Some additional recent work
  Still not competitive with best approaches
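To make the compression idea concrete, here is a minimal sketch. It assumes the score is the number of extra compressed bytes needed when a test block is appended to the user's training data, and it uses zlib purely as a stand-in compressor. Both choices are assumptions for illustration, not the setup of the original work.

```python
import zlib


def compressed_size(commands):
    """Size in bytes of the zlib-compressed, space-joined command list."""
    return len(zlib.compress(" ".join(commands).encode("utf-8"), 9))


def extra_bytes(training, block):
    """Additional compressed bytes needed when block is appended to training.

    A block that resembles the training data should need few extra bytes;
    a block of unfamiliar commands should need more.
    """
    return compressed_size(training + block) - compressed_size(training)
```

A detector built this way would flag a block when extra_bytes exceeds a per-user threshold.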


Text Mining

A few papers in this area
One approach extracts repetitive sequences from training data
Another paper uses principal component analysis (PCA)
  Method of “exploratory data analysis”
  Good results on Schonlau data set
  But high cost during training phase


Hidden Markov Models

Several authors have used HMMs
  One of the best known approaches
We have implemented HMM detector
We do sensitivity analysis on the parameters
  In particular, determine optimal N (number of hidden states)
We also use HMMs for comparison with our PHMM results


Naïve Bayes

In simplest form, relies only on command frequencies
  That is, no sequence info is used
Several papers analyze this approach
  Among the simplest approaches
  And, results are good
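A frequency-only score of this kind can be sketched as follows. The additive smoothing constant alpha and the function names are hypothetical choices for illustration. The sketch also assumes the vocabulary contains every command that can appear in training or test data.

```python
import math
from collections import Counter


def train_frequencies(training, vocabulary, alpha=0.01):
    """Smoothed command probabilities for one user (alpha is an assumed value)."""
    counts = Counter(training)
    total = len(training) + alpha * len(vocabulary)
    return {c: (counts[c] + alpha) / total for c in vocabulary}


def block_log_likelihood(probs, block):
    """Sum of log-probabilities; command order has no effect on the score."""
    return sum(math.log(probs[c]) for c in block)
```

A block is flagged when its log-likelihood under the user's model falls below a threshold. Shuffling the commands within a block leaves the score unchanged, which is exactly the sense in which no sequence information is used.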


Sequences

In a sense, this is the opposite extreme from naïve Bayes
  Naïve Bayes only considers frequency stats
  Sequence/bioinformatics focused on sequence-related information
Schonlau’s original work included elementary sequence-based analysis


Bioinformatics

We are aware of only one previous paper that uses a bioinformatics approach
  Uses Smith-Waterman algorithm to create local alignments
  Alignments then used directly for detection
In contrast, we do pairwise alignments, multiple sequence alignment (MSA), PHMM
  PHMM is used for scoring (forward algorithm)
  Our scoring is much more efficient
  Also, our results are at least as strong
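For readers unfamiliar with Smith-Waterman, the following sketch computes the best local alignment score between two command sequences. The match, mismatch, and gap values are placeholder assumptions. They are not the scoring scheme of the earlier paper or of the PHMM work described here.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between command sequences a and b.

    Standard dynamic program with a linear gap penalty; cell values are
    floored at 0 so an alignment may start anywhere in either sequence.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best
```

A high score between a test block and the user's training data indicates a shared run of commands, possibly with small gaps or substitutions.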


Support Vector Machines

Support vector machines (SVM)
  Machine learning technique
  Separate data points (i.e., classify) based on hyperplanes in high dimensional space
  Original data mapped to higher dimension, where separation is likely easier
SVMs maximize separation
  And have low computational costs
Used for classification and regression analysis


SVMs & Masquerade Detection

SVMs have been applied to masquerade detection problem

Results are good
  Comparable to naïve Bayes

Recent work using SVMs focused on improved efficiency


Other Approaches

The following have also been studied
  Detect using low frequency commands
  Detect using high frequency commands
  Hybrid Bayes “one step Markov”
    Natural to consider hybrid approaches
  Multistep Markov
    Markov process of order greater than 1
None of these particularly successful


Other Approaches (Continued)

Non-negative matrix factorization (NMF)
  At least 2 papers on this topic
  Appears to be competitive
Other hybrids that attempt to combine several approaches
  So far, no significant improvement over individual techniques


HMMs

See previous presentation


HMM for Masquerade Detection

Using the Schonlau data set we…
  Train HMM for each user
  Set thresholds
  Test the models and plot results
Note that this has been done before
Here, we perform sensitivity analysis
  That is, we test different numbers of hidden states, N
Also use it for comparison with PHMM
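The scoring step can be sketched with the scaled forward algorithm for a discrete HMM. This sketch assumes the model (pi, A, B) has already been trained, for example with Baum-Welch, which is not shown. It also assumes commands have been mapped to integer symbols. The per-command normalization in score_block is an assumption made so that blocks of different lengths are comparable.

```python
import math


def forward_log_likelihood(pi, A, B, obs):
    """log P(obs | model) via the scaled forward algorithm.

    pi[i]   : probability of starting in hidden state i
    A[i][j] : probability of moving from state i to state j
    B[i][k] : probability of emitting symbol k while in state i
    obs     : non-empty list of integer symbols
    """
    if not obs:
        raise ValueError("obs must be non-empty")
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")    # sequence is impossible under the model
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]   # rescale to avoid underflow
    return log_prob


def score_block(pi, A, B, obs):
    """Log-likelihood per command (assumed normalization)."""
    return forward_log_likelihood(pi, A, B, obs) / len(obs)
```

A test block is flagged as a masquerade when score_block falls below the threshold set for that user. The cost of each scoring call grows with the square of the number of hidden states N, which is why a small N is attractive if accuracy does not suffer.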


HMM Experiments

Plotted as “ROC” curves
  Closer to origin is better
Useful region is the shaded region
  That is, false positives below 5%
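Points on such a curve can be produced by sweeping the detection threshold. The sketch below assumes the two axes are false positive rate and false negative rate, which is consistent with "closer to origin is better" but is not stated on the slide. It also assumes a block is flagged when its score is below the threshold. All function names are hypothetical.

```python
def error_rates(user_scores, attack_scores, threshold):
    """Return (false positive rate, false negative rate) at one threshold."""
    fp = sum(s < threshold for s in user_scores) / len(user_scores)
    fn = sum(s >= threshold for s in attack_scores) / len(attack_scores)
    return fp, fn


def curve(user_scores, attack_scores):
    """List of (threshold, fp, fn) points, one per candidate threshold."""
    thresholds = sorted(set(user_scores) | set(attack_scores))
    thresholds.append(thresholds[-1] + 1.0)   # final threshold flags everything
    return [(t, *error_rates(user_scores, attack_scores, t))
            for t in thresholds]


def useful_points(points, max_fp=0.05):
    """Keep only the points with false positive rate at or below max_fp."""
    return [p for p in points if p[1] <= max_fp]
```

Restricting attention to useful_points corresponds to reading only the shaded region of the plot.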


HMM Conclusion

Number of hidden states does not matter
So, use N=2
  Since most efficient


PHMM

See previous presentation


PHMM Experiments

A problem with Schonlau data…
  For given user, 5000 commands
  No begin/end session markers
So, must split it up to obtain multiple sequences
  But where to split sequence?
  And what about tradeoff between number of sequences and length of each sequence?
  That is, how to decide length/number?
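The tradeoff is easy to see with a small helper. This sketch simply cuts the training data into equal-length pieces at arbitrary positions, which is one possible assumption given that no session markers exist. The function name is hypothetical.

```python
def split_into_sequences(commands, num_sequences):
    """Cut the command list into num_sequences equal-length pieces."""
    if not commands or num_sequences <= 0:
        raise ValueError("need a non-empty list and a positive count")
    if len(commands) % num_sequences != 0:
        raise ValueError("length must be divisible by num_sequences")
    length = len(commands) // num_sequences
    return [commands[i:i + length] for i in range(0, len(commands), length)]
```

For 5000 training commands, more sequences means shorter sequences: 5 sequences give 1000 commands each, while 50 sequences give only 100 commands each.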


PHMM Experiments

Experiments done for following cases:

See next slide…


PHMM Experiments

Tests various numbers of sequences
Best results with 5 sequences, 1k commands each
This case in next slide


PHMM Comparison

Compare PHMM to “weighted N-gram” and HMM

HMM is best
PHMM is competitive


PHMM Detector

PHMM at disadvantage on Schonlau data
  PHMM uses positional information
  Such info not available for Schonlau data
  We have to guess the positions for PHMM
How to get fairer comparison between HMM and PHMM?
  We need different data set
  Only option is simulated data set


Simulated Data

We generate simulated data as follows
  Using Schonlau data, construct Markov chain for each user
  Use resulting Markov chain to generate sequences representing user behavior
  Restrict “begin” to more common commands
What’s the point?
  Simulated seqs have sensible begin and end
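The generation procedure can be sketched as a first-order Markov chain over commands. The order of the chain, the top_k cutoff for "more common" start commands, and the restart rule at dead ends are all assumptions made for illustration. How sequence ends were chosen is not described, so the sketch just stops at a fixed length.

```python
import random
from collections import Counter, defaultdict


def build_chain(commands):
    """Count observed command-to-command transitions for one user."""
    transitions = defaultdict(Counter)
    for cur, nxt in zip(commands, commands[1:]):
        transitions[cur][nxt] += 1
    return transitions


def common_commands(commands, top_k):
    """The top_k most frequent commands, used as allowed start commands."""
    return [c for c, _ in Counter(commands).most_common(top_k)]


def generate_sequence(transitions, start_candidates, length, rng):
    """Random walk on the chain, starting from one of the common commands."""
    state = rng.choice(start_candidates)
    seq = [state]
    while len(seq) < length:
        followers = transitions.get(state)
        if not followers:
            # Dead end (command never seen with a successor): restart.
            state = rng.choice(start_candidates)
        else:
            state = rng.choices(list(followers),
                                weights=list(followers.values()))[0]
        seq.append(state)
    return seq
```

Passing a seeded random.Random instance as rng makes the simulated data reproducible across experiments.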


Simulated Data

Training data and user data for scoring generated using Markov chain
Attack data taken from Schonlau data
How much data to generate?
  First test, we generate same amount of simulated data as is in Schonlau set
  That is, 5k commands per user


Detection with Simulated Data

PHMM vs HMM Round 2

It’s close, but HMM still wins!


Limited Training Data

What if less training data is available?
In a real application, initially, training data is limited
  Can’t detect attacks until sufficient training data has been accumulated
  So, the less data required, the better
Experiments using simulated data with limited training data
  Used 200 to 800 commands for training


Limited Training Data

PHMM vs HMM Round 3

With 400 or fewer training commands, PHMM wins big!


Conclusion

PHMM is competitive with best approaches

PHMM likely to do better, given better training data (begin/end info)

PHMM much better than HMM when limited training data available
  Of practical importance
  Why does it make sense that PHMM would do better with limited training data?


Conclusion

Given current state of research…
Optimal masquerade detection approach
  Initially, collect small training set
  Train PHMM and use for detection
  If no attack, continue to collect data
  When sufficient data available, train HMM
  From then on, use HMM for detection


Future Work

Collect better real data set!!!
  Many problems/limitations with Schonlau data
  Improved data set could be basis for lots and lots of research
Directly compare PHMM/bioinformatics approaches with previous work (HMM, naïve Bayes, SVM, etc.)
Consider hybrid techniques
Other techniques?


References

Masquerade detection using profile hidden Markov models, L. Huang and M. Stamp, to appear in Computers and Security

Masquerading user data, M. Schonlau, http://www.schonlau.net/intrusion.html
