Masquerade Detection 1
Masquerade Detection
Mark Stamp
Masquerade Detection
Masquerader: someone who makes unauthorized use of a computer
How to detect a masquerader? Here, we consider…
Anomaly-based intrusion detection (IDS)
Detection is based on UNIX commands
Lots of prior work on this problem
We attempt to apply profile HMMs (PHMMs)
For comparison, we also implement other techniques (HMM and N-gram)
Schonlau Data Set
Schonlau et al. collected a large data set
Contains UNIX commands for 50 users
50 files, one for each user
Each file has 15k commands: 5k from the user, plus 10k of test data for masquerade detection
Test data: 100 blocks of 100 commands each
Data set includes a map file
100 rows (test blocks), 50 columns (users)
0 if block is user data, 1 if masquerade data
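As a rough illustration of the layout above, the sketch below splits one user's 15,000-command stream into the 5,000 training commands and 100 test blocks of 100 commands, and reads one user's block labels from the map file. It assumes the user file has already been read into a list of command strings and that the map file has one whitespace-separated row per test block; the function names are hypothetical.

```python
def split_user_commands(commands, n_train=5000, block_size=100):
    """Split one user's command stream into training commands and test blocks."""
    train = commands[:n_train]
    test = commands[n_train:]
    # Each test block is block_size consecutive commands (100 blocks for 10k commands)
    blocks = [test[i:i + block_size] for i in range(0, len(test), block_size)]
    return train, blocks


def parse_map_file(lines):
    """Parse map file rows: one row per test block, one 0/1 column per user."""
    return [[int(tok) for tok in line.split()] for line in lines if line.strip()]


def labels_for_user(map_rows, user_index):
    """Return the label of each test block for one user: 0 = user, 1 = masquerade."""
    return [row[user_index] for row in map_rows]


# Usage with synthetic stand-in data (not the real Schonlau files)
commands = ["cmd%d" % i for i in range(15000)]
train, blocks = split_user_commands(commands)
print(len(train), len(blocks), len(blocks[0]))  # 5000 100 100
```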
Schonlau Data Set
Map file structure
This data set has been used in many studies
Approximately 50 published papers
Previous Work
Approaches to masquerade detection:
Information theoretic
Text mining
Hidden Markov models (HMM)
Naïve Bayes
Sequences and bioinformatics
Support vector machines (SVM)
Other approaches
We briefly look at each of these
Information Theoretic
Original work by Schonlau included a compression technique
Based on theory (hope?) that legitimate commands compress better than attack commands
Results were disappointing
Some additional recent work
Still not competitive with best approaches
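The compression idea can be sketched as follows: compress a user's training commands with and without a test block appended, and treat the extra compressed bytes per command as an anomaly score (higher means more suspicious). This uses Python's zlib as a stand-in compressor; it illustrates the general technique and is not claimed to be Schonlau's exact procedure. The function names are hypothetical.

```python
import zlib


def _encode(commands):
    # One command per line, so the compressor sees the raw command stream
    return "\n".join(commands).encode("utf-8")


def compression_score(training_commands, test_block):
    """Extra compressed bytes per command when test_block is appended to training data."""
    base = len(zlib.compress(_encode(training_commands), 9))
    combined = len(zlib.compress(_encode(training_commands + test_block), 9))
    return (combined - base) / len(test_block)


train = ["ls", "cd", "vi"] * 500
normal_block = ["ls", "cd", "vi"] * 33 + ["ls"]
attack_block = ["cmd%d" % i for i in range(100)]
print(compression_score(train, normal_block) < compression_score(train, attack_block))  # True
```

A block that repeats the user's habits adds almost nothing to the compressed size, while unfamiliar commands add many bytes.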
Text Mining
A few papers in this area
One approach extracts repetitive sequences from training data
Another paper uses principal component analysis (PCA)
A method of “exploratory data analysis”
Good results on Schonlau data set
But high cost during training phase
Hidden Markov Models
Several authors have used HMMs
One of the best-known approaches
We have implemented an HMM detector
We do sensitivity analysis on the parameters
In particular, determine the optimal N (number of hidden states)
We also use HMMs for comparison with our PHMM results
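To make the HMM detector concrete: once an HMM λ = (A, B, π) with N hidden states has been trained for a user (training, for example by Baum-Welch, is not shown), a test block can be scored with the forward algorithm. The sketch below normalizes at each step to avoid underflow and returns the log-likelihood per command. It assumes commands have already been mapped to integer symbol indices; the function names and toy model are hypothetical.

```python
import math


def forward_log_likelihood(A, B, pi, obs):
    """Scaled forward algorithm: returns log P(obs | lambda) for an HMM (A, B, pi)."""
    N = len(pi)
    # alpha_0(i) = pi_i * b_i(O_0), then normalize to sum to 1
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]
    c = sum(alpha)
    alpha = [a / c for a in alpha]
    log_prob = math.log(c)
    for o in obs[1:]:
        # alpha_t(i) = [sum_j alpha_{t-1}(j) * a_ji] * b_i(O_t)
        alpha = [sum(alpha[j] * A[j][i] for j in range(N)) * B[i][o] for i in range(N)]
        c = sum(alpha)  # equals P(O_t | O_0, ..., O_{t-1})
        alpha = [a / c for a in alpha]
        log_prob += math.log(c)
    return log_prob


def hmm_score(A, B, pi, block):
    """Per-command log-likelihood; lower scores suggest a possible masquerader."""
    return forward_log_likelihood(A, B, pi, block) / len(block)


# Toy model with N = 2 hidden states and M = 3 distinct commands
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]]
pi = [0.6, 0.4]
print(hmm_score(A, B, pi, [0, 1, 0, 2]))
```

Dividing by the block length makes scores comparable across blocks of different sizes.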
Naïve Bayes
In its simplest form, relies only on command frequencies
That is, no sequence info is used
Several papers analyze this approach
Among the simplest approaches
And, results are good
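A minimal sketch of the frequency-only idea: estimate each command's probability from the user's training counts with a small additive pseudocount, and score a block by its average log-probability. The pseudocount alpha and the vocabulary size are assumptions for illustration (vocab_size must be at least the number of distinct commands seen), and published variants differ in details such as smoothing; the function names are hypothetical.

```python
import math
from collections import Counter


def train_naive_bayes(training_commands, vocab_size, alpha=0.01):
    """Return log P(cmd | user) estimated from command frequencies only."""
    counts = Counter(training_commands)
    denom = len(training_commands) + alpha * vocab_size

    def log_prob(cmd):
        # Pseudocount alpha keeps unseen commands from getting probability zero
        return math.log((counts[cmd] + alpha) / denom)

    return log_prob


def nb_score(log_prob, block):
    """Average log-probability per command; lower suggests a masquerader."""
    return sum(log_prob(cmd) for cmd in block) / len(block)


lp = train_naive_bayes(["ls"] * 90 + ["cd"] * 10, vocab_size=3)
print(nb_score(lp, ["ls"] * 9 + ["cd"]) > nb_score(lp, ["rm"] * 10))  # True
```

Because only frequencies are used, reordering the commands in a block leaves its score unchanged.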
Sequences
In a sense, this is the opposite extreme from naïve Bayes
Naïve Bayes only considers frequency stats
Sequence/bioinformatics approaches focus on sequence-related information
Schonlau’s original work included elementary sequence-based analysis
Bioinformatics
We are aware of only one previous paper that uses a bioinformatics approach
Uses the Smith-Waterman algorithm to create local alignments
Alignments are then used directly for detection
In contrast, we do pairwise alignments, MSA (multiple sequence alignment), PHMM
PHMM is used for scoring (forward algorithm)
Our scoring is much more efficient
Also, our results are at least as strong
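For reference, the local alignment score that Smith-Waterman computes can be sketched as a simple dynamic program over two command sequences. The match, mismatch, and gap values are hypothetical, gaps use a linear penalty, and this is not the exact configuration of the earlier paper; the pairwise alignment, MSA, and PHMM pipeline described above is not shown.

```python
def smith_waterman(seq_a, seq_b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between two command sequences (linear gap penalty)."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if seq_a[i - 1] == seq_b[j - 1] else mismatch)
            # Scores are floored at 0, so an alignment can start anywhere (local alignment)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


print(smith_waterman(["x", "ls", "cd", "y"], ["ls", "cd"]))  # 4
```

The cost is proportional to the product of the two sequence lengths, which is one reason using alignments directly for detection can be expensive.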
Support Vector Machines
Support vector machines (SVM)
Machine learning technique
Separate data points (i.e., classify) based on hyperplanes in high-dimensional space
Original data mapped to a higher dimension, where separation is likely easier
SVMs maximize the separation (margin)
And have low computational costs
Used for classification and regression analysis
SVMs & Masquerade Detection
SVMs have been applied to the masquerade detection problem
Results are good
Comparable to naïve Bayes
Recent work using SVMs focused on improved efficiency
Other Approaches
The following have also been studied:
Detect using low-frequency commands
Detect using high-frequency commands
Hybrid Bayes “one step Markov”
Natural to consider hybrid approaches
Multistep Markov
Markov process of order greater than 1
None of these were particularly successful
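A Markov process of order k conditions each command on the previous k commands. The sketch below estimates order-k transition probabilities from training data with a pseudocount and scores a block by average log-probability. The pseudocount alpha, vocab_size, and the function names are illustrative assumptions. Unlike a frequency-only naïve Bayes score, reordering the commands in a block changes the result.

```python
import math
from collections import Counter, defaultdict


def train_markov(commands, order=2, alpha=0.01, vocab_size=100):
    """Return log P(cmd | previous `order` commands) estimated from training data."""
    ctx_counts = defaultdict(Counter)
    for i in range(order, len(commands)):
        ctx_counts[tuple(commands[i - order:i])][commands[i]] += 1

    def log_prob(ctx, cmd):
        c = ctx_counts.get(ctx, Counter())  # unseen context: empty counts
        total = sum(c.values())
        return math.log((c[cmd] + alpha) / (total + alpha * vocab_size))

    return log_prob


def markov_score(log_prob, block, order=2):
    """Average log-probability of each command given its predecessors (len(block) > order)."""
    terms = [log_prob(tuple(block[i - order:i]), block[i]) for i in range(order, len(block))]
    return sum(terms) / len(terms)


lp = train_markov(["ls", "cd", "vi"] * 100, order=2, vocab_size=10)
# Same command frequencies, different order: the familiar order scores higher
print(markov_score(lp, ["ls", "cd", "vi"] * 5) > markov_score(lp, ["vi", "cd", "ls"] * 5))  # True
```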
Other Approaches (Continued)
Non-negative matrix factorization (NMF)
At least 2 papers on this topic
Appears to be competitive
Other hybrids that attempt to combine several approaches
So far, no significant improvement over individual techniques
HMMs
See previous presentation
HMM for Masquerade Detection
Using the Schonlau data set, we…
Train an HMM for each user
Set thresholds
Test the models and plot results
Note that this has been done before
Here, we perform sensitivity analysis
That is, we test different numbers of hidden states, N
We also use it for comparison with the PHMM
HMM Experiments
Plotted as “ROC” curves
Closer to origin is better
Useful region: false positives below 5%
That is, the shaded region
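The curve points can be computed by sweeping a threshold over block scores. This sketch assumes higher scores mean "more like the user" (as with a log-likelihood) and flags a block as masquerade when its score falls below the threshold. Because the slide says closer to the origin is better, it assumes the axes are false positive rate versus false negative (missed detection) rate rather than a textbook ROC. The function names are hypothetical.

```python
def roc_points(user_scores, attack_scores):
    """(false positive rate, false negative rate) for every distinct threshold."""
    thresholds = sorted(set(user_scores) | set(attack_scores)) + [float("inf")]
    points = []
    for t in thresholds:
        # A block is flagged as masquerade when its score is below t
        fp = sum(s < t for s in user_scores) / len(user_scores)
        fn = sum(s >= t for s in attack_scores) / len(attack_scores)
        points.append((fp, fn))
    return points


def useful_region(points, max_fp=0.05):
    """Keep only operating points with false positive rate at most max_fp."""
    return [p for p in points if p[0] <= max_fp]


user = [-1.0, -1.1, -1.2, -1.3]
attack = [-3.0, -2.5]
print(useful_region(roc_points(user, attack)))  # [(0.0, 1.0), (0.0, 0.5), (0.0, 0.0)]
```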
HMM Conclusion
Number of hidden states, N, makes little difference to the results
So, use N = 2, since it is the most efficient
PHMM
See previous presentation
PHMM Experiments
A problem with the Schonlau data…
For a given user, 5000 training commands
No begin/end session markers
So, must split them up to obtain multiple sequences
But where to split the sequence?
And what about the tradeoff between the number of sequences and the length of each sequence?
That is, how to decide length/number?
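The tradeoff can be made concrete: with 5,000 training commands, the number of sequences times the sequence length can be at most 5,000 (the best case reported on the following slides is 5 sequences of 1,000 commands). The sketch below simply cuts the stream into consecutive, non-overlapping pieces; since the true session boundaries are unknown, the split points are arbitrary, and this is an assumption rather than the authors' documented procedure. The function name is hypothetical.

```python
def split_into_sequences(commands, num_seqs, seq_len):
    """Cut a command stream into num_seqs consecutive, non-overlapping sequences."""
    if num_seqs * seq_len > len(commands):
        raise ValueError("not enough commands for %d x %d" % (num_seqs, seq_len))
    return [commands[i * seq_len:(i + 1) * seq_len] for i in range(num_seqs)]


cmds = ["c%d" % i for i in range(5000)]
for num_seqs, seq_len in [(5, 1000), (10, 500), (50, 100)]:
    seqs = split_into_sequences(cmds, num_seqs, seq_len)
    print(num_seqs, "sequences of", len(seqs[0]), "commands")
```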
PHMM Experiments
Experiments were done for the following cases:
See next slide…
PHMM Experiments
Tested various numbers of sequences
Best results: 5 sequences, 1k commands each
This case is shown on the next slide
PHMM Comparison
Compare PHMM to “weighted N-gram” and HMM
HMM is best
PHMM is competitive
PHMM Detector
PHMM is at a disadvantage on Schonlau data
PHMM uses positional information
Such info is not available for Schonlau data
We have to guess the positions for the PHMM
How to get a fairer comparison between HMM and PHMM?
We need a different data set
Only option is a simulated data set
Simulated Data
We generate simulated data as follows:
Using Schonlau data, construct a Markov chain for each user
Use the resulting Markov chain to generate sequences representing user behavior
Restrict “begin” to more common commands
What’s the point?
Simulated sequences have sensible begin and end
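A minimal sketch of the generation step: build a first-order Markov chain of command-to-command transition counts from one user's commands, then take a weighted random walk on it, drawing the first command from the user's k most frequent commands. The first-order assumption, the value of k, and the function names are assumptions, since the slides do not specify them; end-of-sequence handling is not shown beyond stopping at the requested length.

```python
import random
from collections import Counter, defaultdict


def build_markov_chain(commands):
    """First-order transition counts: transitions[cur][nxt] = times nxt followed cur."""
    transitions = defaultdict(Counter)
    for cur, nxt in zip(commands, commands[1:]):
        transitions[cur][nxt] += 1
    return transitions


def common_start_commands(commands, k):
    """The k most frequent commands, used as the allowed 'begin' commands."""
    return [cmd for cmd, _ in Counter(commands).most_common(k)]


def generate_sequence(transitions, start_commands, length, rng):
    """Weighted random walk on the chain, starting from one of start_commands."""
    seq = [rng.choice(start_commands)]
    while len(seq) < length:
        nxt = transitions.get(seq[-1])
        if not nxt:
            break  # this command was never followed by anything in the training data
        cmds = list(nxt)
        seq.append(rng.choices(cmds, weights=[nxt[c] for c in cmds])[0])
    return seq


user_commands = ["ls", "cd", "vi"] * 50  # stand-in for one user's Schonlau commands
chain = build_markov_chain(user_commands)
print(generate_sequence(chain, common_start_commands(user_commands, 2), 10, random.Random(0)))
```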
Simulated Data
Training data and user data for scoring are generated using the Markov chain
Attack data taken from Schonlau data
How much data to generate?
For the first test, we generate the same amount of simulated data as is in the Schonlau set
That is, 5k commands per user
Detection with Simulated Data
PHMM vs HMM Round 2
It’s close, but HMM still wins!
Limited Training Data
What if less training data is available?
In a real application, training data is initially limited
Can’t detect attacks until sufficient training data has been accumulated
So, the less data required, the better
Experiments using simulated data with limited training data
Used 200 to 800 commands for training
Limited Training Data
PHMM vs HMM Round 3
With 400 or fewer training commands, PHMM wins big!
Conclusion
PHMM is competitive with best approaches
PHMM likely to do better, given better training data (begin/end info)
PHMM much better than HMM when limited training data is available
Of practical importance
Why does it make sense that PHMM would do better with limited training data?
Conclusion
Given the current state of research…
Optimal masquerade detection approach:
Initially, collect a small training set
Train a PHMM and use it for detection
If no attack, continue to collect data
When sufficient data is available, train an HMM
From then on, use the HMM for detection
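The staged strategy above can be sketched as a small wrapper that accumulates commands judged legitimate and switches from the PHMM to the HMM once enough data has been collected. The class name and the hmm_min_commands cutoff are hypothetical: the slides only say "when sufficient data available", and the experiments varied training size between 200 and 800 commands. Training and scoring of the models themselves are not shown.

```python
class StagedDetector:
    """Hypothetical wrapper for the staged PHMM-then-HMM strategy."""

    def __init__(self, hmm_min_commands):
        # Assumed cutoff for "sufficient data"; not specified on the slides
        self.hmm_min_commands = hmm_min_commands
        self.collected = []

    def add_clean_commands(self, commands):
        """Record commands from blocks in which no attack was detected."""
        self.collected.extend(commands)

    def current_model(self):
        """PHMM while training data is small; HMM once enough data is collected."""
        if len(self.collected) >= self.hmm_min_commands:
            return "HMM"
        return "PHMM"


detector = StagedDetector(hmm_min_commands=800)  # hypothetical cutoff
detector.add_clean_commands(["ls"] * 400)
print(detector.current_model())  # PHMM
```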
Future Work
Collect a better real data set!
Many problems/limitations with Schonlau data
An improved data set could be the basis for lots of further research
Directly compare PHMM/bioinformatics approaches with previous work (HMM, naïve Bayes, SVM, etc.)
Consider hybrid techniques
Other techniques?
References
Masquerade detection using profile hidden Markov models, L. Huang and M. Stamp, to appear in Computers and Security
Masquerading user data, M. Schonlau