Masquerade Detection (Mark Stamp)

Upload: marybeth-harrell

Post on 26-Dec-2015


Masquerade Detection

Mark Stamp

Masquerade Detection

- Masquerader: someone who makes unauthorized use of a computer
- How to detect a masquerader? Here, we consider…
  - Anomaly-based intrusion detection (IDS)
  - Detection is based on UNIX commands
- Lots and lots of prior work on this problem
- We attempt to apply profile hidden Markov models (PHMMs)
  - For comparison, we also implement other techniques (HMM and N-gram)

Schonlau Data Set

- Schonlau, et al., collected large data set
- Contains UNIX commands for 50 users
  - 50 files, one for each user
  - Each file has 15k commands: 5k from user, plus 10k for masquerade test data
- Test data: 100 blocks, 100 commands each
- Dataset includes map file
  - 100 rows (test blocks), 50 columns (users)
  - 0 if block is user data, 1 if masquerade data
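The layout described above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, file loading is left out, and the map is assumed to be already parsed into a list of 100 rows with 50 integer entries each.

```python
TRAIN_SIZE = 5000   # commands known to come from the user
BLOCK_SIZE = 100    # commands per test block
NUM_BLOCKS = 100    # test blocks per user


def split_user_commands(commands):
    """Split one user's 15k commands into training data and 100 test blocks."""
    if len(commands) != TRAIN_SIZE + NUM_BLOCKS * BLOCK_SIZE:
        raise ValueError("expected 15,000 commands")
    train = commands[:TRAIN_SIZE]
    test = commands[TRAIN_SIZE:]
    blocks = [test[i:i + BLOCK_SIZE] for i in range(0, len(test), BLOCK_SIZE)]
    return train, blocks


def is_masquerade(map_rows, block_index, user_index):
    """Look up a label in the map: rows are test blocks, columns are users."""
    return map_rows[block_index][user_index] == 1
```

With this layout, `is_masquerade(map_rows, 3, 10)` answers whether test block 3 of user 10 is masquerade data (both indices counted from 0).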

Schonlau Data Set

Map file structure

- This data set has been used for many studies
  - Approximately 50 published papers

Previous Work

- Approaches to masquerade detection
  - Information theoretic
  - Text mining
  - Hidden Markov models (HMM)
  - Naïve Bayes
  - Sequences and bioinformatics
  - Support vector machines (SVM)
  - Other approaches
- We briefly look at each of these

Information Theoretic

- Original work by Schonlau included a compression technique
  - Based on theory (hope?) that legitimate commands compress more than attack
- Results were disappointing
- Some additional recent work
  - Still not competitive with best approaches
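One way to picture the compression idea is to measure how many extra compressed bytes a test block costs when appended to the user's training data. The sketch below uses `zlib` as a stand-in compressor; that choice and the function names are assumptions, not the method used in the original work.

```python
import zlib


def compressed_size(commands):
    """Bytes needed by zlib to compress a command sequence."""
    return len(zlib.compress(" ".join(commands).encode("utf-8"), 9))


def extra_bytes(train, block):
    """Additional compressed bytes a block costs when appended to training data."""
    return compressed_size(train + block) - compressed_size(train)


# Made-up data: a block that repeats the user's habits vs. one that does not.
train = ["ls", "cd", "vi", "make"] * 250
familiar = ["ls", "cd", "vi", "make"] * 25
unfamiliar = ["tool%d" % ((i * 37) % 101) for i in range(100)]
print(extra_bytes(train, familiar), extra_bytes(train, unfamiliar))
```

A block that looks like the training data should add fewer bytes than one full of commands the user never ran.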

Text Mining

- A few papers in this area
- One approach extracts repetitive sequences from training data
- Another paper uses principal component analysis (PCA)
  - Method of “exploratory data analysis”
  - Good results on Schonlau data set
  - But high cost during training phase

Hidden Markov Models

- Several authors have used HMMs
  - One of the best known approaches
- We have implemented HMM detector
- We do sensitivity analysis on the parameters
  - In particular, determine optimal N (number of hidden states)
- We also use HMMs for comparison with our PHMM results

Naïve Bayes

- In simplest form, relies only on command frequencies
  - That is, no sequence info is used
- Several papers analyze this approach
  - Among the simplest approaches
  - And, results are good
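A frequency-only score of this kind might look as follows. It is a minimal sketch, assuming add-one style smoothing (the `alpha` parameter) and a fixed vocabulary that covers every command in the block; neither detail comes from the slides.

```python
import math
from collections import Counter


def train_frequencies(commands, vocabulary, alpha=1.0):
    """Estimate smoothed command probabilities from one user's training data."""
    counts = Counter(commands)
    total = len(commands) + alpha * len(vocabulary)
    return {c: (counts[c] + alpha) / total for c in vocabulary}


def block_log_likelihood(block, probs):
    """Sum of per-command log probabilities; command order is ignored."""
    return sum(math.log(probs[c]) for c in block)
```

A block is then flagged when its log-likelihood under the user's model falls below a chosen threshold. Shuffling the block leaves the score unchanged, which is exactly the "no sequence info" property.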

Sequences

- In a sense, this is the opposite extreme from naïve Bayes
  - Naïve Bayes only considers frequency stats
  - Sequence/bioinformatics focused on sequence-related information
- Schonlau’s original work included elementary sequence-based analysis

Bioinformatics

- We are aware of only one previous paper that uses bioinformatics approach
  - Use Smith-Waterman algorithm to create local alignments
  - Alignments then used directly for detection
- In contrast, we do pairwise alignments, MSA, PHMM
  - PHMM is used for scoring (forward algorithm)
  - Our scoring is much more efficient
  - Also, our results are at least as strong
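For readers unfamiliar with local alignment, here is a textbook Smith-Waterman score applied to command sequences. The match, mismatch and gap values are illustrative assumptions and are not taken from the paper mentioned above.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between two command sequences."""
    prev = [0] * (len(b) + 1)   # previous row of the dynamic-programming table
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best
```

A high score means the two sequences share a long, similar run of commands somewhere, regardless of what surrounds it.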

Support Vector Machines

- Support vector machines (SVM)
  - Machine learning technique
  - Separate data points (i.e., classify) based on hyperplanes in high dimensional space
  - Original data mapped to higher dimension, where separation is likely easier
- SVMs maximize separation
  - And have low computational costs
- Used for classification and regression analysis

SVMs & Masquerade Detection

- SVMs have been applied to masquerade detection problem
- Results are good
  - Comparable to naïve Bayes
- Recent work using SVMs focused on improved efficiency

Other Approaches

- The following have also been studied
  - Detect using low frequency commands
  - Detect using high frequency commands
  - Hybrid Bayes “one step Markov”
    - Natural to consider hybrid approaches
  - Multistep Markov
    - Markov process of order greater than 1
- None of these particularly successful

Other Approaches (Continued)

- Non-negative matrix factorization (NMF)
  - At least 2 papers on this topic
  - Appears to be competitive
- Other hybrids that attempt to combine several approaches
  - So far, no significant improvement over individual techniques

HMMs

See previous presentation

HMM for Masquerade Detection

- Using the Schonlau data set we…
  - Train HMM for each user
  - Set thresholds
  - Test the models and plot results
- Note that this has been done before
- Here, we perform sensitivity analysis
  - That is, we test different numbers of hidden states, N
- Also use it for comparison with PHMM
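The scoring and threshold steps can be sketched with the scaled forward algorithm. Training (for example, by Baum-Welch) is omitted here. The model is assumed to be given as plain lists, commands are assumed to be mapped to integer symbols, and normalizing by block length is an assumption rather than something the slides specify.

```python
import math


def forward_log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: returns log P(obs | model) for non-empty obs.

    pi[i]   : initial probability of hidden state i
    A[i][j] : transition probability from state i to state j
    B[i][k] : probability of emitting symbol k in state i
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")          # sequence is impossible under the model
        log_prob += math.log(scale)       # accumulate log of scaling factors
        alpha = [a / scale for a in alpha]
    return log_prob


def is_flagged(obs, model, threshold):
    """Flag a block whose per-command log-likelihood falls below the threshold."""
    pi, A, B = model
    return forward_log_likelihood(obs, pi, A, B) / len(obs) < threshold
```

The work per observation grows with the square of N, which is why a small number of hidden states is attractive when accuracy does not suffer.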

HMM Experiments

- Plotted as “ROC” curves
  - Closer to origin is better
- Useful region
  - That is, false positives below 5%
  - The shaded region
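Curves of this kind come from sweeping the detection threshold and recording the error rates at each setting. The sketch below assumes the two axes are false positive rate and false negative rate (which "closer to origin is better" suggests) and that a higher score means "more like the user". All names are hypothetical.

```python
def error_rates(user_scores, attack_scores, threshold):
    """Return (false positive rate, false negative rate) at one threshold.

    A block is flagged as a masquerade when its score is below the threshold.
    """
    fp = sum(s < threshold for s in user_scores) / len(user_scores)
    fn = sum(s >= threshold for s in attack_scores) / len(attack_scores)
    return fp, fn


def sweep(user_scores, attack_scores):
    """(threshold, fp, fn) for every distinct score used as a threshold."""
    thresholds = sorted(set(user_scores) | set(attack_scores))
    return [(t, *error_rates(user_scores, attack_scores, t)) for t in thresholds]


def useful_points(points, max_fp=0.05):
    """Keep only operating points with false positive rate below max_fp."""
    return [p for p in points if p[1] < max_fp]
```

Plotting the `fp` and `fn` columns of `sweep(...)` against each other gives one curve per model; `useful_points` restricts it to the region with false positives below 5%.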

HMM Conclusion

- Number of hidden states does not matter
- So, use N=2
  - Since most efficient

PHMM

See previous presentation

PHMM Experiments

- A problem with Schonlau data…
  - For given user, 5000 commands
  - No begin/end session markers
- So, must split it up to obtain multiple sequences
  - But where to split sequence?
  - And what about tradeoff between number of sequences and length of each sequence?
  - That is, how to decide length/number???

PHMM Experiments

Experiments done for following cases:

See next slide…

PHMM Experiments

- Tests various numbers of sequences
- Best results: 5 sequences, 1k commands each seq.
  - This case in next slide

PHMM Comparison

- Compare PHMM to “weighted N-gram” and HMM
- HMM is best
- PHMM is competitive

PHMM Detector

- PHMM at disadvantage on Schonlau data
  - PHMM uses positional information
  - Such info not available for Schonlau data
  - We have to guess the positions for PHMM
- How to get fairer comparison between HMM and PHMM?
  - We need different data set
- Only option is simulated data set

Simulated Data

- We generate simulated data as follows
  - Using Schonlau data, construct Markov chain for each user
  - Use resulting Markov chain to generate sequences representing user behavior
  - Restrict “begin” to more common commands
- What’s the point?
  - Simulated seqs have sensible begin and end
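The generation procedure above can be sketched as a first-order Markov chain over commands. This is a minimal illustration, not the authors' generator: the `top_k` cutoff for allowed "begin" commands and the restart rule for dead ends are assumptions, and how sequence ends were chosen is not covered.

```python
import random
from collections import Counter, defaultdict


def build_chain(commands):
    """Count first-order transitions: chain[a][b] = times b followed a."""
    chain = defaultdict(Counter)
    for a, b in zip(commands, commands[1:]):
        chain[a][b] += 1
    return chain


def common_starts(commands, top_k=3):
    """Most frequent commands, used as the allowed 'begin' states."""
    return [c for c, _ in Counter(commands).most_common(top_k)]


def generate(chain, starts, length, rng):
    """Random walk on the chain, beginning at one of the common commands."""
    current = rng.choice(starts)
    seq = [current]
    while len(seq) < length:
        followers = chain.get(current)
        if not followers:
            # Dead end (command never seen with a successor): restart.
            current = rng.choice(starts)
        else:
            cmds = list(followers)
            weights = [followers[c] for c in cmds]
            current = rng.choices(cmds, weights=weights)[0]
        seq.append(current)
    return seq
```

Passing a seeded `random.Random` instance as `rng` makes the simulated data reproducible across experiments.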

Simulated Data

- Training data and user data for scoring generated using Markov chain
- Attack data taken from Schonlau data
- How much data to generate?
  - First test, we generate same amount of simulated data as is in Schonlau set
  - That is, 5k commands per user

Detection with Simulated Data

PHMM vs HMM Round 2

It’s close, but HMM still wins!

Limited Training Data

- What if less training data is available?
  - In a real application, initially, training data is limited
  - Can’t detect attacks until sufficient training data has been accumulated
  - So, less data required, the better
- Experiments, using simulated data, limited training data
  - Used 200 to 800 commands for training

Limited Training Data

PHMM vs HMM Round 3

With 400 or less, PHMM wins big!

Conclusion

- PHMM is competitive with best approaches
- PHMM likely to do better, given better training data (begin/end info)
- PHMM much better than HMM when limited training data available
  - Of practical importance
  - Why does it make sense that PHMM would do better with limited training data?

Conclusion

- Given current state of research…
- Optimal masquerade detection approach
  - Initially, collect small training set
  - Train PHMM and use for detection
  - If no attack is detected, continue to collect data
  - When sufficient data available, train HMM
  - From then on, use HMM for detection
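The staged approach amounts to a simple switching rule on the amount of training data. In the sketch below, `switch_at` is a hypothetical parameter: the slides do not state where "sufficient data" begins, so it would have to be tuned per deployment.

```python
def choose_detector(num_training_commands, switch_at):
    """Pick the model family based on how much training data has accumulated."""
    if switch_at <= 0:
        raise ValueError("switch_at must be positive")
    # PHMM while data is scarce, HMM once enough has been collected.
    return "PHMM" if num_training_commands < switch_at else "HMM"
```

The caller retrains whenever the returned family changes, so the system is never left without a detector while data is still being gathered.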

Future Work

- Collect better real data set!!!
  - Many problems/limitations with Schonlau data
  - Improved data set could be basis for lots and lots of research
- Directly compare PHMM/bioinformatics approaches with previous work (HMM, naïve Bayes, SVM, etc., etc.)
- Consider hybrid techniques
- Other techniques?

References

Masquerade detection using profile hidden Markov models, L. Huang and M. Stamp, to appear in Computers and Security

Masquerading user data, M. Schonlau