
Page 1: North Carolina State University Columbia University Florida Institute of Technology

A Data Mining Approach for Building Cost-Sensitive and Light Intrusion Detection Models

PI Meeting - July, 2000

North Carolina State University

Columbia University

Florida Institute of Technology

Page 2: North Carolina State University Columbia University Florida Institute of Technology

Overview

• Project description

• Progress report:
  – correlation
  – cost-sensitive modeling
  – anomaly detection
  – collaboration with industry

• Plan of work for 2000-2001

Page 3: North Carolina State University Columbia University Florida Institute of Technology

New Ideas/Hypotheses

• High-volume automated attacks can overwhelm an IDS and its staff.

• Use cost-sensitive data mining algorithms to construct ID models that consider cost factors:
  – damage cost, response cost, operational cost, etc.

• Multiple specialized and light ID models can be dynamically activated and configured at run time.

• Cost-effectiveness as the guiding principle and multi-model correlation as the architectural approach.

Page 4: North Carolina State University Columbia University Florida Institute of Technology

Impact

• A better understanding of the cost factors, cost models, and cost metrics related to intrusion detection.

• Modeling techniques and deployment strategies for cost-effective IDSs.

• “Clustering” techniques for grouping intrusions and building specialized and light models.

• An architecture for dynamically activating, configuring, and correlating ID models.

Page 5: North Carolina State University Columbia University Florida Institute of Technology

Correlation: Model and Issues

• “Good” base models: data sources and modeling techniques.

• The combined model: the correlation algorithms and network topology.

[Diagram: base-model signals correlated across sources and across time/sources.]

Page 6: North Carolina State University Columbia University Florida Institute of Technology

Correlation: Approaches

• Extend previous work in JAM.

• A sequence of time-stamped records:

– each is composed of signals from multiple sensors (network topology information embedded);

• Apply data mining techniques to learn how to correlate the signals to generate a combined sensor:
  – link analysis, sequence analysis, machine learning (classification), etc.
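A minimal sketch of the classification-based option, assuming each time-stamped record carries one signal per base sensor. The record fields, the use of scikit-learn, and the decision-tree choice are illustrative assumptions, not part of the slides:

```python
from dataclasses import dataclass
from typing import List

from sklearn.tree import DecisionTreeClassifier  # any classifier could stand in here

@dataclass
class Record:
    timestamp: float
    signals: List[float]  # one score/alert level per base sensor (topology-ordered)
    label: int            # 1 = confirmed intrusion, 0 = benign (training only)

def train_combined_sensor(history: List[Record]) -> DecisionTreeClassifier:
    """Learn how to correlate the base-sensor signals into one combined decision."""
    X = [r.signals for r in history]
    y = [r.label for r in history]
    return DecisionTreeClassifier(max_depth=5).fit(X, y)

# combined = train_combined_sensor(history)
# alert = combined.predict([current.signals])[0]   # combined sensor's verdict
```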

Page 7: North Carolina State University Columbia University Florida Institute of Technology

Correlation: Integrating NM and ID Signals

• A stream of measures (anomaly reports) on MIB variables of network elements and a stream of ID signals:
  – Better coverage;
  – Early sensing of attacks.

• Normal measures of network traffic and parameter values of ID signatures:
  – S = f(N, A); if A is invariant, then S = g(N).
  – Automatic parameter adjustment: S1 = g(N1).
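As a rough illustration of S = g(N), assuming the signature parameter S is a single numeric threshold and g is a simple scaling of the current normal-traffic measure (both assumptions are ours, not the slides'):

```python
def adjust_signature_threshold(normal_rate: float, margin: float = 3.0) -> float:
    """S1 = g(N1): re-derive the signature threshold from the current normal measure."""
    return normal_rate * margin

threshold = adjust_signature_threshold(normal_rate=120.0)  # e.g., SYN packets/second
```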

Page 8: North Carolina State University Columbia University Florida Institute of Technology

Cost Factors of IDSs

• Attack taxonomy: result/target/technique

• Development cost

• Damage cost (DCost)
  – The amount of damage when ID is not available or ineffective.

• Response cost (RCost)
  – The cost of acting upon an alarm of a potential intrusion.

• Operational cost (OpCost)
  – The cost of processing and analyzing audit data;
  – Mainly the computational costs of the features.

Page 9: North Carolina State University Columbia University Florida Institute of Technology

Cost Models of IDSs

• The total cost of an IDS over a set of events:

• CumulativeCost(E) = Σ_{e ∈ E} (CCost(e) + OpCost(e))

• CCost(e), the consequential cost, depends on the prediction made for event e.
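A minimal sketch of this cumulative-cost computation; the event representation and the cost callbacks are placeholders, not part of the slides:

```python
from typing import Callable, Iterable

def cumulative_cost(events: Iterable[dict],
                    ccost: Callable[[dict], float],
                    opcost: Callable[[dict], float]) -> float:
    """CumulativeCost(E) = sum over e in E of (CCost(e) + OpCost(e))."""
    return sum(ccost(e) + opcost(e) for e in events)
```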

Page 10: North Carolina State University Columbia University Florida Institute of Technology

Consequential Cost (CCost)

• For an event e (e' denotes the intrusion type that was predicted when the prediction is wrong; PCost(e) is the penalty for acting on a false alarm; ε ∈ [0, 1] is the fraction of the damage incurred before a response takes effect):

Outcome            | CCost(e)               | Conditions
-------------------|------------------------|---------------------------
Miss (FN)          | DCost(e)               |
False Alarm (FP)   | RCost(e') + PCost(e)   | if DCost(e') ≥ RCost(e')
                   | 0                      | otherwise
Hit (TP)           | RCost(e) + ε·DCost(e)  | if DCost(e) ≥ RCost(e)
                   | DCost(e)               | otherwise
Normal (TN)        | 0                      |
Misclassified Hit  | RCost(e') + ε·DCost(e) | if DCost(e') ≥ RCost(e')
                   | DCost(e)               | otherwise
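A hedged sketch of the table as a decision function; the parameter names, passing the predicted intrusion's costs as `*_pred` arguments, and the default ε are our assumptions for illustration:

```python
def ccost(outcome: str, dcost_e: float, rcost_e: float,
          dcost_pred: float = 0.0, rcost_pred: float = 0.0,
          pcost_e: float = 0.0, eps: float = 1.0) -> float:
    """outcome: 'FN', 'FP', 'TP', 'TN', or 'misclassified_hit'.
    *_pred are the costs of the (wrongly) predicted intrusion e'; eps is the
    assumed fraction of damage incurred before the response takes effect."""
    if outcome == "FN":                     # missed intrusion: full damage
        return dcost_e
    if outcome == "FP":                     # false alarm: respond only if worthwhile
        return rcost_pred + pcost_e if dcost_pred >= rcost_pred else 0.0
    if outcome == "TP":                     # hit: respond only if worthwhile
        return rcost_e + eps * dcost_e if dcost_e >= rcost_e else dcost_e
    if outcome == "misclassified_hit":      # correct alarm, wrong intrusion type
        return rcost_pred + eps * dcost_e if dcost_pred >= rcost_pred else dcost_e
    return 0.0                              # TN: normal event correctly ignored
```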

Page 11: North Carolina State University Columbia University Florida Institute of Technology

Cost-sensitive Modeling: Objectives

• Reducing operational costs:
  – Use cheap features in ID models.

• Reducing consequential costs:
  – Do not respond to an intrusion if RCost > DCost.

Page 12: North Carolina State University Columbia University Florida Institute of Technology

Cost-sensitive Modeling: Approaches

• Reducing operational costs:
  – A multiple-model approach:
    • Build multiple rule-sets, each with features of a different cost level;
    • Use the cheaper rule-sets first; apply the costlier ones only when needed for the required accuracy.
  – Feature-Cost-Sensitive Rule Induction:
    • The search heuristic considers information gain AND feature cost.
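A sketch of the cascading multiple-model idea; the rule-set interface and the confidence threshold are assumptions made for illustration:

```python
from typing import Callable, List, Tuple

# A rule-set takes a connection record and returns (prediction, confidence).
RuleSet = Callable[[dict], Tuple[str, float]]

def predict_with_cascade(record: dict,
                         rulesets_by_cost: List[RuleSet],
                         required_confidence: float = 0.9) -> str:
    """Apply rule-sets ordered from cheapest to costliest features."""
    prediction = "normal"
    for ruleset in rulesets_by_cost:
        prediction, confidence = ruleset(record)  # OpCost paid only up to this level
        if confidence >= required_confidence:
            break                                 # a cheap rule-set was accurate enough
    return prediction
```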

Page 13: North Carolina State University Columbia University Florida Institute of Technology

Cost-sensitive Modeling: Approaches (continued)

• Reducing consequential costs:
  – MetaCost:
    • Purposely re-label intrusions with RCost > DCost as normal.
  – Post-detection decision:
    • The action taken depends on a comparison of RCost and DCost.
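A minimal sketch of the post-detection decision rule (respond only when the predicted intrusion's DCost is at least its RCost); the per-intrusion cost dictionaries and values are illustrative:

```python
def should_respond(predicted_intrusion: str, dcost: dict, rcost: dict) -> bool:
    """Respond only when the predicted damage cost is at least the response cost."""
    return dcost[predicted_intrusion] >= rcost[predicted_intrusion]

# Example: suppress the response to a low-damage probe (costs are made up).
# should_respond("ipsweep", dcost={"ipsweep": 2}, rcost={"ipsweep": 5})  # -> False
```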

Page 14: North Carolina State University Columbia University Florida Institute of Technology

Latest Results

• OpCost
  – Compare the multiple-model approach with the single-model approach;

– rdc%: (single - multiple)/single;

– range: 57% to 79%.

[Chart: average OpCost per connection for the single-model and multiple-model approaches.]

Page 15: North Carolina State University Columbia University Florida Institute of Technology

Latest Results (continued)

• CCost using a post-detection cost-sensitive decision module:
  – rdc% range: 75% to 95%;

– Compared with single model: slightly better rdc%;

– Compared with cost-insensitive models: 25% higher rdc%.

[Chart: total CCost for the CS-single, CS-multiple, CI-single, and CI-multiple models.]

Page 16: North Carolina State University Columbia University Florida Institute of Technology

Anomaly Detection

• Unsupervised Training Methods
  – Build models over noisy (not clean) data.

• Artificial Anomalies
  – Improve the performance of anomaly detection methods.

• Combining misuse and anomaly detection.

Page 17: North Carolina State University Columbia University Florida Institute of Technology

AD over Noisy Data

• Builds normal models over data containing some anomalies.

• Motivating assumptions:
  – Intrusions are extremely rare compared to normal behavior.

– Intrusions are quantitatively different.

Page 18: North Carolina State University Columbia University Florida Institute of Technology

Approach Overview

• Mixture Model
  – Normal Component
  – Anomalous Component

• Build Probabilistic Model of Data

• Max Likelihood test for detection.

Page 19: North Carolina State University Columbia University Florida Institute of Technology

Mixture Model of Anomalies

• Assume a generative model: The data is generated with a probability distribution D.

• Each element originates from one of two components:
  – M, the Majority Distribution (x ∈ M);
  – A, the Anomalous Distribution (x ∈ A).

• Thus: D = (1 − λ)·M + λ·A

Page 20: North Carolina State University Columbia University Florida Institute of Technology

Modeling Probability Distributions

• Train Probability Distributions over current sets of M and A.

• P_M(x) = probability distribution for the Majority component

• P_A(x) = probability distribution for the Anomalous component

• Any probability modeling method can be used: Naïve Bayes, Max Entropy, etc.

Page 21: North Carolina State University Columbia University Florida Institute of Technology

Detecting Anomalies

• Likelihood of a partition of the set of all elements D into M and A:

  L(D) = Π_{x ∈ D} P_D(x) = ( (1 − λ)^|M| · Π_{x ∈ M} P_M(x) ) · ( λ^|A| · Π_{x ∈ A} P_A(x) )

• Log likelihood (for computational reasons):

  LL(D) = log( L(D) )

[Diagram: the element set D partitioned into the majority set M and the anomaly set A.]

Page 22: North Carolina State University Columbia University Florida Institute of Technology

Algorithm for Detection

• Assume all elements are normal (M_0 = D, A_0 = ∅).

• Compute P_D(x).

• Using P_D(x), compute LL(D).

• For each element compute difference in LL(D) if removed from M and inserted into A.

• If the difference is large enough, then declare the element an anomaly.

Page 23: North Carolina State University Columbia University Florida Institute of Technology

Evaluating x_t

M_{t+1} = M_t − {x_t}

A_{t+1} = A_t ∪ {x_t}

Recompute P_{M_{t+1}} and P_{A_{t+1}} (efficiently).

If (LL_{t+1} − LL_t) > threshold, x_t is an anomaly; otherwise, x_t is normal.
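A simplified, self-contained sketch of this detection loop for categorical elements, using Laplace-smoothed frequency estimates for P_M and P_A. The values of λ and the threshold, and the choice of probability model, are illustrative; the slides allow any probability modeling method (Naïve Bayes, Max Entropy, etc.):

```python
import math
from collections import Counter
from typing import List

LAMBDA = 0.05      # assumed prior probability of the anomalous component
THRESHOLD = 1.0    # assumed log-likelihood gain required to call an element anomalous

def log_prob(counts: Counter, total: int, x, alphabet: int) -> float:
    """Laplace-smoothed log probability of value x under one component."""
    return math.log((counts[x] + 1) / (total + alphabet))

def partition_log_likelihood(M: List, A: List, alphabet: int) -> float:
    """LL(D) for the partition of D into majority set M and anomaly set A."""
    cm, ca = Counter(M), Counter(A)
    ll = len(M) * math.log(1 - LAMBDA) + len(A) * math.log(LAMBDA)
    ll += sum(log_prob(cm, len(M), x, alphabet) for x in M)
    ll += sum(log_prob(ca, len(A), x, alphabet) for x in A)
    return ll

def detect_anomalies(D: List) -> List:
    alphabet = len(set(D))
    M, A = list(D), []                           # M0 = D, A0 = empty set
    anomalies = []
    for x in list(D):
        M_next = list(M); M_next.remove(x)       # Mt+1 = Mt - {xt}
        A_next = A + [x]                         # At+1 = At U {xt}
        gain = (partition_log_likelihood(M_next, A_next, alphabet)
                - partition_log_likelihood(M, A, alphabet))
        if gain > THRESHOLD:                     # LLt+1 - LLt > threshold => anomaly
            M, A = M_next, A_next
            anomalies.append(x)
    return anomalies
```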

Page 24: North Carolina State University Columbia University Florida Institute of Technology

Experiments

• Two sets of experiments:
  – Measured performance against comparison methods over noisy data.
  – Measured performance trained over noisy data against comparison methods trained over clean data.

Page 25: North Carolina State University Columbia University Florida Institute of Technology

AD Using Artificial Anomalies

• Generate abnormal behavior artificially:
  – assume the given normal data are representative;
  – "near misses" of normal behavior are considered abnormal;
  – change the value of only one feature in an instance of normal behavior;
  – sparsely represented values are sampled more frequently;
  – "near misses" help define a tight boundary enclosing the normal behavior.
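A rough sketch of generating such "near misses" for categorical records; the inverse-frequency weighting scheme and the record format are our assumptions for illustration:

```python
import random
from collections import Counter
from typing import Dict, List

def artificial_anomalies(normal: List[Dict], per_record: int = 1) -> List[Dict]:
    """Perturb one feature per generated record, favouring rarely seen values."""
    features = list(normal[0].keys())
    # value frequencies per feature, used to sample sparse values more often
    freq = {f: Counter(r[f] for r in normal) for f in features}
    anomalies = []
    for record in normal:
        for _ in range(per_record):
            f = random.choice(features)
            candidates = [v for v in freq[f] if v != record[f]]
            if not candidates:
                continue
            weights = [1.0 / freq[f][v] for v in candidates]  # rare values more likely
            near_miss = dict(record)
            near_miss[f] = random.choices(candidates, weights=weights, k=1)[0]
            anomalies.append(near_miss)
    return anomalies
```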

Page 26: North Carolina State University Columbia University Florida Institute of Technology

Experimental Results

• Learning algorithm: RIPPER rule learner.

• Data: 1998/99 DARPA evaluation
  – U2R, R2L, DOS, PRB: 22 "clusters"

• Training data: normal and artificial anomalies

• Results
  – Overall hit rate: 94.26% (correctly classified as normal or intrusion)
  – Overall false alarm rate: 2.02%
  – 100% detection: buffer_overflow, guess_passwd, phf, back
  – 0% detection: perl, spy, teardrop, ipsweep, nmap
  – 50+% detection: 13 out of 22 intrusion subclasses

Page 27: North Carolina State University Columbia University Florida Institute of Technology

Combining Anomaly And Misuse Detection

• Training data: normal, artificially generated anomalies, known intrusions

• The learned model can predict normal, anomaly, or known intrusion subclass

• Experiments were performed with increasing subsets of known intrusion subclasses in the training data (simulating intrusions being identified over time).
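A small sketch of how such a combined training set could be labelled (normal, artificial anomaly, or known intrusion subclass); the record format and helper name are hypothetical:

```python
from typing import Dict, List, Tuple

def build_training_set(normal: List[Dict],
                       artificial: List[Dict],
                       known_intrusions: List[Tuple[Dict, str]]) -> List[Tuple[Dict, str]]:
    """Combine the three data sources into (record, label) pairs for learning."""
    data = [(r, "normal") for r in normal]
    data += [(r, "anomaly") for r in artificial]
    data += [(r, subclass) for r, subclass in known_intrusions]  # e.g., "smurf"
    return data
```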

Page 28: North Carolina State University Columbia University Florida Institute of Technology

Combining Anomaly And Misuse Detection (continued)

• Consider phf, pod, teardrop, spy, and smurf to be unknown (absent from the training data).

• Anomaly detection rate: phf=25%, pod=100%, teardrop=93.91%, spy=50%, smurf=100%

• Overall false alarm rate: 0.20%

• The false alarm rate dropped from 2.02% to 0.20% when some known attacks were included in the training data.

Page 29: North Carolina State University Columbia University Florida Institute of Technology

Collaboration with Industry

• RST Inc.
  – Anomaly detection on NT systems
• NFR Inc.
  – Real-time IDS
• SAS Institute
  – Off-line ID (funded by SAS)
• Aprisma (Cabletron)
  – Integrating ID with NM (funded by Aprisma)
• HRL Labs
  – ID in wireless networks (funded by HRL)

Page 30: North Carolina State University Columbia University Florida Institute of Technology

Plan for 2000-2001

• Dynamic cost-sensitive modeling and deployment
  – work with industry for realistic cost analysis and real-time testing

• Anomaly detection
  – improve existing algorithms using feedback from evaluation

• Correlation
  – develop/evaluate algorithms for integrating data/evidence from multiple sources