
Page 1: North Carolina State University Columbia University Florida Institute of Technology

A Data Mining Approach for Building Cost-Sensitive and Light Intrusion Detection Models

PI Meeting - July, 2000

North Carolina State University

Columbia University

Florida Institute of Technology

Page 2: North Carolina State University Columbia University Florida Institute of Technology

Overview

• Project description

• Progress report:
  – correlation
  – cost-sensitive modeling
  – anomaly detection
  – collaboration with industry

• Plan of work for 2000-2001

Page 3: North Carolina State University Columbia University Florida Institute of Technology

New Ideas/Hypotheses

• High-volume automated attacks can overwhelm an IDS and its staff.

• Use cost-sensitive data mining algorithms to construct ID models that consider cost factors:
  – damage cost, response cost, operational cost, etc.

• Multiple specialized and light ID models can be dynamically activated and configured at run time.

• Cost-effectiveness as the guiding principle and multi-model correlation as the architectural approach.

Page 4: North Carolina State University Columbia University Florida Institute of Technology

Impact

• A better understanding of the cost factors, cost models, and cost metrics related to intrusion detection.

• Modeling techniques and deployment strategies for cost-effective IDSs.

• “Clustering” techniques for grouping intrusions and building specialized and light models.

• An architecture for dynamically activating, configuring, and correlating ID models.

Page 5: North Carolina State University Columbia University Florida Institute of Technology

Correlation: Model and Issues

• “Good” base models: data sources and modeling techniques.

• The combined model: the correlation algorithms and network topology.

[Diagram: base-model signals correlated across sources and across time/sources.]

Page 6: North Carolina State University Columbia University Florida Institute of Technology

Correlation: Approaches

• Extend previous work in JAM.

• A sequence of time-stamped records:

– each is composed of signals from multiple sensors (network topology information embedded);

• Apply data mining techniques to learn how to correlate the signals to generate a combined sensor:
  – link analysis, sequence analysis, machine learning (classification), etc.
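A minimal sketch of the classification-based option, assuming each time-stamped record carries one signal per base sensor. The record fields, the use of scikit-learn, and the decision-tree choice are illustrative assumptions, not part of the slides:

```python
from dataclasses import dataclass
from typing import List

from sklearn.tree import DecisionTreeClassifier  # any classifier could stand in here

@dataclass
class Record:
    timestamp: float
    signals: List[float]  # one score/alert level per base sensor (topology-ordered)
    label: int            # 1 = confirmed intrusion, 0 = benign (training only)

def train_combined_sensor(history: List[Record]) -> DecisionTreeClassifier:
    """Learn how to correlate the base-sensor signals into one combined decision."""
    X = [r.signals for r in history]
    y = [r.label for r in history]
    return DecisionTreeClassifier(max_depth=5).fit(X, y)

# combined = train_combined_sensor(history)
# alert = combined.predict([current.signals])[0]   # combined sensor's verdict
```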

Page 7: North Carolina State University Columbia University Florida Institute of Technology

Correlation: Integrating NM and ID Signals

• A stream of measures (anomaly reports) on MIB variables of network elements and a stream of ID signals:
  – Better coverage;
  – Early sensing of attacks.

• Normal measures of network traffic and parameter values of ID signatures:
  – S = f(N, A); if A is invariant, then S = g(N).
  – Automatic parameter adjustment: S1 = g(N1).
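As a rough illustration of S = g(N), assuming the signature parameter S is a single numeric threshold and g is a simple scaling of the current normal-traffic measure (both assumptions are ours, not the slides'):

```python
def adjust_signature_threshold(normal_rate: float, margin: float = 3.0) -> float:
    """S1 = g(N1): re-derive the signature threshold from the current normal measure."""
    return normal_rate * margin

threshold = adjust_signature_threshold(normal_rate=120.0)  # e.g., SYN packets/second
```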

Page 8: North Carolina State University Columbia University Florida Institute of Technology

Cost Factors of IDSs

• Attack taxonomy: result/target/technique

• Development cost

• Damage cost (DCost)
  – The amount of damage when ID is not available or ineffective.

• Response cost (RCost)
  – The cost of acting upon an alarm of a potential intrusion.

• Operational cost (OpCost)
  – The cost of processing and analyzing audit data;
  – Mainly the computational costs of the features.

Page 9: North Carolina State University Columbia University Florida Institute of Technology

Cost Models of IDSs

• The total cost of an IDS over a set of events:

• CumulativeCost(E) = Σ_{e ∈ E} (CCost(e) + OpCost(e))

• CCost(e), the consequential cost, depends on the prediction made for event e.
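A minimal sketch of this cumulative-cost computation; the event representation and the cost callbacks are placeholders, not part of the slides:

```python
from typing import Callable, Iterable

def cumulative_cost(events: Iterable[dict],
                    ccost: Callable[[dict], float],
                    opcost: Callable[[dict], float]) -> float:
    """CumulativeCost(E) = sum over e in E of (CCost(e) + OpCost(e))."""
    return sum(ccost(e) + opcost(e) for e in events)
```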

Page 10: North Carolina State University Columbia University Florida Institute of Technology

Consequential Cost (CCost)

• For an event e (e' denotes the intrusion type that was predicted when the prediction is wrong; PCost(e) is the penalty for acting on a false alarm; ε ∈ [0, 1] is the fraction of the damage incurred before a response takes effect):

Outcome            | CCost(e)               | Conditions
-------------------|------------------------|---------------------------
Miss (FN)          | DCost(e)               |
False Alarm (FP)   | RCost(e') + PCost(e)   | if DCost(e') ≥ RCost(e')
                   | 0                      | otherwise
Hit (TP)           | RCost(e) + ε·DCost(e)  | if DCost(e) ≥ RCost(e)
                   | DCost(e)               | otherwise
Normal (TN)        | 0                      |
Misclassified Hit  | RCost(e') + ε·DCost(e) | if DCost(e') ≥ RCost(e')
                   | DCost(e)               | otherwise
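A hedged sketch of the table as a decision function; the parameter names, passing the predicted intrusion's costs as `*_pred` arguments, and the default ε are our assumptions for illustration:

```python
def ccost(outcome: str, dcost_e: float, rcost_e: float,
          dcost_pred: float = 0.0, rcost_pred: float = 0.0,
          pcost_e: float = 0.0, eps: float = 1.0) -> float:
    """outcome: 'FN', 'FP', 'TP', 'TN', or 'misclassified_hit'.
    *_pred are the costs of the (wrongly) predicted intrusion e'; eps is the
    assumed fraction of damage incurred before the response takes effect."""
    if outcome == "FN":                     # missed intrusion: full damage
        return dcost_e
    if outcome == "FP":                     # false alarm: respond only if worthwhile
        return rcost_pred + pcost_e if dcost_pred >= rcost_pred else 0.0
    if outcome == "TP":                     # hit: respond only if worthwhile
        return rcost_e + eps * dcost_e if dcost_e >= rcost_e else dcost_e
    if outcome == "misclassified_hit":      # correct alarm, wrong intrusion type
        return rcost_pred + eps * dcost_e if dcost_pred >= rcost_pred else dcost_e
    return 0.0                              # TN: normal event correctly ignored
```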

Page 11: North Carolina State University Columbia University Florida Institute of Technology

Cost-sensitive Modeling: Objectives

• Reducing operational costs:
  – Use cheap features in ID models.

• Reducing consequential costs:
  – Do not respond to an intrusion if RCost > DCost.

Page 12: North Carolina State University Columbia University Florida Institute of Technology

Cost-sensitive Modeling: Approaches

• Reducing operational costs:
  – A multiple-model approach:
    • Build multiple rule-sets, each with features of a different cost level;
    • Use the cheaper rule-sets first; apply the costlier ones only when needed for the required accuracy.
  – Feature-Cost-Sensitive Rule Induction:
    • The search heuristic considers information gain AND feature cost.
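A sketch of the cascading multiple-model idea; the rule-set interface and the confidence threshold are assumptions made for illustration:

```python
from typing import Callable, List, Tuple

# A rule-set takes a connection record and returns (prediction, confidence).
RuleSet = Callable[[dict], Tuple[str, float]]

def predict_with_cascade(record: dict,
                         rulesets_by_cost: List[RuleSet],
                         required_confidence: float = 0.9) -> str:
    """Apply rule-sets ordered from cheapest to costliest features."""
    prediction = "normal"
    for ruleset in rulesets_by_cost:
        prediction, confidence = ruleset(record)  # OpCost paid only up to this level
        if confidence >= required_confidence:
            break                                 # a cheap rule-set was accurate enough
    return prediction
```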

Page 13: North Carolina State University Columbia University Florida Institute of Technology

Cost-sensitive Modeling: Approaches (continued)

• Reducing consequential costs:
  – MetaCost:
    • Purposely re-label intrusions with RCost > DCost as normal.
  – Post-detection decision:
    • The action taken depends on a comparison of RCost and DCost.
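A minimal sketch of the post-detection decision rule (respond only when the predicted intrusion's DCost is at least its RCost); the per-intrusion cost dictionaries and values are illustrative:

```python
def should_respond(predicted_intrusion: str, dcost: dict, rcost: dict) -> bool:
    """Respond only when the predicted damage cost is at least the response cost."""
    return dcost[predicted_intrusion] >= rcost[predicted_intrusion]

# Example: suppress the response to a low-damage probe (costs are made up).
# should_respond("ipsweep", dcost={"ipsweep": 2}, rcost={"ipsweep": 5})  # -> False
```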

Page 14: North Carolina State University Columbia University Florida Institute of Technology

Latest Results

• OpCost
  – Compare the multiple-model approach with the single-model approach;

– rdc%: (single - multiple)/single;

– range: 57% to 79%.

[Chart: average OpCost per connection for the single-model and multiple-model approaches.]

Page 15: North Carolina State University Columbia University Florida Institute of Technology

Latest Results (continued)

• CCost using a post-detection cost-sensitive decision module:
  – rdc% range: 75% to 95%;

– Compared with single model: slightly better rdc%;

– Compared with cost-insensitive models: 25% higher rdc%.

[Chart: total CCost for the CS-single, CS-multiple, CI-single, and CI-multiple models.]

Page 16: North Carolina State University Columbia University Florida Institute of Technology

Anomaly Detection

• Unsupervised Training Methods
  – Build models over noisy (not clean) data.

• Artificial Anomalies
  – Improve the performance of anomaly detection methods.

• Combining misuse and anomaly detection.

Page 17: North Carolina State University Columbia University Florida Institute of Technology

AD over Noisy Data

• Builds normal models over data containing some anomalies.

• Motivating assumptions:
  – Intrusions are extremely rare compared to normal behavior.

– Intrusions are quantitatively different.

Page 18: North Carolina State University Columbia University Florida Institute of Technology

Approach Overview

• Mixture Model
  – Normal Component
  – Anomalous Component

• Build Probabilistic Model of Data

• Max Likelihood test for detection.

Page 19: North Carolina State University Columbia University Florida Institute of Technology

Mixture Model of Anomalies

• Assume a generative model: The data is generated with a probability distribution D.

• Each element originates from one of two components:
  – M, the Majority Distribution (x ∈ M);
  – A, the Anomalous Distribution (x ∈ A).

• Thus: D = (1 − λ)·M + λ·A

Page 20: North Carolina State University Columbia University Florida Institute of Technology

Modeling Probability Distributions

• Train Probability Distributions over current sets of M and A.

• P_M(x) = probability distribution for the Majority component

• P_A(x) = probability distribution for the Anomalous component

• Any probability modeling method can be used: Naïve Bayes, Max Entropy, etc.

Page 21: North Carolina State University Columbia University Florida Institute of Technology

Detecting Anomalies

• Likelihood of a partition of the set of all elements D into M and A:

  L(D) = Π_{x ∈ D} P_D(x) = ( (1 − λ)^|M| · Π_{x ∈ M} P_M(x) ) · ( λ^|A| · Π_{x ∈ A} P_A(x) )

• Log likelihood (for computational reasons):

  LL(D) = log( L(D) )

[Diagram: the element set D partitioned into the majority set M and the anomaly set A.]

Page 22: North Carolina State University Columbia University Florida Institute of Technology

Algorithm for Detection

• Assume all elements are normal (M_0 = D, A_0 = ∅).

• Compute P_D(x).

• Using P_D(x), compute LL(D).

• For each element compute difference in LL(D) if removed from M and inserted into A.

• If the difference is large enough, then declare the element an anomaly.

Page 23: North Carolina State University Columbia University Florida Institute of Technology

Evaluating x_t

M_{t+1} = M_t − {x_t}

A_{t+1} = A_t ∪ {x_t}

Recompute P_{M_{t+1}} and P_{A_{t+1}} (efficiently).

If (LL_{t+1} − LL_t) > threshold, x_t is an anomaly; otherwise, x_t is normal.
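A simplified, self-contained sketch of this detection loop for categorical elements, using Laplace-smoothed frequency estimates for P_M and P_A. The values of λ and the threshold, and the choice of probability model, are illustrative; the slides allow any probability modeling method (Naïve Bayes, Max Entropy, etc.):

```python
import math
from collections import Counter
from typing import List

LAMBDA = 0.05      # assumed prior probability of the anomalous component
THRESHOLD = 1.0    # assumed log-likelihood gain required to call an element anomalous

def log_prob(counts: Counter, total: int, x, alphabet: int) -> float:
    """Laplace-smoothed log probability of value x under one component."""
    return math.log((counts[x] + 1) / (total + alphabet))

def partition_log_likelihood(M: List, A: List, alphabet: int) -> float:
    """LL(D) for the partition of D into majority set M and anomaly set A."""
    cm, ca = Counter(M), Counter(A)
    ll = len(M) * math.log(1 - LAMBDA) + len(A) * math.log(LAMBDA)
    ll += sum(log_prob(cm, len(M), x, alphabet) for x in M)
    ll += sum(log_prob(ca, len(A), x, alphabet) for x in A)
    return ll

def detect_anomalies(D: List) -> List:
    alphabet = len(set(D))
    M, A = list(D), []                           # M0 = D, A0 = empty set
    anomalies = []
    for x in list(D):
        M_next = list(M); M_next.remove(x)       # Mt+1 = Mt - {xt}
        A_next = A + [x]                         # At+1 = At U {xt}
        gain = (partition_log_likelihood(M_next, A_next, alphabet)
                - partition_log_likelihood(M, A, alphabet))
        if gain > THRESHOLD:                     # LLt+1 - LLt > threshold => anomaly
            M, A = M_next, A_next
            anomalies.append(x)
    return anomalies
```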

Page 24: North Carolina State University Columbia University Florida Institute of Technology

Experiments

• Two sets of experiments:
  – Measured performance against comparison methods over noisy data.
  – Measured performance trained over noisy data against comparison methods trained over clean data.

Page 25: North Carolina State University Columbia University Florida Institute of Technology

AD Using Artificial Anomalies

• Generate abnormal behavior artificially:
  – assume the given normal data are representative;
  – "near misses" of normal behavior are considered abnormal;
  – change the value of only one feature in an instance of normal behavior;
  – sparsely represented values are sampled more frequently;
  – "near misses" help define a tight boundary enclosing the normal behavior.
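A rough sketch of generating such "near misses" for categorical records; the inverse-frequency weighting scheme and the record format are our assumptions for illustration:

```python
import random
from collections import Counter
from typing import Dict, List

def artificial_anomalies(normal: List[Dict], per_record: int = 1) -> List[Dict]:
    """Perturb one feature per generated record, favouring rarely seen values."""
    features = list(normal[0].keys())
    # value frequencies per feature, used to sample sparse values more often
    freq = {f: Counter(r[f] for r in normal) for f in features}
    anomalies = []
    for record in normal:
        for _ in range(per_record):
            f = random.choice(features)
            candidates = [v for v in freq[f] if v != record[f]]
            if not candidates:
                continue
            weights = [1.0 / freq[f][v] for v in candidates]  # rare values more likely
            near_miss = dict(record)
            near_miss[f] = random.choices(candidates, weights=weights, k=1)[0]
            anomalies.append(near_miss)
    return anomalies
```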

Page 26: North Carolina State University Columbia University Florida Institute of Technology

Experimental Results

• Learning algorithm: RIPPER rule learner.

• Data: 1998/99 DARPA evaluation
  – U2R, R2L, DOS, PRB: 22 "clusters"

• Training data: normal and artificial anomalies

• Results
  – Overall hit rate: 94.26% (correctly classified as normal or intrusion)
  – Overall false alarm rate: 2.02%
  – 100% detection: buffer_overflow, guess_passwd, phf, back
  – 0% detection: perl, spy, teardrop, ipsweep, nmap
  – 50+% detection: 13 out of 22 intrusion subclasses

Page 27: North Carolina State University Columbia University Florida Institute of Technology

Combining Anomaly And Misuse Detection

• Training data: normal, artificially generated anomalies, known intrusions

• The learned model can predict normal, anomaly, or known intrusion subclass

• Experiments were performed with increasing subsets of known intrusion subclasses in the training data (simulating intrusions being identified over time).
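A small sketch of how such a combined training set could be labelled (normal, artificial anomaly, or known intrusion subclass); the record format and helper name are hypothetical:

```python
from typing import Dict, List, Tuple

def build_training_set(normal: List[Dict],
                       artificial: List[Dict],
                       known_intrusions: List[Tuple[Dict, str]]) -> List[Tuple[Dict, str]]:
    """Combine the three data sources into (record, label) pairs for learning."""
    data = [(r, "normal") for r in normal]
    data += [(r, "anomaly") for r in artificial]
    data += [(r, subclass) for r, subclass in known_intrusions]  # e.g., "smurf"
    return data
```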

Page 28: North Carolina State University Columbia University Florida Institute of Technology

Combining Anomaly And Misuse Detection (continued)

• Consider phf, pod, teardrop, spy, and smurf to be unknown (absent from the training data).

• Anomaly detection rate: phf=25%, pod=100%, teardrop=93.91%, spy=50%, smurf=100%

• Overall false alarm rate: 0.20%

• The false alarm rate dropped from 2.02% to 0.20% when some known attacks were included in the training data.

Page 29: North Carolina State University Columbia University Florida Institute of Technology

Collaboration with Industry

• RST Inc.
  – Anomaly detection on NT systems
• NFR Inc.
  – Real-time IDS
• SAS Institute
  – Off-line ID (funded by SAS)
• Aprisma (Cabletron)
  – Integrating ID with NM (funded by Aprisma)
• HRL Labs
  – ID in wireless networks (funded by HRL)

Page 30: North Carolina State University Columbia University Florida Institute of Technology

Plan for 2000-2001

• Dynamic cost-sensitive modeling and deployment
  – work with industry for realistic cost analysis and real-time testing

• Anomaly detection
  – improve existing algorithms using feedback from evaluation

• Correlation
  – develop/evaluate algorithms for integrating data/evidence from multiple sources