Boundary Detection in Tokenizing Network Application Payload for Anomaly Detection

Rachna Vargiya and Philip Chan

Department of Computer Sciences

Florida Institute of Technology

Motivation

Existing anomaly detection techniques rely on information derived only from the packet headers

More sophisticated attacks involve the application payload

Example: Code Red II worm: GET /default.ida?NNNNNNNNN…

Parsing the payload is required!

Problems in hand-coded parsing:

Large number of application protocols

Frequent introduction of new protocols

Problem Statement

To parse application payload into tokens without explicit knowledge of the application protocols

These tokens are later used as features for anomaly detection

Related work

Pattern Detection - Important Tokens

Fixed Length: Forrest et al. (1998)

Variable Length: Wespi et al. (2000), Jiang et al. (2002)

Boundary Detection - All Tokens

VOTING EXPERTS by Cohen et al. (2002): Boundary Entropy, Frequency, Binary Votes

Approach

Boundary Finding Algorithms:

Boundary Entropy

Frequency

Augmented Expected Mutual Information

Minimum Description Length

Approach is domain independent (no prior domain knowledge)

Combining Boundary Finding Algorithms

Combination of all or a subset of the techniques (e.g., Frequency + Minimum Description Length)

Each algorithm can cast multiple votes, depending on confidence measure

Boundary Entropy (Cohen et al)

Entropy at the end of each possible window is calculated

$$H(w) = -\sum_{x} P(x \mid w) \log P(x \mid w)$$

‘w’ is the current window; ‘x’ is the byte following the current window

Example string: Itisarainyday

High entropy means more variation
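As a rough illustration of the entropy calculation, the sketch below estimates P(x|w) by counting which bytes follow each occurrence of a window in a training corpus. The function name `boundary_entropy` and the counting scheme are assumptions made for this example, not taken from the slides.

```python
import math
from collections import Counter

def boundary_entropy(corpus: bytes, window: bytes) -> float:
    """Entropy (in bits) of the byte that follows `window` in `corpus`."""
    followers = Counter()
    start = corpus.find(window)
    while start != -1:
        nxt = start + len(window)
        if nxt < len(corpus):
            followers[corpus[nxt]] += 1
        # Advance by one byte so overlapping occurrences are counted too.
        start = corpus.find(window, start + 1)
    total = sum(followers.values())
    if total == 0:
        return 0.0  # window never seen with a following byte
    return -sum((c / total) * math.log2(c / total) for c in followers.values())
```

For example, in `b"abxabyabxabz"` the window `b"ab"` is followed by x, y, x, z, so the entropy is high (many possible continuations), while `b"x"` is always followed by a, so the entropy is zero.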

Voting using Boundary Entropy

Entropy in meaningful tokens starts with a high value, drops, and peaks at the end

Vote for positions with the peak entropy

Threshold suppresses votes for low entropy values

Threshold = Average BE

Example string: Itisarainyday
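The voting rule can be sketched as follows, assuming the entropy has already been computed for each position. The name `entropy_votes` is hypothetical, and treating ties with a neighbor as peaks is a choice made for this example only.

```python
def entropy_votes(entropies):
    """Positions that are local entropy peaks and exceed the average entropy."""
    threshold = sum(entropies) / len(entropies)  # Threshold = average BE
    votes = []
    for i, h in enumerate(entropies):
        left = entropies[i - 1] if i > 0 else float("-inf")
        right = entropies[i + 1] if i + 1 < len(entropies) else float("-inf")
        # Vote only for peaks; the threshold suppresses low-entropy peaks.
        if h > threshold and h >= left and h >= right:
            votes.append(i)
    return votes
```

With the made-up values `[0.1, 0.4, 0.1, 2.0, 0.1]`, position 1 is a local peak but falls below the average, so only position 3 receives a vote.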

Frequency (Cohen et al)

Most frequent set of tokens are assumed to be meaningful tokens

Frequencies of tokens with length = 1, 2, 3, ..., 6

Shorter tokens are inherently more frequent than longer tokens

Normalize frequencies for tokens of the same length using standard deviation

Boundaries are assigned at the end of most frequent token in the window

Example string: Itisarainyday

Frequency in window: (1) “I” = 3 (2) “It” = 5 (3) “Iti” = 2 (4) “Itis” = 3
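A minimal sketch of the frequency expert, under the assumption that "normalize using standard deviation" means a z-score computed among tokens of the same length. The helper names `ngram_zscores` and `frequency_boundary` are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev

MAX_LEN = 6  # token lengths 1..6, as on the slide

def ngram_zscores(corpus, max_len=MAX_LEN):
    """Map each n-gram to its frequency z-score among n-grams of equal length."""
    z = {}
    for n in range(1, max_len + 1):
        counts = Counter(corpus[i:i + n] for i in range(len(corpus) - n + 1))
        if not counts:
            continue
        values = list(counts.values())
        mu, sigma = mean(values), pstdev(values)
        for gram, c in counts.items():
            z[gram] = (c - mu) / sigma if sigma > 0 else 0.0
    return z

def frequency_boundary(corpus, start, z, max_len=MAX_LEN):
    """End position of the best-scoring token beginning at `start`.

    Assumes start < len(corpus); ties go to the shortest token.
    """
    lengths = [n for n in range(1, max_len + 1) if start + n <= len(corpus)]
    best = max(lengths, key=lambda n: z.get(corpus[start:start + n], 0.0))
    return start + best
```

The per-length normalization is what keeps one-byte tokens, which are inherently the most frequent, from winning every window.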

Mutual Information (MI)

Mutual Information given by:

$$MI(a,b) = \lg \frac{P(a,b)}{P(a)\,P(b)}$$

Gives us the reduction of uncertainty in presence of event ‘b’ given event ‘a’

MI does not incorporate the counter evidence when ‘a’ occurs without ‘b’ and vice versa
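A small numeric sketch of the MI formula, taking the probabilities as given inputs. The function name `mutual_information` is an assumption for this example.

```python
import math

def mutual_information(p_ab, p_a, p_b):
    """MI(a, b) = lg[ P(a,b) / (P(a) P(b)) ], in bits."""
    return math.log2(p_ab / (p_a * p_b))
```

When the two events are independent (P(a,b) = P(a)P(b)) the value is 0; when they always occur together with P(a) = P(b) = P(a,b) = 0.5 the value is 1 bit. Note that the result depends only on the joint occurrence, which is the limitation the slide points out.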

Augmented Expected Mutual Information(AEMI)

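AEMI sums the supporting evidence and subtracts the counter evidence

$$AEMI(A,B) = P(a,b)\,MI(a,b) - P(a,\bar{b})\,MI(a,\bar{b}) - P(\bar{a},b)\,MI(\bar{a},b)$$

For each window, the location with the minimum AEMI value suggests a boundary

Example string: Itisarainyday, with ‘a’ and ‘b’ marking the two sides of a candidate boundary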
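The sketch below shows one way AEMI could be estimated, assuming that ‘a’ and ‘b’ are single adjacent bytes and that probabilities come from adjacent-pair counts in a training string. Both assumptions, and the names `aemi` and `min_aemi_split`, are illustrative only.

```python
import math
from collections import Counter

def _weighted_mi(n_joint, n_x, n_y, n):
    """P(x,y) * lg[P(x,y) / (P(x) P(y))] from raw counts; 0 if never seen."""
    if n_joint == 0:
        return 0.0
    p_joint, p_x, p_y = n_joint / n, n_x / n, n_y / n
    return p_joint * math.log2(p_joint / (p_x * p_y))

def aemi(corpus, a, b):
    """Supporting evidence (a then b) minus counter evidence (a without b, b without a)."""
    pairs = Counter(zip(corpus, corpus[1:]))
    n = sum(pairs.values())
    n_ab = pairs[(a, b)]
    n_a = sum(c for (x, _), c in pairs.items() if x == a)  # a on the left
    n_b = sum(c for (_, y), c in pairs.items() if y == b)  # b on the right
    support = _weighted_mi(n_ab, n_a, n_b, n)
    counter = (_weighted_mi(n_a - n_ab, n_a, n - n_b, n)
               + _weighted_mi(n_b - n_ab, n - n_a, n_b, n))
    return support - counter

def min_aemi_split(corpus, start, length):
    """Index i in the window where AEMI(corpus[i-1], corpus[i]) is smallest."""
    return min(range(start + 1, start + length),
               key=lambda i: aemi(corpus, corpus[i - 1], corpus[i]))
```

A low AEMI between two neighboring bytes means they show little association, which is why the minimum within a window is taken as the suggested boundary.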

Minimum Description Length(MDL)

Shorter code assigned to frequent tokens to minimize the overall coding length

Boundary yielding shortest coding length is assigned votes

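Coding length per byte:

$$MDL = -\sum_{i \in \{left,\, right\}} \frac{\lg P(t_i)}{|t_i|}$$

-lg P(t_i): number of bits to encode t_i

|t_i| = length of t_i

Example string: Itisarainyday, split into t_left and t_right at a candidate boundary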
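A minimal sketch of the MDL expert, assuming P(t) is the relative frequency of t among all substrings of the same length in a training string, and that every candidate token occurs at least once there. The names `token_prob`, `coding_length` and `mdl_split` are hypothetical.

```python
import math

def token_prob(corpus, token):
    """Relative frequency of `token` among same-length substrings of `corpus`."""
    n = len(token)
    total = len(corpus) - n + 1
    count = sum(1 for i in range(total) if corpus[i:i + n] == token)
    return count / total

def coding_length(corpus, left, right):
    """Sum over t_left and t_right of -lg P(t) / |t| (bits per byte)."""
    return sum(-math.log2(token_prob(corpus, t)) / len(t) for t in (left, right))

def mdl_split(corpus, window):
    """Split offset inside `window` that gives the shortest coding length."""
    return min(range(1, len(window)),
               key=lambda k: coding_length(corpus, window[:k], window[k:]))
```

For the training string "ababab" and the window "abab", splitting into "ab" + "ab" costs about 0.74 bits per byte, against about 1.33 for "a" + "bab" or "aba" + "b", so the boundary falls at offset 2.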

Normalize scores of each algorithm

Each algorithm produces list of scores

Since the number of votes is proportional to the score, the scores must be normalized

Each score is replaced by the number of standard deviations that the score is away from the mean value
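The normalization step is a standard z-score. The sketch below uses the population standard deviation, which is an assumption since the slides do not say which variant is used; the function name `normalize` is also illustrative.

```python
from statistics import mean, pstdev

def normalize(values):
    """Replace each value by its distance from the mean in standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # all values equal: nothing stands out
    return [(v - mu) / sigma for v in values]
```

For the scores `[2, 4, 4, 4, 5, 5, 7, 9]` the mean is 5 and the standard deviation is 2, so the score 9 becomes 2.0 and the score 2 becomes -1.5. The same function could be applied to vote counts so that each algorithm votes on a comparable scale.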

Normalize votes of each algorithm

Algorithms produce list of votes depending on the scores

Make sure each algorithm votes with the same weight.

Number of votes is replaced by the number of standard deviations from the mean value

[Figure: Normalizing Scores and Votes. For the bytes I t i s, each algorithm's scores (s1 to s4) become normalized scores (ns1 to ns4), then votes (v1 to v4), then normalized votes (nv1 to nv4). The normalized votes of all algorithms are summed into the Combined Normalized Votes.]

Combined Approach with Weighted Voting

A list of votes from all the experts is gathered

For each boundary, the final votes are summed

A boundary is placed at a position if the votes at the position exceed threshold.

Threshold = Average number of Votes
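The combination step can be sketched as below. The vote values are invented for illustration, position i is taken to mean "boundary after byte i", and the names `combine_votes` and `tokenize` are assumptions.

```python
def combine_votes(expert_votes):
    """Sum per-position votes across experts; keep positions above the average."""
    totals = [sum(column) for column in zip(*expert_votes)]
    threshold = sum(totals) / len(totals)  # Threshold = average number of votes
    return [i for i, v in enumerate(totals) if v > threshold]

def tokenize(payload, boundaries):
    """Cut `payload` after each boundary position."""
    tokens, prev = [], 0
    for b in sorted(boundaries):
        tokens.append(payload[prev:b + 1])
        prev = b + 1
    tokens.append(payload[prev:])
    return [t for t in tokens if t]

# Illustrative votes from two experts for "Itisarainyday" (not real output).
freq_votes = [0, 2, 0, 1, 1, 0, 0, 0, 0, 2, 0, 0, 0]
mdl_votes = [0, 1, 0, 2, 1, 0, 0, 0, 0, 1, 0, 0, 0]
boundaries = combine_votes([freq_votes, mdl_votes])
tokens = tokenize("Itisarainyday", boundaries)
```

With these made-up votes the boundaries land after positions 1, 3, 4 and 9, which yields the tokens It, is, a, rainy, day.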

Evaluation Criteria

Evaluation A: % of space separated words retrieved

Evaluation B: % of keywords in the protocol specification that were retrieved

Evaluation C: entropy of the tokens in output file (lower the better)

Evaluation D: number of detected attacks in network traffic

A and B only for text based protocols
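The first three criteria could be computed roughly as follows. This sketch assumes that Evaluations A and B measure the percentage of reference words (or keywords) that appear among the output tokens, and that Evaluation C is the Shannon entropy of the token distribution; the slides do not spell out these details, and the function names are hypothetical.

```python
import math
from collections import Counter

def percent_recovered(tokens, reference):
    """Percentage of reference words that appear among the output tokens."""
    reference = set(reference)
    found = reference & set(tokens)
    return 100.0 * len(found) / len(reference)

def token_entropy(tokens):
    """Shannon entropy (bits) of the token distribution; lower is better."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A tokenizer that keeps reproducing the same few tokens gives a low entropy, which suggests it is finding recurring units rather than arbitrary fragments.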

Anomaly Detection Algorithm – LERAD (Mahoney and Chan)

LERAD forms rules based on 23 attributes

First 15 attributes: from packet header

Next 8 attributes: from the payload

Example Rule: If port = 80 then word1 = “GET”

Original Payload attributes: space separated tokens

Our Payload attributes: Boundary separated tokens

Experimental Data

1999 DARPA Intrusion Detection Evaluation Data Set

Week 3: attack-free (training) data

Weeks 4, 5: attack-containing (test) data

Evaluations A, B, C (known boundaries): Week 3 only

Trained: days 1-4

Tested: days 5-7

Prevents gaining knowledge from Weeks 4 and 5

Evaluation D (detected attacks)

Trained: Week 3

Tested: Weeks 4 and 5

Evaluation A: % of Space-Separated Tokens Recovered

Method              Port 25   Port 80   Port 21   Port 79   Avg
Freq+MDL            52        26        21        81        45.0
Frequency           15        16        13        99        36.0
BE+AEMI+MDL+Freq    21        14        5         12        13.0
AEMI                5         9         4         32        12.5
MDL                 6         7         3         25        10.3
BE                  3         3         1         9         4.0

Evaluation B: % of Keywords in RFCs Recovered

Method              Port 25   Port 80   Port 21   Avg
Freq+MDL            40        36        59        45.0
Frequency           31        28        40        33.0
BE+AEMI+MDL+Freq    12        13        21        15.3
AEMI                9         5         2         5.3
MDL                 7         6         1         4.7
BE                  3         2         2         2.3

Evaluation C: Entropy of Output (Lower is Better)

Average across 6 ports

Method              Average Value
Frequency           5.0
MDL                 5.03
Freq+MDL            5.06
BE                  5.25
BE+AEMI+Freq+MDL    5.56
AEMI                6.38

Ranking of Algorithms

Method              Evaluation A   Evaluation B   Evaluation C
Freq+MDL            1              1              3
Frequency           2              2              1
BE+AEMI+MDL+Freq    3              3              5
AEMI                4              4              6
MDL                 5              5              2
BE                  6              6              4

Detection Rate for Space Separated vs. Boundary Separated (Freq + MDL)

                 10 FP/day             100 FP/day
Port #           Space     Boundary    Space     Boundary
20               2         2           4         5
21               14        16          14        17
22               3         3           3         3
23               13        14          13        14
25               15        16          16        16
79               3         3           3         3
80               10        10          11        13
113              2         2           2         2
Overall          59        62          63        68
% Improvement    --        5           --        8
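The improvement row follows from the Overall counts, taking the space-separated result as the baseline. A short check, with a hypothetical helper name:

```python
def percent_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline

at_10_fp = percent_improvement(59, 62)   # about 5.1
at_100_fp = percent_improvement(63, 68)  # about 7.9
```

Rounded to whole numbers these give the 5% and 8% shown in the table.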

Summary of Contributions

Used payload information, while most IDS concentrate on header information.

Proposed AEMI + MDL for boundary detection

Combined all and subset of algorithms

Used weighted voting to indicate confidence

Proposed techniques find boundaries better than spaces

Achieved higher detection rates in an anomaly detection system

Future Work

Further evaluation on other ports

Pick more useful tokens instead of first 8

DARPA data set is partially synthetic; further evaluation on real traffic is needed

Evaluation with other Anomaly detection algorithms

Thank you

Experimental Results

Table 4.3.4 Results from Additional Ports for Freq + MDL and ALL

         Evaluation A          Evaluation B          Evaluation C
         % Words Found         % Keywords Found      Entropy
Port     Frq+MDL   ALL         Frq+MDL   ALL         Frq+MDL   ALL
23       13        7           5         3           7.88      8.08
115      43        20          -         -           4.45      5.18
515      38        14          -         -           7.66      7.27
