RAIDM: Router-based Anomaly/Intrusion Detection and Mitigation

Zhichun Li, EECS Department

Northwestern University, 2008-04-29

Thesis Proposal

2

Outline

• Motivation

• RAIDM System Design

• Finished Work

• Proposed Work

• Research Plan

3

Motivation

[Figure: botnets, worms, and attackers]

4

Motivation

• According to an AT&T survey of 395 senior executives, network security is the single most important attribute of their networks.

• Many newly emerging threats make the situation even worse.

RAIDM: a network-based attack defense system

5

Network Level Defense

• Network gateways/routers are natural vantage points for detecting large-scale attacks.

• Host-based detection/prevention alone is not enough for modern enterprise networks.
– Enterprises may not want to rely only on their end users for security protection.
– Users may not want to stop their work to reboot machines or applications to apply patches.

6

Outline

• Motivation

• RAIDM System Design

• Finished Work

• Proposed Work

• Research Plan

7

Research Questions

• How can we achieve online anomaly detection for high-speed networks?

• How can we respond to zero-day polymorphic worms in their early stage?

• Given known vulnerabilities, how can we protect high-speed networks from exploits accurately and efficiently?

• How can we provide quality information for network situational awareness?

8

System Framework

[Figure: RAIDM system framework. Streaming packet data enters the system; the figure distinguishes the data path from the control path, and modules on the critical path from modules on the non-critical path.
– Part I: Sketch-based monitoring & detection: reversible k-ary sketch monitoring and sketch-based statistical anomaly detection (SSAD); local sketch records are sent out for aggregation, and remote aggregated sketch records are received.
– Part II: Polymorphic worm signature generation: Token-Based Signature Generation (TOSG) and Length-Based Signature Generation (LESG).
– Honeynets/honeyfarms, receiving traffic to unused IP blocks.
– Part III: Signature matching engines: content-based signature matching and protocol semantic signature matching.
– Part IV: Network situational awareness.]

9

Current Status

• Part I: Sketch-based monitoring & detection
– Results in [Infocom06, ToN, ICDCS06]

• Part II: Polymorphic worm signature generation
– Results in [Oakland06, ICNP07]

• Part III: Signature matching engines
– Work in progress; the focus of this talk

• Part IV: Network situational awareness
– Work in progress

10

Outline

• Motivation

• RAIDM System Design

• Finished Work

• Proposed Work

• Research Plan

11

Part I: Sketch based monitoring & detection

• Reversible Sketches (included for completeness)
– Use intelligent hash function design to recover the aggregated values of a series of (key, value) updates for the popular keys.

– Publications:
– Robert Schweller, Zhichun Li, Yan Chen, Yan Gao, Ashish Gupta, Elliot Parons, Yin Zhang, Peter Dinda, Ming-Yang Kao, and Gokhan Memik, Reversible sketches: Enabling monitoring and analysis over high speed data streams, IEEE/ACM Transactions on Networking, Volume 15, Issue 5, Oct. 2007

– Robert Schweller, Zhichun Li, Yan Chen, Yan Gao, Ashish Gupta, Elliot Parons, Yin Zhang, Peter Dinda, Ming-Yang Kao, and Gokhan Memik, Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluations, and Applications, in Proc. of IEEE INFOCOM 2006 (252/1400=18%)
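For intuition, the structure underneath the reversible sketch is a k-ary sketch: H rows of K counters, each row with its own hash function. The minimal Python sketch below is an illustration under stated assumptions: the class and method names are hypothetical, salted BLAKE2 hashes stand in for the hash functions, and it deliberately omits the reverse-hashing step that recovers the popular keys, which depends on the specially designed hash functions of the papers above.

```python
import hashlib
import statistics
from typing import List


class KArySketch:
    """k-ary sketch: H rows of K counters; each row hashes keys independently.

    Salted BLAKE2 hashes are an illustrative stand-in; they do NOT give the
    key-recovery (reversibility) property of the reversible sketch.
    """

    def __init__(self, rows: int = 5, buckets: int = 1024) -> None:
        self.H, self.K = rows, buckets
        self.table: List[List[float]] = [[0.0] * buckets for _ in range(rows)]
        self.total = 0.0  # sum of all update values seen so far

    def _bucket(self, row: int, key: str) -> int:
        digest = hashlib.blake2b(key.encode(), digest_size=8,
                                 salt=row.to_bytes(16, "little")).digest()
        return int.from_bytes(digest, "little") % self.K

    def update(self, key: str, value: float) -> None:
        # UPDATE: add the value to one counter per row.
        for i in range(self.H):
            self.table[i][self._bucket(i, key)] += value
        self.total += value

    def estimate(self, key: str) -> float:
        # ESTIMATE: per row, subtract the expected share of all other keys and
        # rescale; the median over rows limits the effect of hash collisions.
        per_row = [
            (self.table[i][self._bucket(i, key)] - self.total / self.K) / (1 - 1 / self.K)
            for i in range(self.H)
        ]
        return statistics.median(per_row)


sketch = KArySketch()
sketch.update("10.0.0.1", 1000)            # one heavy key
for n in range(10):
    sketch.update(f"10.0.1.{n}", 1)        # a few light keys
print(round(sketch.estimate("10.0.0.1")))  # roughly 1000
```

Taking the median over rows is what keeps one unlucky collision from distorting the estimate.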

12

Part I: Sketch based monitoring & detection

• Sketch-based Anomaly Detection
– Build anomaly detection engines based on reversible sketches to detect horizontal scans, vertical scans, and TCP SYN flooding attacks.

– Further propose 2D sketches to differentiate among the different types of attacks.

– Publications
– Yan Gao, Zhichun Li and Yan Chen, A DoS Resilient Flow-level Intrusion Detection Approach for High-speed Networks, in Proc. of IEEE International Conference on Distributed Computing Systems (ICDCS) 2006 (75/536=14%) (alphabetical order)
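To make the horizontal/vertical scan distinction concrete: a horizontal scan is one source probing many destination IPs on the same port, and a vertical scan is one source probing many ports on one destination. The sketch below is hypothetical (function name and threshold are illustrative) and, for readability, uses exact per-key sets where the detection engine would use memory-bounded reversible sketches.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# One observed SYN: (source IP, destination IP, destination port)
Syn = Tuple[str, str, int]


def classify_scans(syns: Iterable[Syn], threshold: int) -> Dict[str, List[tuple]]:
    """Hypothetical helper: exact per-key sets stand in for reversible sketches."""
    dips_per_sip_dport: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
    ports_per_sip_dip: Dict[Tuple[str, str], Set[int]] = defaultdict(set)
    for sip, dip, dport in syns:
        dips_per_sip_dport[(sip, dport)].add(dip)   # horizontal: many hosts, one port
        ports_per_sip_dip[(sip, dip)].add(dport)    # vertical: one host, many ports
    return {
        "horizontal": sorted(k for k, d in dips_per_sip_dport.items() if len(d) >= threshold),
        "vertical": sorted(k for k, p in ports_per_sip_dip.items() if len(p) >= threshold),
    }


syns = [("1.1.1.1", f"10.0.0.{i}", 445) for i in range(5)]       # sweeps port 445
syns += [("2.2.2.2", "10.0.0.9", port) for port in range(20, 25)]  # sweeps one host
print(classify_scans(syns, threshold=3))
# {'horizontal': [('1.1.1.1', 445)], 'vertical': [('2.2.2.2', '10.0.0.9')]}
```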

13

Part II: Polymorphic worm signature generation

• TOSG (Token-Based Signature Generation)
– Use a conjunction of tokens (substrings) as the signature for polymorphic worms
– Advantages
• Do not require protocol knowledge or information about the vulnerable program
• Fast and noise tolerant
• Have an analytical attack-resilience bound under certain assumptions
– Limitation
• Does not have good resilience to the deliberate noise injection attack [Perdisci 2006]
– Publication

Zhichun Li, Manan Sanghi, Brian Chavez, Yan Chen and Ming-Yang Kao, Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience, in Proc. of IEEE Symposium on Security and Privacy, 2006 (23/251=9%)
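Matching a token-conjunction signature is simple once the tokens are known. The sketch below only illustrates the matching side (not Hamsa's signature generation), assuming a signature is a list of byte-string tokens that must all appear somewhere in the payload; the function name and tokens are hypothetical.

```python
from typing import Iterable


def matches_token_conjunction(payload: bytes, tokens: Iterable[bytes]) -> bool:
    """True if every token (substring) of the signature occurs somewhere in payload.

    Token order and position are ignored, so variable bytes between the
    invariant tokens do not affect the match.
    """
    return all(token in payload for token in tokens)


signature = [b"GET ", b"/cgi-bin/", b"\x90\x90"]  # hypothetical tokens
print(matches_token_conjunction(b"GET /cgi-bin/x?\x90\x90AAAA", signature))  # True
print(matches_token_conjunction(b"GET /index.html", signature))             # False
```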

14

Part II: Polymorphic worm signature generation

• LESG (Length-Based Signature Generation)
– Propose using a set of field lengths from the vulnerable program's protocol as signatures
– Mainly works for buffer overflow worms
– Advantages:
• Fast and noise tolerant
• Have an analytical attack-resilience bound under certain assumptions
• The bound holds under all recently proposed attacks.

– Publication
Zhichun Li, Lanjia Wang, Yan Chen and Zhi (Judy) Fu, Network-based and Attack-resilient Length Signature Generation for Zero-day Polymorphic Worms, in Proc. of IEEE International Conference on Network Protocols (ICNP) 2007 (32/220=14%)
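A length signature can be checked per flow once the protocol fields have been parsed. This is a minimal sketch assuming a signature is a set of (field, length bound) pairs and that a flow matches when any listed field exceeds its bound; the field names and bounds are hypothetical.

```python
from typing import Dict


def matches_length_signature(field_lengths: Dict[str, int], signature: Dict[str, int]) -> bool:
    """Flag a flow if any signature field is longer than its length bound.

    Fields absent from the flow are treated as length 0.
    """
    return any(field_lengths.get(field, 0) > bound for field, bound in signature.items())


signature = {"host": 256, "url": 1024}          # hypothetical fields and bounds
print(matches_length_signature({"host": 450, "url": 12}, signature))  # True
print(matches_length_signature({"host": 20}, signature))              # False
```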

15

Outline

• Motivation

• RAIDM System Design

• Finished Work

• Proposed Work

• Research Plan

16

Proposed Work

• Part III: Signature Matching Engine
– NetShield, a protocol-semantic vulnerability signature matching engine (the focus of this talk)
– Report: Zhichun Li, Gao Xia, Yi Tang, Ying He, Yan Chen and Bin Liu, NetShield: Towards High Performance Network-based Semantic Signature Matching

17

Proposed Work

• Part IV: Network Situational Awareness
– Botnet inference:
• Infer scan properties from honeynet traffic: trend, uniformity, hitlist, and collaboration
• Extrapolate the global scan scope and the global number of bots from limited local observations; this can be used to detect targeted attacks.
• Report: Zhichun Li, Anup Goyal, Yan Chen and Vern Paxson, Towards Situational Awareness of Large-Scale Botnet Events using Honeynets

– P2P misconfiguration diagnosis:
• We found that P2P misconfiguration traffic is one of the major sources of Internet background radiation.
• The eMule P2P misconfiguration is due to byte ordering.
• For BitTorrent, we found that anti-P2P companies deliberately inject bogus peers.
• Report: Zhichun Li, Anup Goyal, Yan Chen and Aleksandar Kuzmanovic, P2P Doctor: Measurement and Diagnosis of Misconfigured Peer-to-Peer Traffic
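As a generic illustration of how a byte-ordering bug misdirects traffic (assuming, for illustration only, that a client serializes an IPv4 address in little-endian order where network byte order is expected; this scenario is not taken from the P2P Doctor report), the reversed bytes point to an unrelated address:

```python
import socket
import struct


def misordered_ip(ip: str) -> str:
    """Address a peer would contact if `ip` were written little-endian
    but read back as network (big-endian) byte order."""
    value = struct.unpack("!I", socket.inet_aton(ip))[0]  # address as an integer
    wrong = struct.pack("<I", value)                       # bug: little-endian bytes
    return socket.inet_ntoa(wrong)                         # peer reads them as network order


print(misordered_ip("192.168.1.10"))  # 10.1.168.192
```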

18

NetShield Overview

• Goal

• Feasibility Study: a Measurement Approach

• High Speed Parsing

• High Speed Matching for Large Rulesets

• Preliminary Evaluation

• Discussion

19

Signature Matching Engine

• Accuracy (especially for IPS)
– False positives
– False negatives

• Speed

• Coverage: large ruleset

              Regular Expression   Vulnerability
Accuracy      Poor                 Much better
Speed         Good                 Good
Coverage      Good                 Good

20

Reason

Regular expressions are not powerful enough to capture the exact vulnerability condition!

[Figure: RE cannot express the exact condition; Shield can express the exact condition.]

21

Feasibility Study

• Protocol semantics can help (Shield project [SIGCOMM04]).

• How much can they help a NIDS/IPS?
– Given a NIDS/NIPS with a large ruleset, what percentage of the rules can be improved by protocol-semantic vulnerability signatures?

22

Measure Snort rules

• Semi-manually classify the rules
– First group them by CVE ID
– Then manually examine each vulnerability

• Results
– 86.7% of rules can be improved by protocol-semantic vulnerability signatures.
– 9.9% of rules are related to web DHTML and scripts, which are not suitable for a signature-based approach.
– On average, 4.5 Snort rules reduce to one vulnerability signature.
– Binary protocols have a larger reduction ratio than text-based protocols.

23

Towards high speed parsing

• Protocol parsing problem formulation
– Given a PDU and the state left by previous PDUs, output the set of fields required by matching.

• Observation

• Parsing State Machine

24

Observation

[Figure: a PDU and its parse tree, including an array node]

• PDU parse tree
• Leaf nodes (basic fields) are integers or strings
• Vulnerability signatures are mostly based on basic fields

We only need to parse out the fields related to signatures.
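The observation that only signature-related fields need to be parsed can be sketched for a text protocol. The function below is a hypothetical illustration, assuming a plain HTTP/1.x request already reassembled in memory: it keeps the request line and stores only the headers named in `needed`, without building a full parse tree.

```python
from typing import Dict, Set, Tuple


def parse_needed(request: bytes, needed: Set[str]) -> Tuple[str, str, Dict[str, str]]:
    """Hypothetical selective parser: keep the request line plus only the
    headers (lower-case names) that some signature refers to."""
    head = request.split(b"\r\n\r\n", 1)[0]              # ignore the body
    lines = head.split(b"\r\n")
    method, uri, _version = lines[0].decode("latin-1").split(" ", 2)
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, sep, value = line.partition(b":")
        key = name.strip().lower().decode("latin-1")
        if sep and key in needed:                         # other headers are not stored
            headers[key] = value.strip().decode("latin-1")
    return method, uri, headers


req = b"POST /fp40reg.dll HTTP/1.1\r\nHost: example.com\r\nUser-Agent: x\r\n\r\nbody"
print(parse_needed(req, {"host"}))  # ('POST', '/fp40reg.dll', {'host': 'example.com'})
```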

25

Parsing State Machine

• We studied eight popular protocols (HTTP, FTP, SMTP, eMule, BitTorrent, WINRPC, SNMP, and DNS) and their vulnerability signatures.

• Protocol semantics are context sensitive.

• Common relationships among basic fields:

[Figure: state-machine patterns for these relationships: sequential, branch, loop, and derive.]

26

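Example for WINRPC

• Nodes: fields, with their lengths in bytes

• States: S1 .. Sn

• 0.61 instructions/byte for the BIND PDU

[Figure: WINRPC parsing state machine.
Header (S0): rpc_vers (1), rpc_ver_minor (1), ptype (1), pfc_flags (1), packed_drep (4), frag_length (2, stored in S1), merge1 (6); the PDU type leads to Bind or Bind-ACK.
Bind: merge2 (8), ncontext (1), padding (3); S2 ← 0, S3 ← ncontext; loop while S2 ≤ S3, with S2++: ID (2), n_tran_syn (1, stored in S4), padding (1), UUID (16), UUID_ver (4), tran_syn (20 × S4).
Bind-ACK: merge3 (S1 − 16).]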
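Following the WINRPC field layout in this example, BIND parsing can be sketched in a few lines. This is a minimal Python sketch, not NetShield's parser: it assumes little-endian data representation (packed_drep 0x10), reads the "merge" nodes as runs of unneeded fields that are skipped by advancing the offset, and uses a hypothetical function name; bounds checking is minimal.

```python
import struct
import uuid
from typing import Dict, List

PTYPE_BIND = 11  # DCE/RPC packet type of a bind PDU


def parse_winrpc_bind(pdu: bytes) -> Dict[str, object]:
    """Pull out only signature-relevant fields from a little-endian DCE/RPC PDU."""
    if len(pdu) < 16:
        raise ValueError("truncated common header")
    ptype = pdu[2]                                       # after rpc_vers, rpc_ver_minor
    frag_length = struct.unpack_from("<H", pdu, 8)[0]    # S1
    fields: Dict[str, object] = {"ptype": ptype, "frag_length": frag_length}
    if ptype != PTYPE_BIND:
        return fields                                    # e.g. Bind-ACK: skip the body
    off = 10 + 6 + 8                                     # skip merge1 (6) and merge2 (8)
    ncontext = pdu[off]                                  # S3
    off += 1 + 3                                         # ncontext + padding
    uuids: List[uuid.UUID] = []
    for _ in range(ncontext):                            # S2 counts the iterations
        off += 2                                         # context ID
        n_tran_syn = pdu[off]                            # S4
        off += 1 + 1                                     # n_tran_syn + padding
        uuids.append(uuid.UUID(bytes_le=pdu[off:off + 16]))
        off += 16 + 4                                    # UUID + UUID_ver
        off += 20 * n_tran_syn                           # skip the transfer syntaxes
    if off > len(pdu):
        raise ValueError("truncated bind PDU")
    fields["ncontext"] = ncontext
    fields["uuids"] = uuids
    return fields
```

Skipping whole runs of bytes, instead of decoding every field, is what keeps the per-byte instruction count low.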

27

High speed matching

• Problem formulation

• Observation

• Candidate Selection Algorithm

• Algorithm Refinement

28

Matching Problem Formulation

• Data representation
– For all the vulnerability signatures we studied, we need integers and strings
– Integer operators: ==, >, <
– String operators: ==, match_re(.,.), len(.)

• Buffer constraint
– The string fields could be too long to buffer.
– This influences whether we can change the matching order.

• Field dependency
– Arrays (e.g., DNS_questions or RR records)
– Associative arrays (e.g., HTTP headers)
– Mutually exclusive fields

29

Matching Problem Formulation (2)

• PDU-level protocol state machine
– Needed for complex stateful protocols
– For most stateful protocols, the state machine is quite simple

[Figure: WINRPC example. PDU-level states: BIND request, BIND-ACK request, CALL request, CALL-ACK request, and error.]
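A PDU-level state machine like the WINRPC one can be tracked with a small transition table. In this minimal sketch, the allowed orderings (BIND, then BIND-ACK, then alternating CALL and CALL-ACK) are assumptions made for illustration, not taken from the NetShield report, and the names are hypothetical.

```python
from typing import Dict, Iterable, Set

# Hypothetical transition table: the allowed orderings are assumptions for illustration.
TRANSITIONS: Dict[str, Set[str]] = {
    "start": {"BIND"},
    "BIND": {"BIND-ACK"},
    "BIND-ACK": {"CALL"},
    "CALL": {"CALL-ACK"},
    "CALL-ACK": {"CALL"},
}


def track_session(pdu_types: Iterable[str]) -> str:
    """Return the PDU-level state after the given PDUs, or "error" on an unexpected one."""
    state = "start"
    for pdu in pdu_types:
        if pdu not in TRANSITIONS.get(state, set()):
            return "error"
        state = pdu
    return state


print(track_session(["BIND", "BIND-ACK", "CALL", "CALL-ACK", "CALL"]))  # CALL
print(track_session(["CALL"]))                                          # error
```

A multi-PDU signature can then be evaluated only when the session is in the state it refers to, for example a CALL that follows a completed BIND.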

30

Matching problems (cont.)

• Example signature for Blaster worm

• Single PDU matching problem (SPM)

• Multiple PDU matching problem (MPM)

31

Single PDU Matching

• Suppose we have n signatures, each defined on k matching dimensions (matchers).
– A matcher is a two-tuple (field, operation), or a four-tuple for associative array elements.
– For example:
• (Filename, RE)
• (Version, Range_check), e.g., Version > 3 or Version == 1

• k is the number of all possible matchers for the n signatures.

32

Table Representation

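• We use an n×k table to represent the rules.

[Figure: an n×k table with one row per signature (n rows) and one column per matcher (k columns); the cell for Sig i and matcher j may be * (don't care).]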
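To pin down what the table means, here is a hypothetical Python rendering with an exhaustive-search baseline (the simplest of the packet-classification approaches). Cells are (operator, argument) pairs built from the integer and string operators listed earlier, `None` stands for `*`, and treating match_re as a full-string match is an assumption.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# A cell is (operator, argument); None plays the role of "*" (don't care).
Cell = Optional[Tuple[str, object]]

# Integer operators ==, >, <; string operators ==, match_re, and len (compared here
# with ">"). Treating match_re as a full-string match is an assumption.
OPS: Dict[str, Callable[[object, object], bool]] = {
    "==": lambda v, a: v == a,
    ">": lambda v, a: v > a,
    "<": lambda v, a: v < a,
    "match_re": lambda v, a: re.fullmatch(a, v) is not None,
    "len>": lambda v, a: len(v) > a,
}


def exhaustive_match(table: List[List[Cell]], fields: List[object]) -> List[int]:
    """Baseline: scan every cell; signature i matches if all its non-* cells hold."""
    return [
        i for i, row in enumerate(table)
        if all(cell is None or OPS[cell[0]](fields[j], cell[1]) for j, cell in enumerate(row))
    ]


# Columns: Method, Filename, VAR "file" value, Header "host" value
table: List[List[Cell]] = [
    [("==", "POST"), None, ("match_re", r".*\.\./.*"), None],
    [("==", "GET"), None, None, None],
    [None, ("==", "fp40reg.dll"), None, ("len>", 400)],
]
print(exhaustive_match(table, ["POST", "fp40reg.dll", "../../boot.ini", "A" * 450]))  # [0, 2]
```

This baseline touches all n×k cells for every PDU, which is exactly the cost that large n, large k, and many don't cares make prohibitive.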

33

Requirement for SPM

• Large number of signatures n
• Large number of matchers k
• Large number of “don’t cares”
• Cannot reorder the matchers arbitrarily (buffer constraint)
• Field dependency
– Arrays
– Associative arrays
– Mutually exclusive fields

34

Compare to packet classification

• Similarity: both problems are defined on k matching dimensions and allow wildcards.

• Differences:
– Large k and a large number of “don’t cares”
– Buffer constraint
– Regular expression matchers
– Field dependency

• Related work on packet classification:
– Exhaustive search
– Decision tree
– Tuple space
– Divide and conquer (decomposition)

35

Difficulty

• A more complex problem than packet classification

• Packet classification theoretical worst-case bound
– Based on computational geometry: $O((\log N)^{k-1})$ worst-case time or $O(N^k)$ worst-case memory

• Solution: exploit the characteristics of real traces

36

Observation

• Observation 1: most matchers are good.
– After matching against them, only a small number of signatures can pass (candidates).
– All string matchers are good, and most integer matchers are good.
– We can buffer the bad matchers to change the matching order.

• Observation 2: real-world traffic mostly does not match any signature. Even more strongly, in most cases no matcher matches any rule.

• Observation 3: the NIDS/IPS reports all matched rules regardless of ordering, unlike firewall rules.

37

Basic idea

• Decide the matcher order during pre-computation, buffering the bad matchers to the end where possible.

• When a PDU arrives, match each matcher (column) against all signatures simultaneously to obtain the candidates for the next step.

• Combine the candidate sets to obtain the final matched signatures.

38

Match single matcher

• Integer range checking: Binary search tree

• String exact matching: Trie

• String regular expression matching: DFA

• String length checking: Binary search tree
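Integer range checking for a single matcher can be sketched as follows. The class name is hypothetical, and a sorted array of elementary-interval boundaries searched with `bisect` stands in for a static binary search tree; each elementary interval maps to the set of signatures whose range covers it.

```python
import bisect
from typing import Dict, List, Set, Tuple


class RangeMatcher:
    """Integer range checking for one matcher: value -> signatures whose [lo, hi] holds it."""

    def __init__(self, ranges: Dict[int, Tuple[int, int]]) -> None:
        # Every lo and every hi + 1 starts a new elementary interval.
        self.points: List[int] = sorted({p for lo, hi in ranges.values() for p in (lo, hi + 1)})
        # sets[j] = signatures covering [points[j], points[j + 1] - 1]
        self.sets: List[Set[int]] = [
            {sid for sid, (lo, hi) in ranges.items() if lo <= start <= hi}
            for start in self.points
        ]

    def lookup(self, value: int) -> Set[int]:
        j = bisect.bisect_right(self.points, value) - 1   # O(log n) search
        return set(self.sets[j]) if j >= 0 else set()


# Version > 3, Version == 1, and a hypothetical Version <= 3 signature
versions = RangeMatcher({0: (4, 2**32 - 1), 1: (1, 1), 2: (0, 3)})
print(versions.lookup(1), versions.lookup(5))  # {1, 2} {0}
```

The same structure applies to string length checking by looking up `len(value)` instead of the value itself.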

39

Candidate Selection for SPM

• Basic algorithm: pre-computation

[Figure: signature sets ER1, ER2, ER3, ER4, ... built incrementally: good matcher 1; the don't cares of good matcher 1, extended by good matcher 2; the don't cares of both good matchers 1 and 2; and so on, down to the don't cares of all good matchers 1 to n.]

40

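Matching Illustration

PDU = {Method=POST, Filename=fp40reg.dll, VARs: name="file"; value~".*\.\./.*", Headers: name="host"; len(value)=450}

Signature blocks are added step by step: RB1 = {1, 2, 3}, RB2 = {4, 5, 6}, RB3 = {7}, RB4 = {8}, RB5 = {9}. At step i, $A_i$ and $B_i$ are the candidate sets produced by matcher i.

$S_1 = \{3\}$
$S_2 = (S_1 \cap A_2) \cup B_2 = (\{3\} \cap \emptyset) \cup \{6\} = \emptyset \cup \{6\} = \{6\}$
$S_3 = (S_2 \cap A_3) \cup B_3 = (\{6\} \cap \emptyset) \cup \emptyset = \{6\} \cup \emptyset = \{6\}$
$S_4 = (S_3 \cap A_4) \cup B_4 = (\{6\} \cap \{4\}) \cup \emptyset = \{6\} \cup \emptyset = \{6\}$
$S_5 = (S_4 \cap A_5) \cup B_5 = (\{6\} \cap \{6\}) \cup \emptyset = \{6\} \cup \emptyset = \{6\}$

Here $\cap$ only filters the candidates that require the next matcher: signature 6 does not require matchers 3 and 4, so it passes through those steps unchanged.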

41

Matching Illustration

• Compute the operations
– Explicit calculation
• Based on an n×k bitmap, decide whether an element in $S_i$ requires the next matcher.
• For those that require the next matcher, check whether they are also in $A_{i+1}$.
– Implicit calculation (for bad matchers)
• Do not calculate $A_{i+1}$, since it could be large.
• Check sequentially whether the candidates in $S_i$ match matcher (i+1).
• When the bad matchers are buffered to the end, B will be small.
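The candidate-selection loop can be sketched in Python under simplifying assumptions: matchers are already in their pre-computed order, every signature uses at least one matcher, and $A_i$ is computed by evaluating predicates directly, where a real engine would query a per-matcher index (search tree, trie, DFA). All names are hypothetical; this shows the explicit calculation only.

```python
from typing import Callable, Dict, List, Set

# A signature maps matcher index -> predicate on that field's value; matchers it
# does not mention are "don't cares". Each signature must use at least one matcher.
Signature = Dict[int, Callable[[object], bool]]


def candidate_selection(sigs: List[Signature], fields: List[object]) -> Set[int]:
    """Return the IDs of signatures whose every required matcher is satisfied."""
    k = len(fields)
    # Pre-computation: blocks[i] holds signatures whose first required matcher is i.
    blocks: List[Set[int]] = [set() for _ in range(k)]
    for sid, sig in enumerate(sigs):
        blocks[min(sig)].add(sid)

    candidates: Set[int] = set()
    for i, value in enumerate(fields):
        # A_i: signatures that require matcher i and satisfy it.
        a_i = {sid for sid, sig in enumerate(sigs) if i in sig and sig[i](value)}
        # Explicit step: keep a candidate if it does not require matcher i
        # (the bitmap check) or if it also appears in A_i.
        kept = {sid for sid in candidates if i not in sigs[sid] or sid in a_i}
        # B_i: signatures entering at matcher i that satisfy it.
        candidates = kept | (blocks[i] & a_i)
    return candidates


sigs: List[Signature] = [
    {0: lambda m: m == "POST", 1: lambda f: f.endswith(".dll")},
    {0: lambda m: m == "GET"},
    {1: lambda f: f == "fp40reg.dll", 2: lambda n: n > 400},
    {2: lambda n: n > 1000},
]
print(sorted(candidate_selection(sigs, ["POST", "fp40reg.dll", 450])))  # [0, 2]
```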

42

Refinement

• SPM improvements
– Allow negative conditions
– Handle the array case
– Handle the associative array case
– Handle the mutually exclusive case
– Report the matched rules as early as possible

• Extend to MPM
– Allow checkpoints

43

Results

• Traces from Tsinghua Univ. (TH) and Northwestern Univ. (NU)
• Measured after TCP reassembly, with the PDUs preloaded in memory
• For DNS, we only evaluate parsing.
• For WINRPC, we have 45 vulnerability signatures, which cover 3,519 Snort rules.
• For HTTP, we have 791 vulnerability signatures, which cover 941 Snort rules.

44

Discussion

• So far, we have found that the candidate selection algorithm works well in practice.

• Further thoughts
– How can we rely more on hardware assistance?
• TCAM?
• Use bitmaps to express set operations?
– Can we exploit traffic statistics to further improve efficiency?
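One way to read the bitmap question: encode each candidate set as an n-bit integer (bit j = signature j), so one candidate-selection step becomes word-wide AND/OR. In this minimal sketch, `dont_care_next` is a hypothetical per-matcher bitmap of the signatures that do not require the next matcher.

```python
def next_candidates(s_i: int, a_next: int, b_next: int, dont_care_next: int) -> int:
    """One candidate-selection step with sets encoded as bitmaps (bit j = signature j).

    A candidate survives if it matched the next matcher (a_next) or does not
    require it (dont_care_next); newly entering candidates (b_next) are ORed in.
    """
    return (s_i & (a_next | dont_care_next)) | b_next


def bits(*ids: int) -> int:
    """Build a bitmap from signature IDs."""
    return sum(1 << j for j in set(ids))


# The S_4 step of the matching illustration: signature 6 does not require matcher 4.
print(next_candidates(bits(6), bits(4), bits(), bits(6)) == bits(6))  # True
```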

45

Outline

• Motivation

• RAIDM System Design

• Finished Work

• Proposed Work

• Research Plan

46

Publications

• Zhichun Li, Lanjia Wang, Yan Chen and Zhi (Judy) Fu, Network-based and Attack-resilient Length Signature Generation for Zero-day Polymorphic Worms, in Proc. of IEEE ICNP 2007.

• Robert Schweller, Zhichun Li, Yan Chen, Yan Gao, Ashish Gupta, Elliot Parons, Yin Zhang, Peter Dinda, Ming-Yang Kao, and Gokhan Memik, Reversible sketches: Enabling monitoring and analysis over high speed data streams, IEEE/ACM Transactions on Networking, Volume 15, Issue 5, Oct. 2007

• Zhichun Li, Manan Sanghi, Brian Chavez, Yan Chen and Ming-Yang Kao, Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience, in Proc. of IEEE Symposium on Security and Privacy, 2006

• Zhichun Li, Yan Chen and Aaron Beach, Towards Scalable and Robust Distributed Intrusion Alert Fusion with Good Load Balancing, in Proc. of ACM SIGCOMM LSAD 2006

• Yan Gao, Zhichun Li and Yan Chen, A DoS Resilient Flow-level Intrusion Detection Approach for High-speed Networks, in Proc. of IEEE ICDCS 2006

• Robert Schweller, Zhichun Li, Yan Chen, Yan Gao, Ashish Gupta, Elliot Parons, Yin Zhang, Peter Dinda, Ming-Yang Kao, and Gokhan Memik, Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluations, and Applications, in Proc. of IEEE INFOCOM 2006

47

Research Time Plan

• Apr 2008 – Jun 2008:
– Finish the remaining experiments on network situational awareness

• Sep 2008 – Mar 2009:
– Refine the vulnerability signature matching algorithm
– Fully implement, deploy, and evaluate the NetShield prototype
– Prepare job applications and interviews

• Apr 2009 – Jun 2009:
– PhD dissertation writing
– Thesis defense

48

Q & A

Thanks!

49

Backup

51

Outline

• Motivation

• Feasibility Study: a measurement approach

• Problem Statement


• High Speed Matching for massive vulnerability signatures

• Evaluation

• Conclusions

52

Outline

• Motivation





• Evaluation

• Conclusions

53

Outline

• Motivation




• High Speed Matching for a large number of vulnerability signatures

• Evaluation

• Conclusions

54

Outline

• Motivation





• Evaluation

• Conclusions

55

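Limitations of Regular Expression Signatures

[Figure: traffic from the Internet passes through traffic filtering before reaching our network. The filter uses the signature 10.*01; the example payloads are 1010101, 10111101, 11111100, and 00010111, with X marks on the filtered traffic.]

Polymorphic attacks (worms/botnets) might not have an exact regular-expression-based signature. Polymorphism!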

56

What do we do?

• Build a NIDS/NIPS with much better accuracy and similar speed compared with regular-expression-based approaches
– Feasibility: 86.7% of the Snort ruleset (6,735 signatures) can be improved by vulnerability signatures.
– High-speed parsing: 2.7~12 Gbps
– High-speed matching:
• Efficient algorithm for matching massive vulnerability rule sets
• HTTP: 791 vulnerability signatures at ~1 Gbps

57

Problem Formulation

• Parsing problem formulation
– Given a PDU and the protocol specification as input, output the set of fields required by matching.


59

Current Status

• Part I: Sketch-based monitoring & detection
– Robert Schweller, Zhichun Li, Yan Chen, Yan Gao, Ashish Gupta, Elliot Parons, Yin Zhang, Peter Dinda, Ming-Yang Kao, and Gokhan Memik, Reversible sketches: Enabling monitoring and analysis over high speed data streams, IEEE/ACM Transactions on Networking, Volume 15, Issue 5, Oct. 2007
– Robert Schweller, Zhichun Li, Yan Chen, Yan Gao, Ashish Gupta, Elliot Parons, Yin Zhang, Peter Dinda, Ming-Yang Kao, and Gokhan Memik, Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluations, and Applications, in Proc. of IEEE INFOCOM 2006 (252/1400=18%)
– Yan Gao, Zhichun Li and Yan Chen, A DoS Resilient Flow-level Intrusion Detection Approach for High-speed Networks, in Proc. of IEEE International Conference on Distributed Computing Systems (ICDCS) 2006 (75/536=14%) (alphabetical order)

• Part II: Polymorphic worm signature generation
– TOSG: Zhichun Li, Manan Sanghi, Brian Chavez, Yan Chen and Ming-Yang Kao, Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience, in Proc. of IEEE Symposium on Security and Privacy, 2006 (23/251=9%)
– LESG: Zhichun Li, Lanjia Wang, Yan Chen and Zhi (Judy) Fu, Network-based and Attack-resilient Length Signature Generation for Zero-day Polymorphic Worms, in Proc. of IEEE International Conference on Network Protocols (ICNP) 2007 (32/220=14%)

60

Current Status

• Part III: Signature matching engines
– Work in progress; the focus of this talk
– Zhichun Li, Gao Xia, Yi Tang, Jian Chen, Ying He, Yan Chen and Bin Liu, NetShield: Towards High Performance Network-based Semantic Signature Matching, in submission

• Part IV: Network situational awareness
– Work in progress
– Zhichun Li, Anup Goyal, Yan Chen and Vern Paxson, Towards Situational Awareness of Large-Scale Botnet Events using Honeynets, in preparation
– Zhichun Li, Anup Goyal, Yan Chen and Aleksandar Kuzmanovic, P2P Doctor: Measurement and Diagnosis of Misconfigured Peer-to-Peer Traffic, in submission
