deterministic memory- efficient string matching algorithms for intrusion detection nathan tuck,...

Deterministic Memory-Efficient String Matching Algorithms for Intrusion Detection

Nathan Tuck, Timothy Sherwood, Brad Calder, George Varghese

Department of Computer Science and Engineering, University of California, San Diego

Department of Computer Science, University of California, Santa Barbara

Abstract

• IDSs: Intrusion Detection Systems
• Space- and time-efficient string matching algorithms
• Providing worst-case performance
– Amenable to H/W implementation
• Aho-Corasick
– Memory, performance

Introduction (i)

• Combating attacks at every level
• Automatically monitoring network traffic
• IDS uses a set of rules
– Apply to matching packets
• Edge and core routers
– Stringent worst-case performance bounds
– Tight constraints on memory

Introduction (ii)

• At the heart of IDSs is a string matching algorithm
– In Snort, 70% of total execution time and 80% of instructions executed
• Contributions of this paper
– Characterization
– New algorithms
– Evaluation

String matching for intrusion detection

Quantifying the Use of String Matching (i)

• Snort: an intrusion detection system

– The rules are generated manually

• Extract relevant content strings from the payload and header of known attacks

– The action can include logging, alerting, ignoring, ……

– Rules are usually added as new vulnerabilities are discovered

Quantifying the Use of String Matching (ii)

• Scalability of the intrusion detection system database

– Beneficial to avoid an algorithm whose run-time is proportional to the length of the rules in the database

– New rules are being added to detect or combat new attacks

Quantifying the Use of String Matching (iii)

• Linearly searching through the database of rules is becoming increasingly infeasible

• The database is growing at a rate that is well within Moore’s Law

• Need a technique whose run-time performance is not proportional to the size of the rule database

State of the Art in String Matching (i)

• Single-pattern string matching
– Boyer-Moore, ……

• Multi-pattern string matching
– Aho-Corasick, Wu-Manber, ……

• Imprecise string matching
– Using hashing and signature-based techniques
– Matches must be reverified using a precise string matching algorithm
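As an illustration of the multi-pattern case, here is a minimal sketch of an Aho-Corasick matcher. It is a textbook-style version using Python dictionaries, not the implementation evaluated in the paper; the function names `build_automaton` and `search` are hypothetical and chosen only for this example.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure, and output tables for a small Aho-Corasick automaton."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # patterns recognised on entering each state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first pass: depth-1 states keep failure link 0,
    # deeper states derive theirs from their parent's failure link.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(text, automaton):
    """Return (start_offset, pattern) for every occurrence of any pattern."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits
```

Calling `search("ushers", build_automaton(["he", "she", "his", "hers"]))` scans the input once and reports all patterns together, which is the property that distinguishes multi-pattern matchers from running a single-pattern search once per rule.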

State of the Art in String Matching (ii)

• Bad character heuristics
– Easily exploitable by attackers

• Aho-Corasick
– Uses a data structure that is unoptimized for space

• SFKSearch
– Worst-case performance is quite poor

• Wu-Manber
– Memory accesses to the shift and hash tables
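To illustrate why a bad-character heuristic is exploitable, the sketch below counts character comparisons made by a Boyer-Moore-Horspool scan. Horspool is a simplified bad-character variant chosen here for brevity; the slides do not say which variant is meant, and the function name, pattern, and inputs are hypothetical.

```python
def horspool_comparisons(pattern, text):
    """Count character comparisons made by a Boyer-Moore-Horspool scan."""
    m, n = len(pattern), len(text)
    # Bad-character table: distance from the rightmost occurrence of each
    # character (excluding the last pattern position) to the pattern end.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    comparisons = 0
    pos = 0
    while pos + m <= n:
        j = m - 1
        while j >= 0:               # compare right to left
            comparisons += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        # Shift by the table entry for the text character under the
        # last pattern position; unknown characters allow a full jump.
        pos += shift.get(text[pos + m - 1], m)
    return comparisons
```

With the pattern `"baaaaaaa"`, a payload of 1000 `"z"` bytes lets the scan jump eight positions at a time, while a payload of 1000 `"a"` bytes forces a shift of one after eight comparisons per window. An attacker who knows the rule set can craft traffic of the second kind, which is the concern behind preferring algorithms with deterministic worst-case bounds.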

Applying IP Lookup Techniques to String Matching (i)

• IP lookup: a set of patterns to match, finding the longest possible match for a set of IP addresses that are streaming by

• String matching: a set of strings to match, finding all of the places in the input stream where there is a match
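The contrast between the two problems can be sketched as follows: a unibit trie that returns only the longest matching prefix for one address, next to a naive reference routine that reports every occurrence of every string. Both functions and the sample data are hypothetical illustrations, not code from the paper.

```python
def build_prefix_trie(prefixes):
    """prefixes: dict mapping bit strings such as '1011' to a label."""
    root = {}
    for bits, label in prefixes.items():
        node = root
        for b in bits:
            node = node.setdefault(b, {})
        node["label"] = label
    return root


def longest_prefix_match(root, address_bits):
    """IP-lookup style: one answer per address, the longest stored prefix."""
    node = root
    best = root.get("label")
    for b in address_bits:
        node = node.get(b)
        if node is None:
            break
        best = node.get("label", best)
    return best


def all_matches(patterns, stream):
    """String-matching style: every (offset, pattern) occurrence (naive reference)."""
    return [(i, p)
            for i in range(len(stream))
            for p in patterns
            if stream.startswith(p, i)]
```

For example, with prefixes `{"10": "A", "1011": "B"}` the address `"10110000"` resolves to the single label `"B"`, whereas `all_matches(["ab", "b"], "abab")` must report four separate hits. The trie-compression ideas carry over, but the matcher has to keep reporting matches at every input position.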

Applying IP Lookup Techniques to String Matching (ii)

• Unibit and multibit tries
– Waste space with pointers

• Lulea algorithm
– Uses the concepts of leaf pushing and bitmaps to compress the database

• Eatherton algorithm
– Internal bitmap and external bitmap

Optimizations for string matching

Bitmap compression (i)

• With 32-bit pointers
• Each Aho-Corasick node has 256 next-state pointers
• Now using a single pointer to the first valid next state, and maintaining a 256-bit bitmap
• Summing all the bits prior to that bit number and adding the sum to the base next-node pointer
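The lookup described above can be sketched as follows. This is a minimal illustration that assumes the valid next states of a node are stored contiguously, in byte order, starting at the base pointer; the class and function names are hypothetical, and a Python integer stands in for the 256-bit bitmap.

```python
class BitmapNode:
    def __init__(self, bitmap, first_child):
        self.bitmap = bitmap            # bit c is set if a next state exists for byte c
        self.first_child = first_child  # index of the first valid next state


def make_bitmap(valid_bytes):
    """Build a 256-bit bitmap (as an int) from the bytes that have a next state."""
    bitmap = 0
    for b in valid_bytes:
        bitmap |= 1 << b
    return bitmap


def next_state(node, byte):
    """Return the index of the next state for `byte`, or None if there is none."""
    if not (node.bitmap >> byte) & 1:
        return None                     # a matcher would follow the failure pointer here
    # Count the set bits strictly below `byte`: that is the child's rank.
    below = node.bitmap & ((1 << byte) - 1)
    return node.first_child + bin(below).count("1")
```

A node with transitions on `a`, `c`, and `z` and a base index of 10 would map those bytes to 10, 11, and 12. The bit count is the "sum of prior bits" step; in hardware it would be a population-count circuit rather than a string operation.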

Bitmap compression (ii)

• Original optimized Aho-Corasick
– 1028 bytes per node
• Bitmapped version
– Only 44 bytes per node
• Incurs two costs
– Doubles the worst-case amount of work
– Performing a sum over up to 256 prior bits

Path Compression (i)

• The bitmap is largely wasted information at the bottom nodes
• Any path-compressed node must be equal in size to a bitmapped node
• Failure pointers must include an offset

Path Compression (ii)

• With 32-bit pointers
– A single path-compressed node can contain data equivalent to 4 bitmap-compressed nodes
– In practice, achieves a 2.54:1 compression ratio
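The idea of path compression can be sketched on a plain pattern trie: chains of nodes that have exactly one child are merged into a single node holding several characters. This is a simplified illustration only. It assumes a cap of 4 characters per merged node (mirroring the "4 bitmap-compressed nodes" figure above), it omits the failure pointers and their offsets, and all names are hypothetical.

```python
def build_pattern_trie(patterns):
    """Build an uncompressed trie with one node per character."""
    root = {"children": {}, "terminal": False}
    for pat in patterns:
        node = root
        for ch in pat:
            node = node["children"].setdefault(
                ch, {"children": {}, "terminal": False})
        node["terminal"] = True
    return root


def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node["children"].values())


def compress(node, max_run=4):
    """Merge single-child chains into nodes labelled with up to max_run characters."""
    new_children = {}
    for ch, child in node["children"].items():
        run = ch
        # Absorb the child while it is a pure pass-through node:
        # not the end of a pattern and exactly one way forward.
        while (len(run) < max_run
               and not child["terminal"]
               and len(child["children"]) == 1):
            (next_ch, next_node), = child["children"].items()
            run += next_ch
            child = next_node
        new_children[run] = compress(child, max_run)
    return {"children": new_children, "terminal": node["terminal"]}
```

For the single pattern `"abcdefgh"` the uncompressed trie has 9 nodes and the compressed one has 3. Real rule sets branch near the root and run in long single-child chains near the leaves, which is where this kind of merging pays off.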

Results

Intrusion Detection in Hardware

• The number of rules goes up by over a factor of 2.5, whereas the size of memory for our algorithm only goes up by 30%

• Focus our attention on the worst-case performance

Intrusion Detection in Software

• Examine both average-case and worst-case performance

• Wu-Manber is the fastest in the average case because of its hash function

Summary (i)

• Current software IDSs largely rely on common-case optimizations to gain speed

• Only Aho-Corasick has deterministic worst-case lookup times and is friendly enough to use for wire-speed H/W matching

Summary (ii)

• Contribution of this paper
– Apply bitmap node compression and path compression to Aho-Corasick
– Gain both compact storage and worst-case performance
