deterministic memory- efficient string matching algorithms for intrusion detection nathan tuck,...
Post on 20-Dec-2015
Deterministic Memory-Efficient String Matching Algorithms for Intrusion Detection
Nathan Tuck, Timothy Sherwood, Brad Calder, George Varghese
Department of Computer Science and Engineering, University of California, San Diego
Department of Computer Science, University of California, Santa Barbara
Abstract
• IDSs: Intrusion Detection Systems
• Space- and time-efficient string matching algorithms
• Providing worst-case performance
– Amenable to H/W implementation
• Aho-Corasick
– Memory, performance
Introduction (i)
• Combating attacks at every level
• Automatically monitoring network traffic
• An IDS uses a set of rules
– Applied to matching packets
• Edge and core routers
– Stringent worst-case performance bounds
– Tight constraints on memory
Introduction (ii)
• At the heart of IDSs is a string matching algorithm
– In Snort, 70% of total execution time and 80% of instructions executed
• Contributions of this paper
– Characterization
– New Algorithms
– Evaluation
String matching for intrusion detection
Quantifying the Use of String Matching (i)
• Snort: an intrusion detection system
– The rules are generated manually
• Extract relevant content strings from the payload and header of known attacks
– The action can include logging, alerting, ignoring, ...
– Rules are usually added as new vulnerabilities are discovered
Quantifying the Use of String Matching (ii)
• Scalability of the intrusion detection system database
– Beneficial to avoid an algorithm whose run-time is proportional to the length of the rules in the database
– New rules are being added to detect or combat new attacks
Quantifying the Use of String Matching (iii)
• Linearly searching through the set of rules is becoming increasingly infeasible
• The database is growing at a rate that is well within Moore’s Law
• Need a technique with run-time performance
State of the Art in String Matching (i)
• Single-pattern string matching
– Boyer-Moore, ...
• Multi-pattern string matching
– Aho-Corasick, Wu-Manber, ...
• Imprecise string matching
– Uses hashing and signature-based techniques
– Matches must be reverified using a precise string matching algorithm
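The imprecise-then-reverify idea can be sketched as follows. This is only an illustration under stated assumptions: the function names, the prefix length `k`, and the use of Python's built-in `hash` as the signature are all hypothetical choices, not the scheme of any particular IDS. Every pattern is assumed to be at least `k` characters long.

```python
def build_signatures(patterns, k):
    """Map the hash of each pattern's first k characters to the patterns sharing it.

    Hypothetical helper: assumes every pattern has at least k characters.
    """
    table = {}
    for pat in patterns:
        table.setdefault(hash(pat[:k]), []).append(pat)
    return table


def find_matches(text, patterns, k=2):
    """Imprecise filter (hash lookup) followed by precise reverification."""
    table = build_signatures(patterns, k)
    hits = []
    for i in range(len(text) - k + 1):
        # Imprecise step: a hash hit only says a match is possible here.
        for pat in table.get(hash(text[i:i + k]), ()):
            # Precise step: confirm the candidate with an exact comparison.
            if text.startswith(pat, i):
                hits.append((i, pat))
    return hits
```

The hash lookup is cheap but can report false candidates (hash collisions, or a shared prefix with a different tail), which is why the exact comparison is still needed.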
State of the Art in String Matching (ii)
• Bad Character Heuristics
– Easily exploitable by attackers
• Aho-Corasick
– Use unoptimized data structure for space optimizations
• SFKSearch
– Worst-case performance is quite poor
• Wu-Manber
– Memory access to the shift and hash table
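For reference, a minimal textbook-style sketch of Aho-Corasick is shown below. It uses Python dictionaries for the next-state table rather than the 256-entry pointer arrays discussed in these slides, and the function names are illustrative, not taken from Snort or from the paper.

```python
from collections import deque


def build_automaton(patterns):
    """Build next-state (goto), failure, and output tables for the patterns."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # patterns recognized on entering each state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first pass: depth-1 states keep failure link 0; deeper states
    # inherit from the failure chain of their parent.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(text, automaton):
    """Return (start_index, pattern) for every occurrence of every pattern."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches
```

The input is scanned once from left to right regardless of how many patterns are loaded, which is the property that makes the worst case predictable.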
Applying IP Lookup Techniques to String Matching (i)
• IP lookup: given a set of patterns to match, find the longest possible match for each of a set of IP addresses that are streaming by
• String matching: given a set of strings to match, find all of the places in the input stream where there is a match
Applying IP Lookup Techniques to String Matching (ii)
• Unibit and Multibit Tries
– Waste space with pointers
• Lulea Algorithm
– Uses the concepts of leaf pushing and bitmaps to compress the database
• Eatherton Algorithm
– Internal bitmap and external bitmap
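To make the IP-lookup side concrete, here is a small sketch of longest-prefix matching on a unibit trie. The class and function names, and the use of bit strings such as "1011" for prefixes, are assumptions made for illustration. Each node carries two child pointers whether or not they are used, which is the pointer overhead that the bitmap schemes above try to remove.

```python
class TrieNode:
    """One node of a unibit trie: one child pointer per bit value."""

    def __init__(self):
        self.children = [None, None]
        self.next_hop = None  # set only if a prefix ends at this node


def insert(root, prefix_bits, next_hop):
    """Insert a prefix given as a string of '0'/'1' characters."""
    node = root
    for b in prefix_bits:
        i = int(b)
        if node.children[i] is None:
            node.children[i] = TrieNode()
        node = node.children[i]
    node.next_hop = next_hop


def longest_prefix_match(root, addr_bits):
    """Walk the address bit by bit, remembering the last prefix seen."""
    node, best = root, root.next_hop
    for b in addr_bits:
        node = node.children[int(b)]
        if node is None:
            break
        if node.next_hop is not None:
            best = node.next_hop
    return best
```

Note the contrast with string matching: the lookup returns only the single longest match per address, whereas a string matcher must report every match in the stream.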
Optimizations for string matching
Bitmap compression (i)
• With 32-bit pointers
• Each Aho-Corasick node has 256 next-state pointers
• Instead, use a single pointer to the first valid next state, and maintain a 256-bit bitmap
• The next state is found by summing all the bits prior to that bit number and adding the sum to the base next-node pointer
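The lookup described above can be sketched as follows. This is a minimal illustration, assuming the valid next states of a node are stored contiguously in some shared array and that `first_child` is the index of the first one; the class and function names are hypothetical. A Python integer stands in for the 256-bit bitmap.

```python
class BitmapNode:
    """Bitmap-compressed node (illustrative layout, not the paper's exact one)."""

    def __init__(self, bitmap, first_child):
        self.bitmap = bitmap            # bit c is set if byte c has a valid next state
        self.first_child = first_child  # index of the first valid next state


def next_state(node, byte):
    """Return the index of the next state for `byte`, or None if there is none."""
    if not (node.bitmap >> byte) & 1:
        return None  # a real matcher would follow the failure pointer here
    # Count the set bits strictly below `byte`: that is the child's offset.
    prior = node.bitmap & ((1 << byte) - 1)
    return node.first_child + bin(prior).count("1")
```

In hardware the bit count would be a population-count circuit over up to 256 bits; `bin(...).count("1")` is just the simplest portable way to express it here.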
Bitmap compression (ii)
• Original optimized Aho-Corasick
– 1028 bytes per node
• Bitmapped version
– Only 44 bytes per node
• Incurs two costs
– Doubles the worst-case amount of work
– Performing a sum over up to 256 prior bits
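The two node sizes can be reproduced with simple arithmetic. The field breakdown below is an assumption chosen to be consistent with the figures on the slide (256 pointers plus one extra 4-byte field for the original node; a 256-bit bitmap plus three 4-byte fields for the bitmapped node); the slides themselves do not list the fields.

```python
POINTER_BYTES = 4    # 32-bit pointers, as stated on the slide
ALPHABET_SIZE = 256  # one possible next state per byte value


def uncompressed_node_bytes():
    # Assumed layout: 256 next-state pointers plus one extra 4-byte field.
    return ALPHABET_SIZE * POINTER_BYTES + POINTER_BYTES


def bitmapped_node_bytes():
    # Assumed layout: 256-bit bitmap (32 bytes) plus three 4-byte fields,
    # for example a next-state base pointer, a failure pointer, and a
    # pointer to match information.
    bitmap_bytes = ALPHABET_SIZE // 8
    return bitmap_bytes + 3 * POINTER_BYTES


print(uncompressed_node_bytes())  # 1028
print(bitmapped_node_bytes())     # 44
```

Under these assumptions a bitmapped node is roughly 23 times smaller than the original node.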
Path Compression (i)
• The bitmap is largely wasted information at the bottom nodes
• Any path-compressed nodes must be equal in size to bitmapped nodes
• Failure pointers must include an offset
Path Compression (ii)
• With 32-bit pointers
– A single path-compressed node can contain data equivalent to 4 bitmap-compressed nodes
– In practice, this achieves a 2.54:1 compression ratio
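A rough way to see why the practical ratio falls below 4:1 is to count stored nodes before and after packing. The sketch below is a simplified counting model, not the paper's data structure: it ignores failure pointers, offsets, and match information, and the function names and the `capacity` parameter are hypothetical. States with at most one successor are packed up to `capacity` per stored node, while branching states keep a node of their own.

```python
def build_trie(patterns):
    """Plain trie as a list of {char: child_index} dictionaries."""
    trie = [{}]
    for pat in patterns:
        s = 0
        for ch in pat:
            if ch not in trie[s]:
                trie.append({})
                trie[s][ch] = len(trie) - 1
            s = trie[s][ch]
    return trie


def count_compressed_nodes(trie, capacity=4):
    """Count stored nodes when non-branching runs are packed `capacity` at a time."""
    total = 0
    stack = [0]
    while stack:
        s = stack.pop()
        total += 1  # a new stored node (bitmap or path-compressed) starts at s
        if len(trie[s]) <= 1:
            packed = 1
            # Absorb following states while they do not branch and room remains.
            while packed < capacity and len(trie[s]) == 1:
                (nxt,) = trie[s].values()
                if len(trie[nxt]) > 1:
                    break  # a branching state needs its own bitmap node
                s = nxt
                packed += 1
        stack.extend(trie[s].values())
    return total
```

Comparing `len(build_trie(patterns))` with `count_compressed_nodes(...)` on a rule set gives an estimate of the node-count ratio under this model; branching near the root and short tails leave many stored nodes partly empty, so the ratio stays below the capacity.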
Results
Intrusion Detection in Hardware
• The number of rules goes up by over a factor of 2.5, whereas the size of memory for our algorithm only goes up by 30%
• Focus our attention on the worst-case performance
Intrusion Detection in Software
• Examine both average-case and worst-case performance
• Wu-Manber is the fastest in the average case because of its hash function
Summary (i)
• Current software IDSs largely rely on common-case optimizations to gain speed
• Aho-Corasick is the only one with deterministic worst-case lookup times, and it is friendly enough to use for wire-speed H/W matching
Summary (ii)
• Contribution of this paper
– Apply bitmap node compression and path compression to Aho-Corasick
– Gain both compact storage and worst-case performance