
Deterministic Memory-Efficient String Matching Algorithms for Intrusion Detection
Nathan Tuck, Timothy Sherwood, Brad Calder, George Varghese
Department of Computer Science and Engineering, University of California, San Diego
Department of Computer Science, University of California, Santa Barbara

Post on 20-Dec-2015


Page 1: Deterministic Memory- Efficient String Matching Algorithms for Intrusion Detection Nathan Tuck, Timothy Sherwood, Brad Calder, George Varghese Department

Deterministic Memory-Efficient String Matching Algorithms for Intrusion Detection

Nathan Tuck, Timothy Sherwood, Brad Calder, George Varghese

Department of Computer Science and Engineering, University of California, San Diego

Department of Computer Science, University of California, Santa Barbara


Abstract
• IDSs: Intrusion Detection Systems
• Space- and time-efficient string matching algorithms
• Providing worst-case performance guarantees
– Amenable to hardware implementation
• Aho-Corasick
– Reduce memory, improve performance


Introduction (i)
• Combating attacks at every level
• Automatically monitoring network traffic
• An IDS uses a set of rules
– Each rule's action is applied to matching packets
• Edge and core routers
– Stringent worst-case performance bounds
– Tight constraints on memory


Introduction (ii)
• At the heart of IDSs is a string matching algorithm
– In Snort, it accounts for 70% of total execution time and 80% of instructions executed
• Contributions of this paper
– Characterization
– New algorithms
– Evaluation


String matching for intrusion detection


Quantifying the Use of String Matching (i)

• Snort: an intrusion detection system
– The rules are generated manually
• Extract relevant content strings from the payload and header of known attacks
– The action can include logging, alerting, ignoring, ...
– Rules are usually added as new vulnerabilities are discovered


Quantifying the Use of String Matching (ii)

• Scalability of the intrusion detection system database
– It is beneficial to avoid an algorithm whose run-time is proportional to the total length of the rules in the database
– New rules are being added to detect or combat new attacks


Quantifying the Use of String Matching (iii)

• Linearly searching through the database of rules is becoming increasingly infeasible

• The database is growing at a rate that is well within Moore’s Law

• Need a technique with run-time performance independent of the number of rules
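To make the scaling concern concrete, here is a minimal Python sketch of the naive approach: test every rule's content string at every payload offset, one rule at a time. The function name `naive_scan` and the comparison counter are illustrative assumptions, not Snort's code. Because each rule adds its own full pass over the payload, the work grows linearly with the size of the rule database.

```python
def naive_scan(payload: bytes, patterns: list[bytes]) -> tuple[list[tuple[int, bytes]], int]:
    """Hypothetical naive matcher: returns (matches, byte comparisons made)."""
    matches: list[tuple[int, bytes]] = []
    comparisons = 0
    for pat in patterns:                      # one full pass over the payload per rule
        for start in range(len(payload) - len(pat) + 1):
            # Compare byte by byte so the work done can be counted.
            i = 0
            while i < len(pat):
                comparisons += 1
                if payload[start + i] != pat[i]:
                    break
                i += 1
            if i == len(pat):
                matches.append((start, pat))
    return matches, comparisons
```

For example, `naive_scan(b"GET /cgi-bin/phf HTTP/1.0", [b"/cgi-bin/phf", b"cmd.exe"])` reports one match at offset 4; passing the same rule list twice exactly doubles the comparison count.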


State of the Art in String Matching (i)

• Single-pattern string matching
– Boyer-Moore, ...
• Multi-pattern string matching
– Aho-Corasick, Wu-Manber, ...
• Imprecise string matching
– Hashing- and signature-based
– Candidate matches must be reverified using a precise string matching algorithm
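A minimal sketch of the imprecise-then-precise pattern, assuming a deliberately weak additive hash over the first `k` bytes of each pattern (the function name, `k`, `buckets`, and the hash itself are hypothetical choices, not those of any particular IDS). The hash pass only says "maybe"; every candidate offset is then reverified with an exact comparison, which filters out collisions.

```python
def hashed_filter_scan(payload: bytes, patterns: list[bytes], k: int = 4,
                       buckets: int = 256) -> tuple[list[tuple[int, bytes]], int]:
    """Imprecise hash pass over k-byte windows, then precise reverification.
    Returns (verified matches, number of candidate offsets)."""
    if any(len(p) < k for p in patterns):
        raise ValueError("every pattern must be at least k bytes long")

    def weak_hash(window: bytes) -> int:
        # Deliberately weak, order-insensitive hash so collisions are easy to see.
        return sum(window) % buckets

    # Bucket each pattern by the hash of its first k bytes.
    table: dict[int, list[bytes]] = {}
    for p in patterns:
        table.setdefault(weak_hash(p[:k]), []).append(p)

    matches: list[tuple[int, bytes]] = []
    candidates = 0
    for start in range(len(payload) - k + 1):
        bucket = table.get(weak_hash(payload[start:start + k]))
        if bucket is None:
            continue  # no pattern can start at this offset
        candidates += 1
        for p in bucket:
            # Precise check: the hash match alone is not proof of a match.
            if payload.startswith(p, start):
                matches.append((start, p))
    return matches, candidates
```

With the pattern `b"abcd"` and the payload `b"xx dcba abcd yy"`, the anagram `dcba` lands in the same bucket and becomes a candidate, but only the real occurrence at offset 8 survives reverification.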


State of the Art in String Matching (ii)

• Bad-character heuristics
– Easily exploitable by attackers
• Aho-Corasick
– Existing implementations use an unoptimized data structure, leaving room for space optimizations
• SFKSearch
– Worst-case performance is quite poor
• Wu-Manber
– Requires memory accesses to both the shift table and the hash table
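The bad-character heuristic lets Boyer-Moore-style matchers skip ahead by up to a pattern length. A minimal Boyer-Moore-Horspool sketch (Horspool's simplified variant, chosen here for brevity; the function name is hypothetical) shows why it is fast on typical input yet easy for an attacker to defeat: if the payload is crafted so that the byte under the end of the window always yields a shift of 1, every alignment is examined and the work approaches text length times pattern length.

```python
def horspool(text: bytes, pat: bytes) -> tuple[list[int], int]:
    """Boyer-Moore-Horspool search; returns (match offsets, byte comparisons)."""
    m = len(pat)
    # Bad-character table: how far to slide the window when the byte under
    # its last position is c. Bytes absent from pat[:-1] allow a full shift of m.
    shift = [m] * 256
    for i in range(m - 1):
        shift[pat[i]] = m - 1 - i
    matches: list[int] = []
    comparisons, pos = 0, 0
    while pos <= len(text) - m:
        j = m - 1
        while j >= 0:                 # compare right to left
            comparisons += 1
            if text[pos + j] != pat[j]:
                break
            j -= 1
        if j < 0:
            matches.append(pos)
        pos += shift[text[pos + m - 1]]
    return matches, comparisons
```

On 1000 bytes of `x` with pattern `abcd`, the search makes 250 comparisons; on 1000 bytes of `a` with pattern `baaa`, every shift collapses to 1 and it makes 3988.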


Applying IP Lookup Techniques to String Matching (i)
• IP lookup: a set of prefix patterns to match, finding the longest possible match for each IP address that streams by
• String matching: a set of strings to match, finding all of the places in the input stream where there is a match
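The difference between the two problems shows up in a minimal Python sketch, assuming prefixes and addresses are written as bit strings and using a plain unibit trie (the function names and the `"hop"` key are hypothetical). IP lookup keeps only the deepest matching prefix on a single walk down the trie, while string matching must report every occurrence of every pattern.

```python
from typing import Optional


def longest_prefix_match(prefixes: dict[str, str], addr_bits: str) -> Optional[str]:
    """IP-lookup style: walk a unibit trie, remembering the deepest prefix seen."""
    # Build the trie: each node is a dict whose '0'/'1' keys are children
    # and whose optional "hop" key holds the route for that prefix.
    root: dict = {}
    for prefix, hop in prefixes.items():
        node = root
        for bit in prefix:
            node = node.setdefault(bit, {})
        node["hop"] = hop
    best, node = root.get("hop"), root
    for bit in addr_bits:
        if bit not in node:
            break  # no longer prefix can match
        node = node[bit]
        best = node.get("hop", best)  # a deeper match replaces a shorter one
    return best


def all_matches(text: str, patterns: list[str]) -> list[tuple[int, str]]:
    """String-matching style: report every (start offset, pattern) occurrence."""
    return sorted((i, p) for p in patterns
                  for i in range(len(text) - len(p) + 1)
                  if text.startswith(p, i))
```

With the routing table `{"": "default", "10": "A", "1011": "B"}`, the address bits `10110000` resolve to `B` only, whereas `all_matches("abcabc", ["abc", "bc", "c"])` reports all six overlapping occurrences.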


Applying IP Lookup Techniques to String Matching (ii)
• Unibit and multibit tries
– Waste space on pointers
• Lulea algorithm
– Uses leaf pushing and bitmaps to compress the database
• Eatherton algorithm
– Internal bitmap and external bitmap


Optimizations for string matching


Bitmap compression (i)
• Assume 32-bit pointers
• In Aho-Corasick, each node has 256 next-state pointers
• Instead, use a single pointer to the first valid next state, store the valid next states contiguously, and maintain a 256-bit bitmap marking which transitions exist
• To follow the transition on byte c, sum all the bits prior to bit c and add that count to the base next-node pointer
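A minimal sketch of the bitmap lookup, assuming the valid children of each node are stored contiguously, in ascending byte order, in one shared node array (the class and field names are hypothetical). Python's `bin(...).count("1")` stands in for the sum over the prior bits; a hardware design would compute that count differently.

```python
from typing import Optional


class BitmapNode:
    """Hypothetical bitmap-compressed Aho-Corasick node."""

    def __init__(self, children: list[int], base: int):
        # Bit c of the 256-bit bitmap is set iff byte c has a valid next state.
        self.bitmap = 0
        for c in children:
            self.bitmap |= 1 << c
        # Index of the first valid child; the others follow contiguously
        # in ascending byte order.
        self.base = base

    def next_index(self, c: int) -> Optional[int]:
        if not (self.bitmap >> c) & 1:
            return None  # no transition on c: the failure pointer would be used
        # Sum the set bits below bit c: that is c's rank among the valid children.
        rank = bin(self.bitmap & ((1 << c) - 1)).count("1")
        return self.base + rank
```

A node with transitions on `a`, `e`, and `z` and `base=100` resolves them to indices 100, 101, and 102 and returns `None` for `b`. The node keeps only the 256-bit bitmap and one base pointer instead of 256 separate pointers.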


Bitmap compression (ii)
• Original optimized Aho-Corasick
– 1028 bytes per node
• Bitmapped version
– Only 44 bytes per node
• Incurs two costs
– Doubles the worst-case amount of work, since missing transitions are resolved by following failure pointers
– Requires summing up to 256 prior bits on each lookup
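The slides give only the per-node totals. One field layout that reproduces both numbers, assuming 32-bit pointers and one word of match information per node (our assumption, not stated on the slides), is:

```python
POINTER_BYTES = 4  # 32-bit pointers, as on the slides


def optimized_node_bytes() -> int:
    # Assumed layout: one next-state pointer for each of the 256 input bytes,
    # plus one word of match (rule list) information.
    return 256 * POINTER_BYTES + POINTER_BYTES


def bitmapped_node_bytes() -> int:
    # Assumed layout: a 256-bit bitmap, a base pointer to the first child,
    # a failure pointer, and one word of match information.
    bitmap_bytes = 256 // 8
    return bitmap_bytes + 3 * POINTER_BYTES
```

Under these assumptions the bitmapped node is roughly 23 times smaller (1028 / 44 ≈ 23.4).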


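Path Compression (i)
• The bitmap is largely wasted information at the bottom nodes, where most nodes have only one child
• Any path-compressed node must be equal in size to a bitmapped node
• Failure pointers must include an offset, since a failure can land partway through a path-compressed node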
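A minimal sketch of why failure pointers need an offset, using a hand-built toy automaton for the two patterns `xabc` and `abd`, where each pattern's chain is folded into a single path-compressed node hanging off the root. The state is a (node, offset) pair, and the `ROOT`, `NODES`, `ROOT_CHILDREN`, `step`, and `search` names are hypothetical. For brevity the sketch omits output links for patterns that are suffixes of other patterns, which a full Aho-Corasick automaton would need.

```python
from dataclasses import dataclass

ROOT = -1  # hypothetical id for the branching (uncompressed) root node


@dataclass
class PathNode:
    chars: bytes                 # chain of single-child transitions folded into one node
    fail: list[tuple[int, int]]  # fail[k]: (target node, offset) after k chars matched


# Toy automaton for b"xabc" and b"abd". In node 0, after "xa" we fail to
# offset 1 of node 1 ("a"), and after "xab" to offset 2 of node 1 ("ab").
NODES = [
    PathNode(b"xabc", [(ROOT, 0), (ROOT, 0), (1, 1), (1, 2), (ROOT, 0)]),
    PathNode(b"abd", [(ROOT, 0)] * 4),
]
ROOT_CHILDREN = {ord("x"): 0, ord("a"): 1}  # first byte -> path node


def step(state: tuple[int, int], c: int) -> tuple[int, int]:
    node, off = state
    while True:
        if node == ROOT:
            # Entering a path node consumes its first byte, so the offset becomes 1.
            return (ROOT_CHILDREN[c], 1) if c in ROOT_CHILDREN else (ROOT, 0)
        pn = NODES[node]
        if off < len(pn.chars) and pn.chars[off] == c:
            return (node, off + 1)
        # Mismatch: follow the failure pointer, which may land partway
        # through another path node; that is why it carries an offset.
        node, off = pn.fail[off]


def search(text: bytes) -> list[tuple[int, bytes]]:
    state, hits = (ROOT, 0), []
    for i, c in enumerate(text):
        state = step(state, c)
        node, off = state
        if node != ROOT and off == len(NODES[node].chars):
            hits.append((i, NODES[node].chars))  # (end index, matched pattern)
    return hits
```

On `xabd`, the mismatch after `xab` follows the failure pointer to offset 2 of the `abd` node, so the match on `abd` is found without rescanning any input.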


Path Compression (ii)
• With 32-bit pointers
– A single path-compressed node can contain data equivalent to 4 bitmap-compressed nodes
– In practice, this achieves a 2.54:1 compression ratio


Results


Intrusion Detection in Hardware

• The number of rules goes up by over a factor of 2.5, whereas the memory required by our algorithm goes up by only 30%

• We focus our attention on worst-case performance


Intrusion Detection in Software

• Examine both average-case and worst-case performance

• Wu-Manber is the fastest in the average case because of its hash function
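A minimal sketch of a simplified Wu-Manber, assuming a block size `B = 2` and using Python dictionaries keyed directly by the block bytes where the real algorithm hashes blocks into fixed-size tables (it also keeps a prefix table, omitted here; the function name is hypothetical). The shift table lets the scan skip ahead on most windows; a zero shift sends the scan to the hash-table bucket, whose candidate patterns are verified exactly. Input that makes many windows hit a zero shift, or crowds one bucket, pushes it toward its worst case.

```python
def wu_manber(text: bytes, patterns: list[bytes], B: int = 2) -> list[tuple[int, bytes]]:
    """Simplified Wu-Manber; returns sorted (start offset, pattern) matches."""
    m = min(len(p) for p in patterns)  # window length = shortest pattern
    if m < B:
        raise ValueError("patterns must be at least B bytes long")
    default = m - B + 1                # shift for blocks seen in no pattern prefix
    shift: dict[bytes, int] = {}
    hash_table: dict[bytes, list[bytes]] = {}
    for p in patterns:
        for q in range(B, m + 1):      # q = end position of a block in p[:m]
            block = p[q - B:q]
            shift[block] = min(shift.get(block, default), m - q)
        # Patterns whose m-byte prefix ends with this block are candidates.
        hash_table.setdefault(p[m - B:m], []).append(p)
    matches: list[tuple[int, bytes]] = []
    pos = m - 1                        # index of the last byte in the window
    while pos < len(text):
        block = text[pos - B + 1:pos + 1]
        s = shift.get(block, default)
        if s == 0:
            start = pos - m + 1
            for p in hash_table.get(block, []):
                if text.startswith(p, start):  # exact verification
                    matches.append((start, p))
            s = 1
        pos += s
    return sorted(matches)
```

For example, scanning `b"the quick brown fox jumps"` for `quick`, `fox`, and `jump` finds them at offsets 4, 16, and 20 while skipping most windows by 2 bytes.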


Summary (i)
• Current software IDSs largely rely on common-case optimizations to gain speed
• Aho-Corasick is the only one of these algorithms that has deterministic worst-case lookup times and is friendly enough to use for wire-speed hardware matching
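A minimal sketch of the textbook Aho-Corasick construction in its "optimized" form, where failure links are folded into a full 256-entry transition row per state (the function names are hypothetical). Each input byte costs exactly one table lookup, which is what makes the worst case deterministic; the price is one 256-entry row per state.

```python
from collections import deque


def build_dfa(patterns: list[bytes]) -> tuple[list[list[int]], list[set[bytes]]]:
    # 1. Trie (goto function) of all patterns; state 0 is the root.
    goto: list[dict[int, int]] = [{}]
    out: list[set[bytes]] = [set()]
    for p in patterns:
        s = 0
        for c in p:
            if c not in goto[s]:
                goto.append({})
                out.append(set())
                goto[s][c] = len(goto) - 1
            s = goto[s][c]
        out[s].add(p)
    # 2. Breadth-first pass: compute failure links and fold them into a
    #    full 256-entry transition row per state.
    fail = [0] * len(goto)
    delta = [[0] * 256 for _ in goto]
    queue = deque()
    for c in range(256):
        t = goto[0].get(c, 0)
        delta[0][c] = t
        if t:
            queue.append(t)  # depth-1 states fail to the root
    while queue:
        s = queue.popleft()
        out[s] |= out[fail[s]]  # inherit matches that end at this state too
        for c in range(256):
            t = goto[s].get(c)
            if t is None:
                delta[s][c] = delta[fail[s]][c]  # reuse the failure state's row
            else:
                fail[t] = delta[fail[s]][c]
                delta[s][c] = t
                queue.append(t)
    return delta, out


def search(delta: list[list[int]], out: list[set[bytes]],
           text: bytes) -> tuple[list[tuple[int, bytes]], int]:
    s, hits, transitions = 0, [], 0
    for i, c in enumerate(text):
        s = delta[s][c]  # exactly one table lookup per input byte
        transitions += 1
        for p in out[s]:
            hits.append((i - len(p) + 1, p))  # (start offset, pattern)
    return sorted(hits), transitions
```

For the classic patterns `he`, `she`, `his`, `hers` on `ushers`, it reports `she` at 1 and `he` and `hers` at 2, using exactly 6 transitions for 6 input bytes.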


Summary (ii)
• Contributions of this paper

– Apply bitmap node compression and path compression to Aho-Corasick

– Gain both compact storage and worst-case performance