Space-Time Tradeoffs in Software-Based Deep Packet Inspection
Anat Bremler-Barr, Yotam Harchol⋆, David Hay
IDC Herzliya, Israel; Hebrew University, Israel
OWASP Israel 2011
Parts of this work were supported by European Research Council (ERC) Starting Grant no. 259085
⋆Supported by the Check Point Institute for Information Security
(Was also presented in IEEE HPSR 2011)
Outline
Motivation
Background
New Compression Techniques
Experimental Results
Conclusions
Network Intrusion Detection Systems
• Classify packets according to:
– Header fields: source IP & port, destination IP & port, protocol, etc.
– Packet payload (data)
[Figure: an IP packet arriving from the Internet, labeled "Deep Packet Inspection"]
Deep Packet Inspection
The environment:
• (D)RAM: high-capacity, slow memory
• Cache memory: locality-based, low-capacity, fast memory
Our Contributions
Literature assumption: try to fit the data structure in cache, hence the efforts to compress the data structures
Our paper: Is it beneficial?
• In reality, even in a non-compressed implementation, most memory accesses are served from the cache
BUT
• One can attack the non-compressed implementation by reducing its locality, pushing it out of the cache and making it much slower!
How to mitigate this attack?
• Compress even further - our new techniques: 60% less memory than the best prior-art compression
Complexity DoS Attack
• Find a gap between average case and worst case
• Engineer input that exploits this gap
• Launch a Denial of Service attack on the system
[Figure: throughput of a system receiving real-life traffic from the Internet]
Aho-Corasick Algorithm
• Build a Deterministic Finite Automaton
• Traverse the DFA, byte by byte
• Accepting state ⇒ pattern found
• Example: {E, BE, BD, BCD, CDBCAB, BCAA}
[Aho, Corasick; 1975]
[Figure: the DFA for the example pattern set, states s0 to s14. For the input BCDBCAB the highlighted states are s0, s2, s5, s6, s9, s10, s11 and s12]
Aho-Corasick Algorithm
• Naïve implementation: represent the transition function in a table of |Σ|×|S| entries
– Σ: alphabet
– S: set of states
• Lookup time: one memory access per input symbol
• Space: in reality, 70MB to gigabytes…
[Aho, Corasick; 1975]
      A   B   C   D   E
S0    0   2   7   0   1
S1    0   2   7   0   1
S2    0   2   5   4   3
S3    0   2   7   0   1
S4    0   2   7   0   1
S5   13   2   7   6   1
S6    0   9   7   0   1
S7    0   2   7   8   1
S8    0   9   7   0   1
:
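The naïve table-based implementation can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `build_dfa` and `scan` are hypothetical, and the alphabet is restricted to the five symbols of the example. When the patterns are inserted in the order given on the earlier slide, the state numbers produced here happen to coincide with the rows of the table above.

```python
from collections import deque

def build_dfa(patterns, alphabet):
    """Build a full Aho-Corasick DFA: table[state][symbol index] -> next state."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    goto = [{}]      # trie edges of each state
    out = [set()]    # patterns reported when a state is entered
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)

    fail = [0] * len(goto)
    table = [[0] * len(alphabet) for _ in goto]
    queue = deque()
    for ch, t in goto[0].items():      # root row: trie edge or stay at root
        table[0][index[ch]] = t
        queue.append(t)
    while queue:                       # breadth-first, so shallower rows are complete
        s = queue.popleft()
        out[s] |= out[fail[s]]
        for ch in alphabet:
            t = goto[s].get(ch)
            if t is None:
                # no trie edge: copy the entry of the failure state
                table[s][index[ch]] = table[fail[s]][index[ch]]
            else:
                fail[t] = table[fail[s]][index[ch]]
                table[s][index[ch]] = t
                queue.append(t)
    return table, out, index

def scan(text, table, out, index):
    """One table lookup per input symbol; returns (end position, pattern) pairs."""
    state, matches = 0, []
    for pos, ch in enumerate(text):
        state = table[state][index[ch]]
        for p in sorted(out[state]):
            matches.append((pos, p))
    return matches

if __name__ == "__main__":
    patterns = ["E", "BE", "BD", "BCD", "CDBCAB", "BCAA"]
    table, out, index = build_dfa(patterns, "ABCDE")
    print(table[5])                               # row S5
    print(scan("BCDBCAB", table, out, index))
```

The table has one entry per (state, symbol) pair, which is why its size grows as |Σ|×|S| regardless of how few forward transitions a state actually has.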
Potential Complexity DoS Attack
1. Exhaustive Traversal Adversarial Traffic
– Traverses as many states of the automaton as possible
– Bad locality: bad for the naïve implementation (will not utilize the cache)
Alternative Implementation
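• Failure transition goes to the state that matches the longest suffix of the input so far that is also a prefix of some pattern
• Lookup time: at most two memory accesses per input symbol on average (via amortized analysis); a single symbol may follow several failure transitions
• Space: at most one state per symbol in the pattern set; the exact size depends on the implementation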
[Aho, Corasick; 1975]
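A minimal sketch of the failure-transition variant, assuming states store only their forward (trie) edges plus one failure pointer. The names `build_failure_automaton` and `scan_counting` are hypothetical, and the "access" counter is a simplified stand-in for memory accesses: one per state visited.

```python
from collections import deque

def build_failure_automaton(patterns):
    """Return (goto, fail, match): forward edges, failure pointers, match flags."""
    goto, match = [{}], [False]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                match.append(False)
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        match[s] = True

    fail = [0] * len(goto)
    queue = deque(goto[0].values())        # depth-1 states fail to the root
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:  # walk up until ch can be consumed
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            match[t] = match[t] or match[fail[t]]
            queue.append(t)
    return goto, fail, match

def scan_counting(text, goto, fail, match):
    """Scan text; return (match end positions, number of states visited)."""
    state, accesses, hits = 0, 0, []
    for pos, ch in enumerate(text):
        accesses += 1                       # read the current state
        while state and ch not in goto[state]:
            state = fail[state]             # failure transition
            accesses += 1
        state = goto[state].get(ch, 0)
        if match[state]:
            hits.append(pos)
    return hits, accesses

if __name__ == "__main__":
    patterns = ["E", "BE", "BD", "BCD", "CDBCAB", "BCAA"]
    goto, fail, match = build_failure_automaton(patterns)
    print(scan_counting("CDBCE", goto, fail, match))
```

On the input CDBCE the last symbol alone costs four state visits (s10, s5, s7, s0), yet the total stays within two per symbol, because every failure transition undoes at least one earlier forward transition.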
[Figure: the example automaton drawn with forward transitions and failure transitions; highlighted path: s10, s5, s7, s0, s1]
Potential Complexity DoS Attack
1. Exhaustive Traversal Adversarial Traffic
– Traverses as many states of the automaton as possible
– Bad locality: bad for the naïve implementation (will not utilize the cache)
2. Failure-path Traversal Adversarial Traffic
– Traverses as many failure transitions as possible
– Bad for the failure-path based automaton (as many memory accesses per input symbol as possible)
Prior Work: Compress the State Representation
Lookup Table
symbol:    A   B   C   D   E
forward:  13   -   -   6   -
failure: 7, match: False

Bitmap Encoded (bitmap length = |Σ|)
symbol:    A   B   C   D   E
bitmap:    1   0   0   1   0
forward: 13 6
failure: 7, match: False
Can count bits using the popcnt instruction

Linear Encoded
size: 2
symbol:  A D
forward: 13 6
failure: 7, match: False
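The bitmap encoding can be illustrated with a short sketch, using the state from the slide (forward transitions A → 13 and D → 6). The helper names and the bit order (A is bit 0) are assumptions made for the example; Python has no direct popcnt instruction, so `bin(x).count("1")` stands in for it.

```python
ALPHABET = "ABCDE"   # the five-symbol alphabet of the running example

def encode_bitmap(forward):
    """forward: dict symbol -> next state. Returns (bitmap, packed pointers)."""
    bitmap, pointers = 0, []
    for i, ch in enumerate(ALPHABET):
        if ch in forward:
            bitmap |= 1 << i              # mark that symbol i has a forward edge
            pointers.append(forward[ch])  # pointers are stored densely, in symbol order
    return bitmap, pointers

def lookup(bitmap, pointers, ch):
    """Return the forward target for ch, or None if the caller must follow the failure pointer."""
    i = ALPHABET.index(ch)
    if not (bitmap >> i) & 1:
        return None
    # Population count of the bits below position i gives the slot in the packed array.
    rank = bin(bitmap & ((1 << i) - 1)).count("1")
    return pointers[rank]

if __name__ == "__main__":
    bitmap, pointers = encode_bitmap({"A": 13, "D": 6})
    print(format(bitmap, "05b"), pointers)
    print(lookup(bitmap, pointers, "D"), lookup(bitmap, pointers, "B"))
```

The state stores |Σ| bits plus one pointer per existing forward transition, instead of one pointer per symbol.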
Path Compression
• One-way branches can be represented using a single state
– Similarly to PATRICIA tries
• Problem: incoming failure transitions
• Solution: compress only states with no incoming failure transitions
[Figure: the example automaton before and after path compression; a one-way branch is collapsed into a single edge labeled BCAB, annotated (B), (BC), (BCA), (BCAB)]
[Chart: memory relative to Tuck et al. (2004) = 100%; our path compression = 75%]
Leaves Compression
• By definition, leaves have no forward transitions
• Their single purpose is to indicate a match
– We can push this indication up by adding a bit to each pointer
– Then, leaves can be eliminated from the automaton by copying their failure transition up
• 3% more space reduction
• Reduces number of transitions taken
[Figure: the automaton before and after leaves compression; transitions that indicate a match are marked with *]
Pointer Compression
In the Snort IDS pattern set, 79% of the failure pointers point to states at depths 0, 1 or 2
Add two bits to encode the depth of the pointer:
• 00: Depth 0
• 01: Depth 1
• 10: Depth 2
• 11: Depth 3 and deeper

Depth     Pointers
0 (s0)    13%
1         31%
2         35%
≥ 3       21%

Depth ≤ 2: 16-bit pointer → 2 bits
Depth > 2: 16-bit pointer → 2 bits (11) + 16-bit pointer
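As a back-of-the-envelope check of what this encoding saves on failure pointers alone, the sketch below computes the average encoded size from the depth distribution in the table. The constant and function names are hypothetical, and the result covers only the failure pointers, not the whole data structure.

```python
# Share of failure pointers by target depth (Snort table above);
# key 3 stands for "depth 3 and deeper".
DEPTH_SHARE = {0: 0.13, 1: 0.31, 2: 0.35, 3: 0.21}
POINTER_BITS = 16   # size of an uncompressed pointer
TAG_BITS = 2        # depth tag added to every pointer

def encoded_bits(depth):
    """Bits used for one failure pointer whose target is at the given depth."""
    if depth <= 2:
        return TAG_BITS                  # target is recomputed from recent input
    return TAG_BITS + POINTER_BITS       # tag 11 followed by the full pointer

def average_bits(shares):
    """Expected bits per failure pointer under the given depth distribution."""
    return sum(share * encoded_bits(depth) for depth, share in shares.items())

if __name__ == "__main__":
    avg = average_bits(DEPTH_SHARE)
    print(round(avg, 2), "bits on average, versus", POINTER_BITS)
```

With these shares the average is 0.79 × 2 + 0.21 × 18 = 5.36 bits per failure pointer, about a third of the original 16 bits.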
[Chart: memory relative to Tuck et al. (2004) = 100%; our path compression = 75%; with pointer compression = 41%]
Determine next state from pointer depth:
• 0: Go to root
• 1: Use a lookup table using last symbol
• 2: Use a hash table using last two symbols
• ≥ 3: Use the stored pointer

Depth 1 Lookup Table:
Symbol   State
A        -
B        s2
C        s7
D        -
E        s1

Depth 2 Hash Table: last 2 symbols → next state
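The resolution rule can be sketched as below. The depth-1 table is the one from the slide; the depth-2 entries are an assumption derived from the example automaton (states for BE, BD, BC and CD), and `resolve_failure` is a hypothetical name. The scheme works because a failure target at depth d spells out the last d input symbols, so those symbols alone identify it.

```python
# Depth-1 lookup table from the slide: last symbol -> state.
DEPTH1 = {"B": 2, "C": 7, "E": 1}
# Depth-2 hash table: last two symbols -> state.
# Assumed contents, taken from the depth-2 states of the example automaton.
DEPTH2 = {"BE": 3, "BD": 4, "BC": 5, "CD": 8}

def resolve_failure(tag, stored_pointer, history):
    """Return the failure target given the 2-bit depth tag.

    history is the string of input symbols consumed so far;
    stored_pointer is only present (not None) when tag == 3.
    """
    if tag == 0:
        return 0                     # depth 0: go to the root
    if tag == 1:
        return DEPTH1[history[-1]]   # depth 1: last symbol
    if tag == 2:
        return DEPTH2[history[-2:]]  # depth 2: last two symbols
    return stored_pointer            # depth 3 and deeper: explicit pointer

if __name__ == "__main__":
    # Failure target of the state reached on CDBC is the state for BC.
    print(resolve_failure(2, None, "CDBC"))
```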
Function Inlining
• Compressed implementation makes more memory accesses
• Initial implementation was based on a few functions calling each other
• Avoiding function calls (by inlining their code) reduced total number of memory reads by 36%
Experimental Setup
Test Systems
             System 1                        System 2
Type         MacBook Pro                     iMac
CPU          Core 2 Duo 2.53GHz dual core    Core i7 2.93GHz quad core
L1 Cache     16KB (data, per core)           16KB (data, per core)
L2 Cache     3MB (shared)                    256KB (per core)
L3 Cache     -                               8MB (shared)

Pattern-Sets
                                  Snort     ClamAV*
Patterns                          31,094    16,710
States in Naïve Implementation    77,182    745,303

Real-life traffic logs taken from MIT DARPA
*We used only half of ClamAV signatures for our tests
Space Requirement
[Chart: memory footprint [MB] (axis 0 to 80) for the Snort and (partial) ClamAV pattern sets; bar labels as extracted: 722.14, 1.5, 2.59]
Memory Accesses per Input Symbol
[Chart: memory accesses per input symbol (axis 0 to 50) for the naïve implementation and our implementation, under real-life traffic, exhaustive traversal adversarial traffic and failure-path traversal adversarial traffic]
L1 Data Cache Miss Rate
Intel Core 2 Duo (2 cores), 16KB L1 Data Cache, 3MB L2 Cache
[Chart: L1 data cache miss rate (axis 0% to 35%) for the naïve implementation and our implementation, under real-life traffic, exhaustive traversal adversarial traffic and failure-path traversal adversarial traffic]
L2 Cache Miss Rate
Intel Core 2 Duo (2 cores), 16KB L1 Data Cache, 3MB L2 Cache
[Chart: L2 cache miss rate (axis 0% to 25%) for the naïve implementation and our implementation, under real-life traffic, exhaustive traversal adversarial traffic and failure-path traversal adversarial traffic]
Chart callouts:
• Real-Life Traffic: 0.7% L2 Cache Miss Rate
• Adversarial Traffic: 23% L2 Cache Miss Rate
• Maximal L2 Miss Rate: 0.06%
Space vs. Time:
[Chart: throughput [MBps] (axis 0 to 1400) against memory footprint [MB] (logarithmic scale), showing real-life traffic throughput and adversarial traffic throughput for our implementation and the naïve implementation; annotated drop: -86%]
Conclusions
It is crucial to model the cache in software-based Deep Packet Inspection:
• Naïve Aho-Corasick implementation has a huge memory footprint, but works well on real-life traffic due to locality of reference
• Naïve implementation can be easily attacked, making it 7 times slower, even though it makes a constant number of memory accesses per input symbol
We also show new compression techniques:
• 60% less memory than best prior-art compression
• Stable throughput, better performance under attacks
Questions?
Thank you!