Fast and Scalable Pattern Matching for Content Filtering
Sarang Dharmapurikar, John Lockwood
Sarang Dharmapurikar
Motivation
● Deep packet inspection
  o Detection of Internet worms, computer viruses, spam, copyrighted material
  o Intrusion detection/prevention
  o Layer-7 switching
  o Content classification
● Needs a fast string matching mechanism
● Some desirable features of the mechanism
  o String matching at line speed
  o Ability to detect strings at random locations in the payload
  o Ability to detect 1000s of strings
  o Ability to handle arbitrarily long strings
Aho-Corasick Algorithm
● Two problems
  o At least 1 memory access per character (at most 2): slows it down
  o Only one character at a time: bottleneck
Example strings:
s1 : technical
s2 : technically
s3 : tel
s4 : telephone
s5 : phone
s6 : elephant
[Figure: Aho-Corasick automaton for s1 to s6, states q0 to q31, one character per transition]
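The two problems above can be seen in a minimal software sketch of the classic Aho-Corasick automaton. This is a textbook-style illustration, not the authors' implementation; the function names and the input text "xyztechnicallyabc" are assumptions made for the example. Each input character costs at least one table lookup, and the scan advances one character per step.

```python
from collections import deque

PATTERNS = ["technical", "technically", "tel", "telephone", "phone", "elephant"]

def build_aho_corasick(patterns):
    """Return (goto, fail, output) for a one-character-per-step automaton."""
    goto = [{}]        # goto[state][char] -> next state
    output = [set()]   # patterns recognised on entering each state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(pat)
    fail = [0] * len(goto)
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            output[nxt] |= output[fail[nxt]]  # inherit matches of the suffix state
            queue.append(nxt)
    return goto, fail, output

def scan(text, automaton):
    """Yield (end_index, pattern) pairs, consuming one character per step."""
    goto, fail, output = automaton
    state = 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]          # extra lookups on a failed transition
        state = goto[state].get(ch, 0)
        for pat in output[state]:
            yield i, pat

automaton = build_aho_corasick(PATTERNS)
num_states = len(automaton[0])                        # states q0 .. q31
matches = sorted(scan("xyztechnicallyabc", automaton))
```

For the six example strings the trie has 32 states, which agrees with the q0 to q31 labels in the figure.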
Why not use multiple engines?
[Figure: incoming connections distributed across Engine1, Engine2, Engine3, Engine4]
Each engine needs plenty of memory
On-chip memory is not practical
We need a memory chip
Multiple memory chips: more pins, more power, more cost
Can we…
● Process multiple characters at a time
● Without using multiple memory chips?
● What if we have a small amount of on-chip memory?
Our Approach
● Modify Aho-Corasick to jump ahead by k characters
  o Jump Ahead Aho-CorasicK (JACK)-FA
● Represent the JACK-FA as a hash table
  o Keep only one copy in off-chip memory
● Keep k copies of the compressed, approximate JACK-FA hash table in on-chip memory
  o Use Bloom filters for the approximate representation
  o Consumes very little memory
[Figure: the data stream feeds the on-chip approximate JACK-FAs, which are backed by the off-chip JACK-FA]
JACK-FA
Strings and their 4-character segments:
s1 : technical   = tech nica l
s2 : technically = tech nica lly
s3 : tel         = tel
s4 : telephone   = tele phon e
s5 : phone       = phon e
s6 : elephant    = elep hant
[Figure: JACK-FA for s1 to s6 with states q0 to q7; transitions on the segments tech, nica, tele, phon, elep, hant; tails tel, e, l, lly; matches s1 to s6 marked where each string completes]
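The segmentation shown on this slide can be sketched as follows. The helper name `split_segments` and the choice to treat the leftover characters as a separate "tail" value are illustrative assumptions; k = 4 is taken from the segment lengths on the slide.

```python
PATTERNS = ["technical", "technically", "tel", "telephone", "phone", "elephant"]

def split_segments(string, k=4):
    """Split a string into full k-character segments plus a shorter tail."""
    cut = len(string) - len(string) % k
    segments = [string[i:i + k] for i in range(0, cut, k)]
    return segments, string[cut:]

# Each distinct prefix of full segments corresponds to one state of the
# segment-level automaton (the empty prefix is the start state q0).
states = set()
for pat in PATTERNS:
    segments, _tail = split_segments(pat)
    for n in range(len(segments) + 1):
        states.add(tuple(segments[:n]))

num_states = len(states)
```

With the six example strings this yields 8 distinct segment prefixes, in line with the states q0 to q7 in the figure, compared with 32 states for the character-level automaton.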
String matching with JACK-FA
t e c h n x y z i c a l l y a b c
[Figure: the JACK-FA (states q0 to q7) stepping through the input text four characters at a time]
Why we need k JACK-FA
t e c h n x y z i c a l l y a b c
[Figure: the JACK-FA (states q0 to q7) aligned against the input text]
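A machine that jumps k characters at a time only sees segment boundaries at one alignment, so a string that starts at any other offset is missed. The sketch below illustrates this in software under stated assumptions: the input "xyztechnicallyabc" is one plausible reading of the slide's example, the function names are hypothetical, and tails are checked along the failure chain at every step, which is a simplification rather than the authors' exact hardware design.

```python
from collections import deque

PATTERNS = ["technical", "technically", "tel", "telephone", "phone", "elephant"]

def build_jack(patterns, k):
    """Segment-level automaton: goto on k-char segments, tails stored per state."""
    goto, tails = [{}], [{}]
    for pat in patterns:
        cut = len(pat) - len(pat) % k
        state = 0
        for i in range(0, cut, k):
            seg = pat[i:i + k]
            if seg not in goto[state]:
                goto.append({})
                tails.append({})
                goto[state][seg] = len(goto) - 1
            state = goto[state][seg]
        tails[state].setdefault(pat[cut:], set()).add(pat)
    fail = [0] * len(goto)
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for seg, nxt in goto[state].items():
            f = fail[state]
            while f and seg not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(seg, 0)
            queue.append(nxt)
    return goto, tails, fail

def run_machine(text, start, k, automaton):
    """One machine: it only inspects offsets start, start + k, start + 2k, ..."""
    goto, tails, fail = automaton
    found, state, pos = set(), 0, start
    while pos <= len(text):
        s = state
        while True:  # check tails of the state and of its failure chain
            for tail, pats in tails[s].items():
                if text.startswith(tail, pos):
                    for pat in pats:
                        found.add((pos + len(tail) - len(pat), pat))
            if s == 0:
                break
            s = fail[s]
        seg = text[pos:pos + k]
        if len(seg) < k:
            break
        while state and seg not in goto[state]:
            state = fail[state]
        state = goto[state].get(seg, 0)
        pos += k  # jump ahead by k characters
    return found

def find_all(text, patterns, k=4):
    """Run k machines, one per starting offset, and merge their matches."""
    automaton = build_jack(patterns, k)
    found = set()
    for start in range(k):
        found |= run_machine(text, start, k, automaton)
    return sorted(found)

automaton = build_jack(PATTERNS, 4)
text = "xyztechnicallyabc"
aligned_at_0 = run_machine(text, 0, 4, automaton)   # sees xyzt, echn, ical, ...
aligned_at_3 = run_machine(text, 3, 4, automaton)   # sees tech, nica, then tails
```

In this example the machine started at offset 0 reports nothing, while the machine started at offset 3 reports both technical and technically, which is why k machines with different starting offsets are run side by side.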
Speed up
t e c h n x y z i c a l l y a b
A single machine in off-chip memory
k approximate and compressed machines in on-chip memory
Use Bloom filters
Tabular Representation
[Figure: JACK-FA state diagram for s1 to s6, states q0 to q7]
[state, substr]   Next State   Matching str   Failure Chain
[q0, tech]        q1           -              q0
[q0, tele]        q2           s3             q0
[q0, phon]        q3           -              q0
[q0, elep]        q4           -              q0
[q1, nica]        q5           -              q0
[q2, phon]        q6           -              q3, q0
[q4, hant]        q7           s6             q0
[q0, tel]         -            s3             -
[q3, e]           -            s5             -
[q5, lly]         -            s1, s2         -
[q5, l]           -            s1             -
[q6, e]           -            s4, s5         -
Implementation with Bloom Filters
[Table: the [state, substr] table from the Tabular Representation slide]
[Figure: the table is represented by a group of Bloom filters B1, B2, B3, B4 feeding a state q; the group is then replicated four times, with states labeled q1, q2, q3, q4]
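The idea of filtering table lookups through on-chip Bloom filters can be sketched in software as below. Several details are assumptions made for illustration and are not stated on the slides: one filter per substring length (B1 to B4), SHA-256 as the hash source, 256 bits and 4 hash functions per filter, and the class names. The table entries are copied from the Tabular Representation slide.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=256, num_hashes=4):
        self.num_bits, self.num_hashes = num_bits, num_hashes
        self.bits = 0  # a Python int used as the bit array

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def __contains__(self, key):
        return all((self.bits >> p) & 1 for p in self._positions(key))

# (state, substr) -> (next state or None, matching strings)
TABLE = {
    ("q0", "tech"): ("q1", []),      ("q0", "tele"): ("q2", ["s3"]),
    ("q0", "phon"): ("q3", []),      ("q0", "elep"): ("q4", []),
    ("q1", "nica"): ("q5", []),      ("q2", "phon"): ("q6", []),
    ("q4", "hant"): ("q7", ["s6"]),  ("q0", "tel"): (None, ["s3"]),
    ("q3", "e"): (None, ["s5"]),     ("q5", "lly"): (None, ["s1", "s2"]),
    ("q5", "l"): (None, ["s1"]),     ("q6", "e"): (None, ["s4", "s5"]),
}

class GuardedTable:
    """Off-chip hash table guarded by on-chip Bloom filters (one per length)."""
    def __init__(self, table, max_len=4):
        self.table = table
        self.filters = {n: BloomFilter() for n in range(1, max_len + 1)}
        self.off_chip_accesses = 0
        for state, substr in table:
            self.filters[len(substr)].add(f"{state}|{substr}")

    def lookup(self, state, substr):
        if f"{state}|{substr}" not in self.filters[len(substr)]:
            return None               # definite miss: no off-chip access needed
        self.off_chip_accesses += 1   # possible hit: verify in the off-chip table
        return self.table.get((state, substr))

guarded = GuardedTable(TABLE)
hit = guarded.lookup("q0", "tech")
miss = guarded.lookup("q0", "xyzt")
```

A Bloom filter never reports a stored key as absent, so every real table entry is still found. A false positive only costs one wasted off-chip access, after which the exact table returns no entry.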
Throughput with Snort strings
● Off-chip memory: 250 MHz QDR-SRAM, 64-bit wide
● String concentration: 1 in 100 characters
● 2250 strings
● 2 to 122 character strings
Conclusions
● Fast string matching is an important module for content filtering applications
● Off-chip memory accesses slow down string matching
● A large fraction of memory accesses can be avoided
  o Using a small on-chip memory and Bloom filters
● Our accelerated Aho-Corasick algorithm can process 2250 strings with less than 50 KB of on-chip memory
  o At a speed of more than 10 Gbps
Thanks!
Questions?
Motivation
● The multi-pattern matching algorithm works for short strings (16 bytes)
  o Hash computation over long strings becomes problematic
  o Some virus signatures can be several hundred bytes long
  o Snort's longest string is 122 bytes
[Figure: histogram of number of strings versus string length in bytes]
Accelerated Aho-Corasick Algorithm
● How to support arbitrarily large strings?
  o At the cost of more memory?
  o Break a long string into multiple smaller pieces
  o Stitch them in a state machine
  o Match individual segments and track the state machine
[Figure: state machine q0, q1, q2, q3 with transitions tech, nica, lly; labels: Symbols, Tail]
Speed up
t e c h n x y z i c a l l y a b
[Figure: labels s1, s2, s3, s4 below the input text]
Multiple machines
t e c h n x y z i c a l l y a b
[Figure: labels s1, s2, s3, s4 below the input text]
Bloom Filter
[Figure: inserting X; hash functions H1, H2, H3, H4, ..., Hk select bits of the m-bit array, which are set to 1]
[Figure: inserting Y; the bits selected by H1 to Hk are set to 1 as well]
[Figure: querying X; all bits selected by H1 to Hk are 1: match]
[Figure: querying W; all bits selected by H1 to Hk are 1: match (false positive)]
Bloom filter
Represents a set of strings
Each string consumes very few bits, e.g. 12 to 16 bits
Query: is x present in the filter? Answer: {No, Yes}
A Yes can be a false positive
But the false positive probability is very small, e.g. 0.001
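The relation between bits per string and false positive probability can be checked with the standard Bloom filter approximation f = (1 - e^(-h/b))^h, where b is the number of bits per stored string and h the number of hash functions. The symbol names and the choice of h close to b ln 2 (the usual optimum) are assumptions for this sketch; the slides do not state how many hash functions were used.

```python
import math

def false_positive_rate(bits_per_string, num_hashes=None):
    """Approximate Bloom filter false positive probability.

    If num_hashes is not given, use the nearest integer to the usual
    optimum, bits_per_string * ln 2.
    """
    if num_hashes is None:
        num_hashes = max(1, round(bits_per_string * math.log(2)))
    return (1.0 - math.exp(-num_hashes / bits_per_string)) ** num_hashes

rates = {b: false_positive_rate(b) for b in (12, 14, 16)}
```

Under these assumptions, 12 to 16 bits per string give false positive probabilities from roughly 0.003 down to roughly 0.0005, which is consistent with the "like 0.001" figure quoted on the slide.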