TRANSCRIPT
Application of Bloom Filters for Longest Prefix Matching and String Matching
Sarang Dharmapurikar
With contributions from: Praveen Krishnamurthy,
David Taylor, Todd Sproull and John Lockwood
Agenda
● Background on Bloom filters
● Application to Longest Prefix Matching
● Application to String Matching
● Snort on Chip (if time permits)
Bloom Filter
[Figure: inserting element X: hash functions H1, H2, …, Hk each set one bit in an m-bit array]
Bloom Filter
[Figure: inserting a second element Y: its k hashed bits are likewise set in the m-bit array]
Bloom Filter
[Figure: querying X: all k hashed bits are set, so the filter reports a match]
Bloom Filter
[Figure: querying W, which was never inserted: all k hashed bits happen to be set, so the filter reports a match, a false positive]
Optimal Parameters of a Bloom filter
● n : number of messages to be stored
● k : number of hash functions
● m : the size of the bit-array (memory)
● The false positive probability (at the optimal k below): f = (½)^k
● The optimal number of hash functions: k = ln2 × m/n = 0.693 × m/n
Key point: the false positive probability decreases exponentially with a linear increase in the number of hash functions and memory
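To make the construction concrete, here is a minimal Bloom filter sketch in Python (my illustration, not from the talk; the salted SHA-1 hash family is an assumed software stand-in for the hardware hash functions):

    import hashlib

    class BloomFilter:
        def __init__(self, m, k):
            self.m = m                          # size of the bit array
            self.k = k                          # number of hash functions
            self.bits = bytearray((m + 7) // 8)

        def _positions(self, item):
            # k hash values H1(x)..Hk(x), derived here from salted SHA-1
            for i in range(self.k):
                digest = hashlib.sha1(b"%d|" % i + item).digest()
                yield int.from_bytes(digest[:8], "big") % self.m

        def add(self, item):                    # program item into the filter
            for p in self._positions(item):
                self.bits[p // 8] |= 1 << (p % 8)

        def query(self, item):                  # True may be a false positive
            return all(self.bits[p // 8] & (1 << (p % 8))
                       for p in self._positions(item))

For n stored items, choosing k = 0.693 × m/n gives f = (½)^k, matching the slide.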
Longest Prefix Matching
“Longest Prefix Matching Using Bloom Filters,” Sarang Dharmapurikar, Praveen Krishnamurthy, David E. Taylor, SIGCOMM 2003
Motivation
● Router speed depends on Longest Prefix Matching (LPM)
o IP lookup requires LPM: find the longest matching prefix of the destination IP address in the routing table and retrieve the next hop
● Algorithmic approaches
o Controlled Prefix Expansion (CPE) and variants
o Lulea, Tree Bitmap
o Binary search on prefix lengths
o Low power and low cost, but memory accesses are a bottleneck
● Device-based approach
o TCAM: more power and more cost compared to SRAM
Desirable Features for LPM
● High speed
OC-768 => 125 million lookups per second
● Low power
● Low cost
● Feasible to implement
● Fast incremental route updates
● Scalable with IP address length (for IPv6)
Longest Prefix Matching

Prefix                              Next hop
0*                                  3
1*                                  4
01*                                 6
10*                                 13
11*                                 22
010*                                7
011*                                12
101*                                56
0010*                               4
1001*                               5
1010*                               13
1011011010100101001111111111111*   7
10110110101010111101001111111111   5
11011110111101010111111111111111   13

Destination IP address: 10101110110110110110100010110111
Longest Prefix Matching
[Slide repeats the prefix table; the prefix lengths present are marked as positions along the destination address 10101110110110110110100010110111]
Longest Prefix Matching
[Slide: the same prefixes, grouped by length, are stored in an off-chip hash table; a hash function applied to each prefix of the destination address 10101110110110110110100010110111 probes the table]
System Overview
[Figure: the destination IP address is checked in parallel against one on-chip Bloom filter per prefix length; a priority encoder selects the matching lengths, longest first, and the off-chip hash table (prefix, next hop) is probed through the hash table interface to return the next hop]
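To make the lookup flow concrete, here is a minimal software sketch (my illustration, not the hardware design; BloomFilter is the sketch from the background section, and the per-length filters and off-chip table are assumed to be pre-built):

    # bloom: dict mapping prefix length -> BloomFilter of all prefixes of that length
    # table: dict mapping prefix bit-string -> next hop (the off-chip hash table)
    def lpm_lookup(addr_bits, bloom, table):
        # In hardware every Bloom filter is queried in parallel and the
        # priority encoder orders the matching lengths, longest first.
        candidates = [l for l in sorted(bloom, reverse=True)
                      if bloom[l].query(addr_bits[:l].encode())]
        for l in candidates:
            hop = table.get(addr_bits[:l])   # one off-chip hash probe
            if hop is not None:
                return hop                   # longest true match wins
        return None                          # no match: fall back to default route

A false positive at some length costs only one wasted hash probe; it can never produce a wrong next hop, because the off-chip table holds the exact prefixes.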
Effect of False Positives
● On average, one Bloom filter generates f false hash probes per lookup
● B Bloom filters generate Bf false hash probes
● One additional true hash probe is required for the route lookup
● Expected hash probes per lookup: Eexp ≤ Bf + 1
● Worst-case hash probes per lookup: Eworst = B + 1
Tuning False Positive
● A uniform false positive probability is required across all the Bloom filters
● Hence, use the same number of hash functions in all Bloom filters: ki = ln2 × mi/ni
● Hence, allocate on-chip memory in proportion to the number of prefixes stored:
m1/n1 = m2/n2 = ..... = m32/n32 = M/N
● Thus, every filter has k = ln2 × M/N and f = (½)^(ln2 × M/N) (a numeric sketch follows below)
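A small numeric sketch of this allocation rule in Python (the prefix counts in the usage comment are assumed, for illustration only):

    import math

    def allocate(M, counts):
        # Split M on-chip bits over the per-length filters in proportion
        # to their prefix counts, so every filter has the same m_i/n_i = M/N
        # and therefore the same k and the same false positive probability.
        N = sum(counts.values())
        m = {length: M * n // N for length, n in counts.items()}
        k = round(math.log(2) * M / N)       # common number of hash functions
        f = 0.5 ** (math.log(2) * M / N)     # uniform false positive probability
        return m, k, f

    # e.g. allocate(2 * 2**20, {21: 30000, 22: 25000, 23: 20000, 24: 40000})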
Basic Configuration
[Plot: expected # of hash probes per lookup (1 to 6) vs. size of embedded memory (1 to 4 Mbits), for 100,000 / 150,000 / 200,000 / 250,000 prefixes]
Eexp ≤ 32 × (½)^(ln2 × M/N) + 1
Eworst = 32
Less than 2 hash probes per lookup
Close to 1 hash probe per lookup
Direct Lookup Array
● Expand prefixes of length 1-19 into 20-bit prefixes
● Direct lookup on 20-bit prefixes in a table
o Needs 2^20 entries in the off-chip memory
● Eliminates the first 20 Bloom filters
o Fewer prefixes to store in the Bloom filters (see the sketch after the figure)
[Figure: prefix lengths 1-20 are covered by the 2^20-entry direct lookup table; lengths 21-32 are covered by 12 Bloom filters backed by the off-chip hash table]
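A sketch of building the direct lookup array in Python (my illustration; prefixes are bit-strings of length at most 20):

    def build_direct_array(short_prefixes):
        # Expand every prefix of length l <= 20 into its 2**(20 - l)
        # 20-bit extensions. Writing shortest prefixes first lets longer
        # (more specific) prefixes overwrite them, preserving LPM semantics.
        table = [None] * (1 << 20)           # 2^20 entries in off-chip memory
        for prefix, hop in sorted(short_prefixes.items(),
                                  key=lambda kv: len(kv[0])):
            shift = 20 - len(prefix)
            base = int(prefix, 2) << shift
            for idx in range(base, base + (1 << shift)):
                table[idx] = hop
        return table

    # A lookup is then table[int(addr_bits[:20], 2)], a single memory access.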
Direct Lookup Array
● N20 = original number of prefixes of length up to 20 bits
● Typically N20 = 21 % of N
[Plot: expected # of hash probes per lookup (1 to 2) vs. size of embedded memory (1 to 4 Mbits), for 100,000 / 150,000 / 200,000 / 250,000 prefixes]
Eexp ≤ 12 × (½)^(ln2 × M/(N − N20)) + 1
Eworst = 12 + 1 = 13
Less than 1.1 hash probes per lookup
Controlled Prefix Expansion
● Expand prefixes of length 21-23 into 24 bits
● Expand prefixes of length 25-31 into 32 bits
● The off-chip hash table contains only 24-bit and 32-bit prefixes (see the sketch after the figure)
[Figure: after expansion, only two Bloom filters remain, one for 24-bit and one for 32-bit prefixes, alongside the 2^20-entry direct lookup table and the off-chip hash table]
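A sketch of the expansion step in Python (my illustration):

    def expand(prefixes, target_len):
        # Controlled prefix expansion: replace each prefix shorter than
        # target_len with all of its target_len-bit extensions. Expanding
        # shortest-first lets a more specific original prefix overwrite
        # the expansions of a shorter one.
        out = {}
        for prefix, hop in sorted(prefixes.items(), key=lambda kv: len(kv[0])):
            pad = target_len - len(prefix)
            if pad == 0:
                out[prefix] = hop
                continue
            for i in range(1 << pad):
                out[prefix + bin(i)[2:].zfill(pad)] = hop
        return out

    # e.g. for the 21-24 bit group:
    #   expand({p: h for p, h in routes.items() if 21 <= len(p) <= 24}, 24)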
Controlled Prefix Expansion
● N32 = original 25-32 bit prefixes: 0.2% of the total
● α32 = expansion factor for 25-32 bit prefix expansion; typically α32 = 50
● N24 = original 21-24 bit prefixes: 75.2% of the total
● α24 = expansion factor for 21-24 bit prefix expansion; typically α24 = 1.8
[Plot: expected # of hash probes per lookup (1 to 1.6) vs. size of embedded memory (1 to 4 Mbits), for 100,000 / 150,000 / 200,000 / 250,000 prefixes]
Eexp ≤ 2 × (½)^(ln2 × M/(α24N24 + α32N32)) + 1
Eworst = 2 + 1 = 3
Less than 1.2 hash probes per lookup
Simulation results
● Scheme 1: 32 Bloom filters
● Scheme 2: 12 Bloom filters and a Direct Lookup Array
● Scheme 3: 2 Bloom filters and a Direct Lookup Array

Scheme 1      Theoretical    Observed
Eexp          1.007670       1.007390
Eworst        32             3

Scheme 2      Theoretical    Observed
Eexp          1.000204       1.000898
Eworst        13             3

Scheme 3      Theoretical    Observed
Eexp          1.006005       1.003265
Eworst        3              3

Setup: 15 IPv4 BGP tables; average of N = 115,000 prefixes; 2 Mbits of embedded RAM considered.
Hardware Implementation Consideration
● Supporting multiple hash functions in memory
o The number of hash functions in each Bloom filter can be 10 to 20
o Each hash function requires one memory port for a random lookup
o How to support so many ports on one memory?
● Use multiple memory cores
o Restricting the range of each hash function to one memory core has an insignificant effect on the false positive probability
Future Work: Handling Worst Case
● Worst case can be a problem, particularly for IPv6
● Hybrid schemes involving TCAM and Bloom filters?
o The worst case is due to the number of unique Bloom filters; reduce the number of Bloom filters by:
o Using TCAM for prefix lengths with fewer prefixes
o And/or using CPE
● Reduce the false positive probability of each individual Bloom filter to “almost” zero!
o For the Bloom filter of prefix length i, use ki = i hash functions
o Hence # of false matches possible = (½)^i × 2^i = 1 (a worked example follows below)
o # bits per prefix = i/ln2 = 1.44i < 2i = # TCAM bits per prefix
o When a Bloom filter requires too many hash functions, use fewer hash functions but more memory to achieve the desired effect
● Or, maintain a TCAM cache for the prefixes that match multiple Bloom filters
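As a concrete instance of this arithmetic (my example, not from the talk): for prefix length i = 20, using k20 = 20 hash functions gives f = (½)^20; since there are 2^20 distinct 20-bit prefixes, the expected number of false matches is 2^20 × (½)^20 = 1, and the filter spends 20/ln2 ≈ 28.9 ≈ 1.44 × 20 bits per prefix, versus 2 × 20 = 40 TCAM bits.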
String Matching
“Deep Packet Inspection Using Parallel Bloom Filters,” Sarang Dharmapurikar, Praveen Krishnamurthy, Todd Sproull, John Lockwood, Hot Interconnects 2003
Motivation
● Applications of deep packet inspection
o Detection of Internet worms, computer viruses, SPAM, copyrighted material
o Layer-7 switching
o Content classification
● A string detection mechanism is a common infrastructure for all of these
● Some desirable features of the mechanism
o String matching at line speed
o Ability to detect strings at random locations in the payload
o Ability to detect 1000s of strings
o Easy incremental updates to the string database
o Low power and low cost
Using Bloom filters for String Matching
[Figure: a window of bytes b1 … bW slides over the payload, one entering byte and one leaving byte per cycle; substrings of each length from 3 to W are checked in parallel against Bloom filters BF3 … BFW; candidate matches go to a false positives resolver backed by an off-chip hash table]
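In software the scanning loop looks roughly as follows (my sketch; in the hardware, all window lengths are tested in a single clock cycle per byte). BloomFilter is the earlier sketch, and signatures is the exact string set backing the off-chip hash table:

    def scan(payload, blooms, signatures):
        # blooms: dict mapping string length -> BloomFilter holding all
        # signatures of that length. Returns verified (offset, string) hits.
        hits = []
        for i in range(len(payload)):
            for length, bf in blooms.items():       # one filter per length
                window = payload[i:i + length]
                if len(window) < length:
                    continue
                if bf.query(window):                # candidate (maybe false)
                    if window in signatures:        # resolver: exact check
                        hits.append((i, window))    # true match confirmed
        return hits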
Single Bloom Filter Engine
[Figure: one scanning engine: Bloom filters BF3 … BFW over the current window, feeding the false positives resolver and the hash table]
Using Multiple Engines
[Figure: multiple engines scan the stream in parallel; second-level arbitration serializes their probes to the shared hash table]
System Throughput
● G : number of engines (4)
● B : number of Bloom filters in each engine (32)
● f : false-positive probability of each Bloom filter
o f = (½)^(ln2 × M/N), with N = 10,000 strings
● F : clock frequency (100 MHz)
o Conservative for existing FPGAs, SRAMs & SDRAMs
● τ : time required to probe the hash table
o 20 clock cycles to read a burst of 32 bytes from SDRAM
● p : frequency at which a true signature appears
o Typically 1/1000 to 1/100

throughput = G / [ (1 - p) × B × f × τ + p × τ + 1/F ]  bytes/s

where (1 - p) × B × f × τ is the time spent resolving false positives with off-chip memory, p × τ is the time spent confirming a true match, and 1/F is the time spent examining the window of packet bytes on-chip.
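A rough numeric check of this model in Python (parameter values taken from the slide; the 1 Mbit of on-chip memory used to derive f is my assumption for illustration):

    import math

    def throughput_bps(G=4, B=32, F=100e6, p=0.001, M=2**20, N=10_000):
        tau = 20 / F                              # 20 clock cycles per probe
        f = 0.5 ** (math.log(2) * M / N)          # per-filter false positive prob.
        per_byte = (1 - p) * B * f * tau + p * tau + 1 / F
        return 8 * G / per_byte                   # bits per second

    print(throughput_bps() / 1e9)   # ~3 Gbits/s, above OC-48 (2.5 Gb/s)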
Throughput as a function of on-chip memory size
[Plot: system throughput (Gbits per second, 0 to 3.5) vs. on-chip SRAM available for Bloom filters (0 to 1.4 Mbits), for p = 0.001 and p = 0.01, where p is the probability of occurrence of a true match per 1000 or 100 characters]
● Throughput is not limited by false positives
● Throughput is more than OC-48
Implementation of Bloom filter on FPGA
● m = 5 × 4096 = 20480
● k = 5 × 2 = 10
● k = (m/n) × ln2 => n = 1419
● f = (½)^10 = 0.0009765
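These numbers can be checked quickly (Python):

    import math
    m = 5 * 4096                   # five 4096-bit Block RAMs = 20480 bits
    k = 5 * 2                      # two ports per RAM -> 10 hash functions
    n = int(m * math.log(2) / k)   # capacity at optimal k: 1419 strings
    f = 0.5 ** k                   # (1/2)^10 = 0.0009765625
    print(m, k, n, f)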
[Figure: a hash value generator addresses an m-bit vector implemented across FPGA Block RAMs; each dual-port Block RAM holds 4096 × 1-bit and provides 2 address ports and 2 data ports, so each RAM serves two hash functions]
Instantiation of mini-Bloom filters
[Figure: a wrapper around an array of mini-Bloom filters, each built from Block RAMs with its own hash value generator]
● The wrapper distributes the strings uniformly over the set of mini-Bloom filters (a sketch follows below)
● Can support 10,000 strings with a false positive probability of 0.00097, using 35 on-chip Block RAMs
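A sketch of the dispatch idea in Python (my illustration; the MD5-based first-stage hash that picks a mini-filter is an assumption, and BloomFilter is the earlier sketch):

    import hashlib

    class MiniBloomArray:
        # Each string is stored in exactly one mini-Bloom filter, chosen
        # by a first-stage hash, so load spreads uniformly across the minis.
        def __init__(self, num_minis, bits_per_mini, k):
            self.minis = [BloomFilter(bits_per_mini, k)
                          for _ in range(num_minis)]

        def _pick(self, item):
            h = hashlib.md5(item).digest()          # first-stage dispatch hash
            return int.from_bytes(h[:4], "big") % len(self.minis)

        def add(self, item):
            self.minis[self._pick(item)].add(item)

        def query(self, item):
            # Only the selected mini-filter needs to be probed
            return self.minis[self._pick(item)].query(item)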
Partial Bloom Filter
[Figure: a Partial Bloom Filter (PBF) built on one 4096-bit dual-port Block RAM: the hash value calculator derives H1(X) and H2(X) from input X and drives the two ports (addrA, weA, dinA, doutA and addrB, weB, dinB, doutB); a request decoder supplies BRAM #, address, bit, and valid signals; the 1-bit output indicates match / no match]
Bloom Filter
[Figure: a complete Bloom filter assembled from five PBFs (PBF 1 … PBF 5): the hash value calculator computes H1 … H10, two hash functions per dual-ported PBF; the per-PBF results are combined into the final Match signal; a control interface supports programming the filter]
System Overview
[Figure: FPGA system: input and output protocol wrappers and controllers feed a control packet processor and the Bloom filter block; a hash table interface and SDRAM controller connect the Bloom filter block to off-chip SDRAM]