TRANSCRIPT
Application of Bloom Filters for Longest Prefix Matching and String Matching
Sarang Dharmapurikar
With contributions from: Praveen Krishnamurthy,
David Taylor, Todd Sproull and John Lockwood
Agenda
● Background on Bloom filters
● Application to Longest Prefix Matching
● Application to String Matching
● Snort on Chip (if time permits)
Bloom Filter
[Figure: inserting element X: hash functions H1, H2, …, Hk each set one bit in an m-bit array]
Bloom Filter
[Figure: inserting a second element Y: its k hashed bits are likewise set in the m-bit array]
Bloom Filter
[Figure: querying X: all k hashed bits are set, so the filter reports a match]
Bloom Filter
[Figure: querying W, which was never inserted: all k hashed bits happen to be set, so the filter reports a match, a false positive]
Optimal Parameters of a Bloom filter
● n : number of messages to be stored
● k : number of hash functions
● m : the size of the bit-array (memory)
● The false positive probability (at the optimal k below): f = (½)^k
● The optimal number of hash functions: k = ln2 × m/n = 0.693 × m/n
Key point: the false positive probability decreases exponentially with a linear increase in the number of hash functions and memory
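To make the construction concrete, here is a minimal Bloom filter sketch in Python (my illustration, not from the talk; the salted SHA-1 hash family is an assumed software stand-in for the hardware hash functions):

    import hashlib

    class BloomFilter:
        def __init__(self, m, k):
            self.m = m                          # size of the bit array
            self.k = k                          # number of hash functions
            self.bits = bytearray((m + 7) // 8)

        def _positions(self, item):
            # k hash values H1(x)..Hk(x), derived here from salted SHA-1
            for i in range(self.k):
                digest = hashlib.sha1(b"%d|" % i + item).digest()
                yield int.from_bytes(digest[:8], "big") % self.m

        def add(self, item):                    # program item into the filter
            for p in self._positions(item):
                self.bits[p // 8] |= 1 << (p % 8)

        def query(self, item):                  # True may be a false positive
            return all(self.bits[p // 8] & (1 << (p % 8))
                       for p in self._positions(item))

For n stored items, choosing k = 0.693 × m/n gives f = (½)^k, matching the slide.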
Longest Prefix Matching
“Longest Prefix Matching Using Bloom Filters,” Sarang Dharmapurikar, Praveen Krishnamurthy, David E. Taylor, SIGCOMM 2003
Motivation
● Router speed depends on Longest Prefix Matching (LPM)
o IP lookup requires LPM: find the longest matching prefix of the destination IP address in the routing table and retrieve the next hop
● Algorithmic approaches
o Controlled Prefix Expansion (CPE) and variants
o Lulea, Tree Bitmap
o Binary search on prefix lengths
o Low power and low cost, but memory accesses are a bottleneck
● Device-based approach
o TCAM: more power and more cost compared to SRAM
Desirable Features for LPM
● High speed
OC-768 => 125 million lookups per second
● Low power
● Low cost
● Feasible to implement
● Fast incremental route updates
● Scalable with IP address length (for IPv6)
Longest Prefix Matching

Prefix                              Next hop
0*                                  3
1*                                  4
01*                                 6
10*                                 13
11*                                 22
010*                                7
011*                                12
101*                                56
0010*                               4
1001*                               5
1010*                               13
1011011010100101001111111111111*   7
10110110101010111101001111111111   5
11011110111101010111111111111111   13

Destination IP address: 10101110110110110110100010110111
Longest Prefix Matching
[Slide repeats the prefix table; the prefix lengths present are marked as positions along the destination address 10101110110110110110100010110111]
Longest Prefix Matching
[Slide: the same prefixes, grouped by length, are stored in an off-chip hash table; a hash function applied to each prefix of the destination address 10101110110110110110100010110111 probes the table]
System Overview
[Figure: the destination IP address is checked in parallel against one on-chip Bloom filter per prefix length; a priority encoder selects the matching lengths, longest first, and the off-chip hash table (prefix, next hop) is probed through the hash table interface to return the next hop]
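To make the lookup flow concrete, here is a minimal software sketch (my illustration, not the hardware design; BloomFilter is the sketch from the background section, and the per-length filters and off-chip table are assumed to be pre-built):

    # bloom: dict mapping prefix length -> BloomFilter of all prefixes of that length
    # table: dict mapping prefix bit-string -> next hop (the off-chip hash table)
    def lpm_lookup(addr_bits, bloom, table):
        # In hardware every Bloom filter is queried in parallel and the
        # priority encoder orders the matching lengths, longest first.
        candidates = [l for l in sorted(bloom, reverse=True)
                      if bloom[l].query(addr_bits[:l].encode())]
        for l in candidates:
            hop = table.get(addr_bits[:l])   # one off-chip hash probe
            if hop is not None:
                return hop                   # longest true match wins
        return None                          # no match: fall back to default route

A false positive at some length costs only one wasted hash probe; it can never produce a wrong next hop, because the off-chip table holds the exact prefixes.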
Effect of False Positives
● On average, one Bloom filter generates f false hash probes per lookup
● B Bloom filters generate Bf false hash probes
● One additional true hash probe is required for the route lookup
● Expected hash probes per lookup: Eexp ≤ Bf + 1
● Worst-case hash probes per lookup: Eworst = B + 1
Tuning False Positive
● A uniform false positive probability is required across all the Bloom filters
● Hence, use the same number of hash functions in all Bloom filters: ki = ln2 × mi/ni
● Hence, allocate on-chip memory in proportion to the number of prefixes stored:
m1/n1 = m2/n2 = ..... = m32/n32 = M/N
● Thus, every filter has k = ln2 × M/N and f = (½)^(ln2 × M/N) (a numeric sketch follows below)
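A small numeric sketch of this allocation rule in Python (the prefix counts in the usage comment are assumed, for illustration only):

    import math

    def allocate(M, counts):
        # Split M on-chip bits over the per-length filters in proportion
        # to their prefix counts, so every filter has the same m_i/n_i = M/N
        # and therefore the same k and the same false positive probability.
        N = sum(counts.values())
        m = {length: M * n // N for length, n in counts.items()}
        k = round(math.log(2) * M / N)       # common number of hash functions
        f = 0.5 ** (math.log(2) * M / N)     # uniform false positive probability
        return m, k, f

    # e.g. allocate(2 * 2**20, {21: 30000, 22: 25000, 23: 20000, 24: 40000})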
Basic Configuration
[Plot: expected # of hash probes per lookup (1 to 6) vs. size of embedded memory (1 to 4 Mbits), for 100,000 / 150,000 / 200,000 / 250,000 prefixes]
Eexp ≤ 32 × (½)^(ln2 × M/N) + 1
Eworst = 32
Less than 2 hash probes per lookup
Close to 1 hash probe per lookup
Direct Lookup Array
● Expand prefixes of length 1-19 into 20-bit prefixes
● Direct lookup on 20-bit prefixes in a table
o Needs 2^20 entries in the off-chip memory
● Eliminates the first 20 Bloom filters
o Fewer prefixes to store in the Bloom filters (see the sketch after the figure)
[Figure: prefix lengths 1-20 are covered by the 2^20-entry direct lookup table; lengths 21-32 are covered by 12 Bloom filters backed by the off-chip hash table]
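A sketch of building the direct lookup array in Python (my illustration; prefixes are bit-strings of length at most 20):

    def build_direct_array(short_prefixes):
        # Expand every prefix of length l <= 20 into its 2**(20 - l)
        # 20-bit extensions. Writing shortest prefixes first lets longer
        # (more specific) prefixes overwrite them, preserving LPM semantics.
        table = [None] * (1 << 20)           # 2^20 entries in off-chip memory
        for prefix, hop in sorted(short_prefixes.items(),
                                  key=lambda kv: len(kv[0])):
            shift = 20 - len(prefix)
            base = int(prefix, 2) << shift
            for idx in range(base, base + (1 << shift)):
                table[idx] = hop
        return table

    # A lookup is then table[int(addr_bits[:20], 2)], a single memory access.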
Direct Lookup Array
● N20 = original number of prefixes of length up to 20 bits
● Typically N20 = 21 % of N
[Plot: expected # of hash probes per lookup (1 to 2) vs. size of embedded memory (1 to 4 Mbits), for 100,000 / 150,000 / 200,000 / 250,000 prefixes]
Eexp ≤ 12 × (½)^(ln2 × M/(N − N20)) + 1
Eworst = 12 + 1 = 13
Less than 1.1 hash probes per lookup
Controlled Prefix Expansion
● Expand prefixes of length 21-23 into 24 bits
● Expand prefixes of length 25-31 into 32 bits
● The off-chip hash table contains only 24-bit and 32-bit prefixes (see the sketch after the figure)
[Figure: after expansion, only two Bloom filters remain, one for 24-bit and one for 32-bit prefixes, alongside the 2^20-entry direct lookup table and the off-chip hash table]
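A sketch of the expansion step in Python (my illustration):

    def expand(prefixes, target_len):
        # Controlled prefix expansion: replace each prefix shorter than
        # target_len with all of its target_len-bit extensions. Expanding
        # shortest-first lets a more specific original prefix overwrite
        # the expansions of a shorter one.
        out = {}
        for prefix, hop in sorted(prefixes.items(), key=lambda kv: len(kv[0])):
            pad = target_len - len(prefix)
            if pad == 0:
                out[prefix] = hop
                continue
            for i in range(1 << pad):
                out[prefix + bin(i)[2:].zfill(pad)] = hop
        return out

    # e.g. for the 21-24 bit group:
    #   expand({p: h for p, h in routes.items() if 21 <= len(p) <= 24}, 24)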
Controlled Prefix Expansion
● N32 = original 25-32 bit prefixes: 0.2% of the total
● α32 = expansion factor for 25-32 bit prefix expansion; typically α32 = 50
● N24 = original 21-24 bit prefixes: 75.2% of the total
● α24 = expansion factor for 21-24 bit prefix expansion; typically α24 = 1.8
[Plot: expected # of hash probes per lookup (1 to 1.6) vs. size of embedded memory (1 to 4 Mbits), for 100,000 / 150,000 / 200,000 / 250,000 prefixes]
Eexp ≤ 2 × (½)^(ln2 × M/(α24N24 + α32N32)) + 1
Eworst = 2 + 1 = 3
Less than 1.2 hash probes per lookup
Simulation results
● Scheme 1: 32 Bloom filters
● Scheme 2: 12 Bloom filters and a Direct Lookup Array
● Scheme 3: 2 Bloom filters and a Direct Lookup Array

Scheme 1      Theoretical    Observed
Eexp          1.007670       1.007390
Eworst        32             3

Scheme 2      Theoretical    Observed
Eexp          1.000204       1.000898
Eworst        13             3

Scheme 3      Theoretical    Observed
Eexp          1.006005       1.003265
Eworst        3              3

Setup: 15 IPv4 BGP tables; average of N = 115,000 prefixes; 2 Mbits of embedded RAM considered.
Hardware Implementation Consideration
● Supporting multiple hash functions in memory
o The number of hash functions in each Bloom filter can be 10 to 20
o Each hash function requires one memory port for a random lookup
o How to support so many ports on one memory?
● Use multiple memory cores
o Restricting the range of each hash function to one memory core has an insignificant effect on the false positive probability
Future Work: Handling Worst Case
● Worst case can be a problem, particularly for IPv6
● Hybrid schemes involving TCAM and Bloom filters?
o The worst case is due to the number of unique Bloom filters; reduce the number of Bloom filters by:
o Using TCAM for prefix lengths with fewer prefixes
o And/or using CPE
● Reduce the false positive probability of each individual Bloom filter to “almost” zero!
o For the Bloom filter of prefix length i, use ki = i hash functions
o Hence # of false matches possible = (½)^i × 2^i = 1 (a worked example follows below)
o # bits per prefix = i/ln2 = 1.44i < 2i = # TCAM bits per prefix
o When a Bloom filter requires too many hash functions, use fewer hash functions but more memory to achieve the desired effect
● Or, maintain a TCAM cache for the prefixes that match multiple Bloom filters
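As a concrete instance of this arithmetic (my example, not from the talk): for prefix length i = 20, using k20 = 20 hash functions gives f = (½)^20; since there are 2^20 distinct 20-bit prefixes, the expected number of false matches is 2^20 × (½)^20 = 1, and the filter spends 20/ln2 ≈ 28.9 ≈ 1.44 × 20 bits per prefix, versus 2 × 20 = 40 TCAM bits.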
String Matching
“Deep Packet Inspection Using Parallel Bloom Filters,” Sarang Dharmapurikar, Praveen Krishnamurthy, Todd Sproull, John Lockwood, Hot Interconnects 2003
Motivation
● Applications of deep packet inspection
o Detection of Internet worms, computer viruses, SPAM, copyrighted material
o Layer-7 switching
o Content classification
● A string detection mechanism is a common infrastructure for all of these
● Some desirable features of the mechanism
o String matching at line speed
o Ability to detect strings at random locations in the payload
o Ability to detect 1000s of strings
o Easy incremental updates to the string database
o Low power and low cost
Using Bloom filters for String Matching
[Figure: a window of bytes b1 … bW slides over the payload, one entering byte and one leaving byte per cycle; substrings of each length from 3 to W are checked in parallel against Bloom filters BF3 … BFW; candidate matches go to a false positives resolver backed by an off-chip hash table]
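In software the scanning loop looks roughly as follows (my sketch; in the hardware, all window lengths are tested in a single clock cycle per byte). BloomFilter is the earlier sketch, and signatures is the exact string set backing the off-chip hash table:

    def scan(payload, blooms, signatures):
        # blooms: dict mapping string length -> BloomFilter holding all
        # signatures of that length. Returns verified (offset, string) hits.
        hits = []
        for i in range(len(payload)):
            for length, bf in blooms.items():       # one filter per length
                window = payload[i:i + length]
                if len(window) < length:
                    continue
                if bf.query(window):                # candidate (maybe false)
                    if window in signatures:        # resolver: exact check
                        hits.append((i, window))    # true match confirmed
        return hits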
Single Bloom Filter Engine
[Figure: one scanning engine: Bloom filters BF3 … BFW over the current window, feeding the false positives resolver and the hash table]
Using Multiple Engines
[Figure: multiple engines scan the stream in parallel; second-level arbitration serializes their probes to the shared hash table]
System Throughput
● G : number of engines (4)
● B : number of Bloom filters in each engine (32)
● f : false-positive probability of each Bloom filter
o f = (½)^(ln2 × M/N), with N = 10,000 strings
● F : clock frequency (100 MHz)
o Conservative for existing FPGAs, SRAMs & SDRAMs
● τ : time required to probe the hash table
o 20 clock cycles to read a burst of 32 bytes from SDRAM
● p : frequency at which a true signature appears
o Typically 1/1000 to 1/100

throughput = G / [ (1 - p) × B × f × τ + p × τ + 1/F ]  bytes/s

where (1 - p) × B × f × τ is the time spent resolving false positives with off-chip memory, p × τ is the time spent confirming a true match, and 1/F is the time spent examining the window of packet bytes on-chip.
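A rough numeric check of this model in Python (parameter values taken from the slide; the 1 Mbit of on-chip memory used to derive f is my assumption for illustration):

    import math

    def throughput_bps(G=4, B=32, F=100e6, p=0.001, M=2**20, N=10_000):
        tau = 20 / F                              # 20 clock cycles per probe
        f = 0.5 ** (math.log(2) * M / N)          # per-filter false positive prob.
        per_byte = (1 - p) * B * f * tau + p * tau + 1 / F
        return 8 * G / per_byte                   # bits per second

    print(throughput_bps() / 1e9)   # ~3 Gbits/s, above OC-48 (2.5 Gb/s)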
Throughput as a function of on-chip memory size
[Plot: system throughput (Gbits per second, 0 to 3.5) vs. on-chip SRAM available for Bloom filters (0 to 1.4 Mbits), for p = 0.001 and p = 0.01, where p is the probability of occurrence of a true match per 1000 or 100 characters]
● Throughput is not limited by false positives
● Throughput is more than OC-48
Implementation of Bloom filter on FPGA
● m = 5 × 4096 = 20480
● k = 5 × 2 = 10
● k = (m/n) × ln2 => n = 1419
● f = (½)^10 = 0.0009765
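These numbers can be checked quickly (Python):

    import math
    m = 5 * 4096                   # five 4096-bit Block RAMs = 20480 bits
    k = 5 * 2                      # two ports per RAM -> 10 hash functions
    n = int(m * math.log(2) / k)   # capacity at optimal k: 1419 strings
    f = 0.5 ** k                   # (1/2)^10 = 0.0009765625
    print(m, k, n, f)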
[Figure: a hash value generator addresses an m-bit vector implemented across FPGA Block RAMs; each dual-port Block RAM holds 4096 × 1-bit and provides 2 address ports and 2 data ports, so each RAM serves two hash functions]
Instantiation of mini-Bloom filters
[Figure: a wrapper around an array of mini-Bloom filters, each built from Block RAMs with its own hash value generator]
● The wrapper distributes the strings uniformly over the set of mini-Bloom filters (a sketch follows below)
● Can support 10,000 strings with a false positive probability of 0.00097, using 35 on-chip Block RAMs
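A sketch of the dispatch idea in Python (my illustration; the MD5-based first-stage hash that picks a mini-filter is an assumption, and BloomFilter is the earlier sketch):

    import hashlib

    class MiniBloomArray:
        # Each string is stored in exactly one mini-Bloom filter, chosen
        # by a first-stage hash, so load spreads uniformly across the minis.
        def __init__(self, num_minis, bits_per_mini, k):
            self.minis = [BloomFilter(bits_per_mini, k)
                          for _ in range(num_minis)]

        def _pick(self, item):
            h = hashlib.md5(item).digest()          # first-stage dispatch hash
            return int.from_bytes(h[:4], "big") % len(self.minis)

        def add(self, item):
            self.minis[self._pick(item)].add(item)

        def query(self, item):
            # Only the selected mini-filter needs to be probed
            return self.minis[self._pick(item)].query(item)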
Partial Bloom Filter
[Figure: a Partial Bloom Filter (PBF) built on one 4096-bit dual-port Block RAM: the hash value calculator derives H1(X) and H2(X) from input X and drives the two ports (addrA, weA, dinA, doutA and addrB, weB, dinB, doutB); a request decoder supplies BRAM #, address, bit, and valid signals; the 1-bit output indicates match / no match]
Bloom Filter
[Figure: a complete Bloom filter assembled from five PBFs (PBF 1 … PBF 5): the hash value calculator computes H1 … H10, two hash functions per dual-ported PBF; the per-PBF results are combined into the final Match signal; a control interface supports programming the filter]
System Overview
[Figure: FPGA system: input and output protocol wrappers and controllers feed a control packet processor and the Bloom filter block; a hash table interface and SDRAM controller connect the Bloom filter block to off-chip SDRAM]