Gigabit Rate Packet Pattern-Matching Using TCAM
Reviewer: Jing Lu
Gigabit Rate Packet Pattern-Matching Using TCAM
Fang Yu, Randy H. Katz (UC Berkeley); T. V. Lakshman (Bell Labs, Lucent)
ICNP 2004
Motivation
• Malicious probes and worms spread
• Solutions:
  • End-host based
    • Anti-virus software, security patches
    • Ineffective and costly
  • Network based
    • Network Intrusion Detection Systems (NIDS)
    • Payload processing for thousands of complicated content patterns at line speed
    • Fast and scalable multi-pattern matching schemes are highly needed
Current Pattern Matching Schemes
• Software-based solutions
  • Low speed
• FPGA-based solutions
  • Do not scale well in space or overall latency for a large number of patterns
• Bloom filters
  • Able to handle thousands of patterns
  • Build a Bloom filter for each possible pattern length
  • Hard to handle hundreds of possible pattern lengths
Problem Definition
• Pattern matching problem
  Given: a set of k patterns {P1, P2, …, Pk}, k >= 1, and a packet of length n;
  Goal: find all the matching patterns in the packet.
• Simple patterns:
  • Deterministic form: each byte is one specific value out of the 256 possible values
  • Non-deterministic form:
    • Case-insensitive alphabet
    • Wildcard byte (*)
• Composite patterns:
  • Negation (!)
  • Correlated patterns
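
As a point of reference for the problem statement, here is a minimal brute-force sketch in Python. It is not the TCAM scheme from the talk, only a slow baseline that defines what "all matching patterns" means for simple patterns. The names `WILDCARD`, `matches_at`, and `find_all` are made up for this illustration, patterns are assumed to be lists of byte values, and positions are 0-based.

```python
WILDCARD = None  # hypothetical stand-in for the slide's wildcard byte (*)


def matches_at(pattern, packet, pos, case_insensitive=False):
    """Return True if `pattern` occurs in `packet` starting at offset `pos`."""
    if pos + len(pattern) > len(packet):
        return False
    for offset, want in enumerate(pattern):
        got = packet[pos + offset]
        if want is WILDCARD:
            continue  # a wildcard byte matches any value
        if case_insensitive:
            if bytes([want]).lower() != bytes([got]).lower():
                return False
        elif want != got:
            return False
    return True


def find_all(patterns, packet, case_insensitive=False):
    """Return sorted (position, pattern_index) pairs; pattern indexes are 1-based."""
    hits = []
    for idx, pattern in enumerate(patterns, start=1):
        for pos in range(len(packet)):
            if matches_at(pattern, packet, pos, case_insensitive):
                hits.append((pos, idx))
    return sorted(hits)


patterns = [list(b"ABCD"), [ord("D"), WILDCARD, ord("F")], list(b"def")]
print(find_all(patterns, b"ABCDEF"))
print(find_all(patterns, b"ABCDEF", case_insensitive=True))
```

This baseline costs on the order of k times n comparisons per packet, which is the cost a single parallel TCAM lookup per byte position is meant to avoid.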
TCAM
• Three logic states: ‘0’, ‘1’, ‘?’ (don't care)
• Given an input string, TCAM reports the lowest index match if there are multiple matches
• 4 ns lookup time
• Single-chip density ~ 2 MB
• Width of each entry is configurable
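
The lookup behaviour listed above can be modelled in software. The following is a toy sketch, assuming entries are written as strings of `0`, `1`, and `?`; the class name `Tcam` and its methods are invented for illustration, and the loop only imitates what real hardware does for all entries in parallel.

```python
class Tcam:
    """Toy ternary CAM: each entry is a string of '0', '1', and '?' (don't care)."""

    def __init__(self, width_bits):
        self.width_bits = width_bits
        self.entries = []

    def add(self, ternary):
        """Append an entry and return its 1-based index."""
        if len(ternary) != self.width_bits or set(ternary) - set("01?"):
            raise ValueError("entry must be width_bits characters of 0, 1, or ?")
        self.entries.append(ternary)
        return len(self.entries)

    def lookup(self, key):
        """Return the lowest 1-based index whose entry matches `key`, or None."""
        if len(key) != self.width_bits or set(key) - set("01"):
            raise ValueError("key must be width_bits characters of 0 or 1")
        for index, entry in enumerate(self.entries, start=1):
            # An entry bit matches if it is '?' or equals the key bit.
            if all(e in ("?", k) for e, k in zip(entry, key)):
                return index
        return None


tcam = Tcam(4)
tcam.add("10??")
tcam.add("1???")
print(tcam.lookup("1011"))  # both entries match; the lower index wins
print(tcam.lookup("1100"))
print(tcam.lookup("0000"))
```

Because only the lowest matching index is reported, the order in which entries are stored matters, which is why the next slides sort patterns by length.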
Simple Pattern Matching Using TCAM
• Short patterns: length <= TCAM width w
  • Pad with ‘?’ if shorter than w
  • Organize patterns by length in descending order
  • Input packet shifts one byte at a time
• Throughput: 2 Gbps (one byte per 4 ns lookup)
[Figure: two steps of the input A B C D E F being compared against the TCAM entries C D E F, A B C ? and A B ? ?; the first window is marked as a match on A B C ?.]
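
The short-pattern procedure can be sketched as follows. This is a byte-level toy model, assuming patterns are plain strings that do not themselves contain `?`; the function names `build_tcam`, `entry_matches`, and `scan` are hypothetical, and positions are 0-based.

```python
def build_tcam(patterns, width):
    """Pad each short pattern to `width` with '?' and order by descending length."""
    if any(len(p) > width for p in patterns):
        raise ValueError("short patterns must fit in one TCAM entry")
    ordered = sorted(patterns, key=len, reverse=True)
    return [(p.ljust(width, "?"), p) for p in ordered]


def entry_matches(entry, window):
    """Compare one padded entry with a window that may be cut short at packet end."""
    for i, e in enumerate(entry):
        if e == "?":
            continue
        if i >= len(window) or window[i] != e:
            return False
    return True


def scan(tcam, width, packet):
    """Slide a width-byte window over the packet one byte at a time."""
    hits = []
    for pos in range(len(packet)):
        window = packet[pos:pos + width]
        for entry, pattern in tcam:  # first match = lowest TCAM index
            if entry_matches(entry, window):
                hits.append((pos, pattern))
                break
    return hits


tcam = build_tcam(["AB", "CDEF", "ABC"], width=4)
print([entry for entry, _ in tcam])
print(scan(tcam, 4, "ABCDEF"))
```

At position 0 both `ABC?` and `AB??` match, but only the longer pattern is reported because it sits at the lower index. This sketch does not report the shorter pattern it shadows.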
Simple Pattern Matching Using TCAM
• Long patterns: length > TCAM width w
  • Divide each long pattern into multiple short patterns
  • Prefix pattern: first w bytes
  • Suffix patterns: each subsequent group of w bytes. If the last suffix pattern is shorter than w bytes, pad it at the front with the preceding bytes.
  • Example (w = 4): DEFGABCDL
    Prefix pattern: DEFG
    Suffix patterns: ABCD, BCDL
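
The splitting rule is easy to check in code. The sketch below is one straightforward reading of the rule, with the hypothetical helper name `split_long_pattern`; it reproduces the DEFGABCDL example for w = 4.

```python
def split_long_pattern(pattern, w):
    """Split a pattern longer than w into (prefix, [suffix patterns])."""
    if len(pattern) <= w:
        raise ValueError("pattern fits in one TCAM entry; no split needed")
    prefix = pattern[:w]
    suffixes = []
    for start in range(w, len(pattern), w):
        piece = pattern[start:start + w]
        if len(piece) < w:
            # Last piece is short: use the final w bytes of the pattern,
            # i.e. pad it in front with the preceding bytes.
            piece = pattern[-w:]
        suffixes.append(piece)
    return prefix, suffixes


print(split_long_pattern("DEFGABCDL", 4))
print(split_long_pattern("DEFGDEF", 4))
```

Padding the last piece with preceding bytes rather than with `?` keeps every suffix entry fully specified, at the price of the last two pieces overlapping in the packet.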
Patterns in TCAM

Pattern Index   Pattern Contents   Prefix patterns   Suffix patterns
1               ABCDABCD           ABCD              ABCD
2               DEFGABCDL          DEFG              ABCD, BCDL
3               DEFGDEF            DEFG              GDEF
4               DEF

TCAM Index   TCAM Entry
1            A B C D
2            D E F G
3            B C D L
4            G D E F
5            D E F ?
Data Structures in SRAM
• Combined Pattern Table (-1: no such index)

Pattern Index in TCAM   TCAM Entry   Simple Pattern Index   Prefix Index   Suffix Index
1                       A B C D      -1                     1              1
2                       D E F G      4                      2              -1
3                       B C D L      -1                     -1             2
4                       G D E F      -1                     -1             3
5                       D E F ?      4                      -1             -1

Prefix and suffix indexes are shown in parentheses:

Pattern Index   Pattern Contents   Prefix patterns            Suffix patterns
1               ABCDABCD           ABCD (1)                   ABCD (1)
2               DEFGABCDL          DEFG (2), DEFGABCD (3)     ABCD (1), BCDL (2)
3               DEFGDEF            DEFG (2)                   GDEF (3)
4               DEF
• Matching Table

Prefix Index   Suffix Index   Distance   Matched Long Pattern Index
1              1              4          1
2              1              4          3*
2              3              3          3
3              1              4          1
3              2              1          2

* 3 here is a prefix index (DEFGABCD), not a long pattern index.

• Partial Hit List (PHL)
  • Generated during matching process
Algorithm for Long Pattern Matching
[Figure: animated walkthrough. The input D E F G A B C D L is shifted past the five TCAM entries (A B C D, D E F G, B C D L, G D E F, D E F ?) while the Combined Pattern Table and the Matching Table are consulted and the Partial Hit List (PHL) is updated.]
• Window D E F G hits: the PHL records (position 1, prefix index 2)
• Window A B C D hits: the PHL records (position 5, prefix index 3)
• Window B C D L hits, which completes long pattern 2 (DEFGABCDL) through the Matching Table entry (prefix 3, suffix 2, distance 1)
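
The walkthrough can be replayed in a few lines of Python. The sketch below hard-codes the TCAM contents and the two SRAM tables from the example slides (w = 4) and uses 1-based positions as in the PHL. Several details are assumptions of this sketch rather than statements from the slides: the `("prefix", …)` / `("pattern", …)` tagging that stands in for the `3*` marker, the rule that an extended prefix record replaces the entry's own prefix record, and the pruning of PHL records older than the largest distance in the Matching Table.

```python
W = 4  # TCAM width in bytes, as in the slides' example

TCAM = ["ABCD", "DEFG", "BCDL", "GDEF", "DEF?"]  # TCAM index 1..5

# TCAM index -> (simple pattern index, prefix index, suffix index); -1 = none
SRAM = {1: (-1, 1, 1), 2: (4, 2, -1), 3: (-1, -1, 2), 4: (-1, -1, 3), 5: (4, -1, -1)}

# (prefix index, suffix index, distance) -> (kind, index)
MATCHING = {
    (1, 1, 4): ("pattern", 1),
    (2, 1, 4): ("prefix", 3),  # the slides' 3*: prefix DEFGABCD
    (2, 3, 3): ("pattern", 3),
    (3, 1, 4): ("pattern", 1),
    (3, 2, 1): ("pattern", 2),
}
MAX_DISTANCE = max(d for _, _, d in MATCHING)


def tcam_lookup(window):
    """Return the lowest 1-based TCAM index matching the window, or None."""
    for index, entry in enumerate(TCAM, start=1):
        if all(e == "?" or (i < len(window) and window[i] == e)
               for i, e in enumerate(entry)):
            return index
    return None


def scan(packet):
    """Return (matches, phl_log): (position, pattern) hits and PHL records added."""
    matches, phl, phl_log = [], [], []
    for pos in range(1, len(packet) + 1):
        # Drop partial hits that can no longer be completed (assumed pruning rule).
        phl = [(p, pfx) for p, pfx in phl if pos - p <= MAX_DISTANCE]
        index = tcam_lookup(packet[pos - 1:pos - 1 + W])
        if index is None:
            continue
        simple, prefix, suffix = SRAM[index]
        if simple != -1:
            matches.append((pos, simple))
        extended = []
        if suffix != -1:
            for p, pfx in phl:
                kind, value = MATCHING.get((pfx, suffix, pos - p), (None, None))
                if kind == "pattern":
                    matches.append((pos, value))
                elif kind == "prefix":
                    extended.append((pos, value))
        # Assumed rule: a longer (extended) prefix replaces the plain prefix record.
        new_records = extended if extended else ([(pos, prefix)] if prefix != -1 else [])
        phl.extend(new_records)
        phl_log.extend(new_records)
    return matches, phl_log


print(scan("DEFGABCDL"))
```

For the input DEFGABCDL the log of PHL records is (1, 2) followed by (5, 3), as in the walkthrough, and the reported matches are simple pattern 4 (DEF) at position 1 and long pattern 2 at position 6.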
Composite Pattern Matching
• Correlated patterns
  • Partial hit records for sub-patterns are kept in the PHL because the distance between two sub-patterns can be larger than w
  • Example: content: “user”; content: “root”; within 20
    prefix: user; suffix: root; distance: 4 to 20, i.e. 17 entries in the matching table
• Patterns with negation
  • Usually part of a correlated pattern
• Patterns with wildcards
  • The distance between an upper-case character and its corresponding lower-case character is 32, a single bit that can be stored as ‘?’ for case-insensitive matching
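
The remark about the distance of 32 can be made concrete. In ASCII, an upper-case letter and its lower-case form differ only in the bit with value 32, so a ternary entry can leave that one bit as don't-care. The sketch below assumes ASCII input and uses invented helper names (`ternary_byte`, `encode`, `to_bits`, `ternary_matches`); it illustrates the idea and is not the authors' encoding.

```python
def ternary_byte(ch, case_insensitive=False):
    """Encode one ASCII character as 8 ternary bits, most significant bit first."""
    bits = list(format(ord(ch), "08b"))
    if case_insensitive and ch.isascii() and ch.isalpha():
        bits[2] = "?"  # bit 5 (value 32) is the only bit separating 'A' from 'a'
    return "".join(bits)


def encode(text, case_insensitive=False):
    """Build a ternary TCAM entry for a whole string."""
    return "".join(ternary_byte(c, case_insensitive) for c in text)


def to_bits(text):
    """Binary key for an input string, 8 bits per character."""
    return "".join(format(ord(c), "08b") for c in text)


def ternary_matches(entry, key_bits):
    """True if every entry bit is '?' or equals the corresponding key bit."""
    return len(entry) == len(key_bits) and all(
        e in ("?", k) for e, k in zip(entry, key_bits)
    )


print(ternary_byte("A", case_insensitive=True))
print(ternary_matches(encode("Root", True), to_bits("rOOT")))
print(ternary_matches(encode("Root", True), to_bits("r00t")))
```

The don't-care bit is applied to letters only; for other characters the same bit distinguishes unrelated symbols such as `@` and the backtick.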
Analysis
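• What is the impact of TCAM width on the scheme?
• Quantities expressed as functions of w*: TCAM size, matching table size, TCAM hit rate, PHL size
[Table: one closed-form expression per quantity. Legible fragments: a sum over the patterns of $(m_i / w) \cdot w$, presumably with $m_i / w$ rounded up to whole TCAM entries, and the factors $(m_i / w - 1)$ and $2^{8w}$.]
* k patterns, m_i bytes each, TCAM width w, and random input stream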
Analysis
• What is the impact of memory lookups on system scan rate?
  • The two kinds of memory lookups can be pipelined
  • With a small TCAM hit rate and PHL size, overall scan time is dominated by TCAM lookup time
[Figure: timeline over positions 1 to 10 with two rows, TCAM lookup time and memory lookup time. The per-position outcomes read hit, hit, hit, miss, miss, miss, miss, miss, hit, hit; the memory row alternates between "Performing Memory Lookups" and "Idle".]
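
A rough way to see why misses let the memory stage catch up is a two-stage pipeline model. The sketch below is an assumption-laden toy: time is measured in TCAM lookup times, each hit costs a fixed number of memory lookups, the TCAM never stalls, and the queue of pending memory lookups is unbounded. The function name `scan_ratio` and the parameter `lookups_per_hit` are invented here; "scan ratio" and "memory ratio" follow the definitions given on the later SNORT slide.

```python
def scan_ratio(hits, memory_ratio, lookups_per_hit=1):
    """Total scan time divided by total TCAM lookup time.

    hits: one bool per byte position (True = TCAM hit).
    memory_ratio: memory access time / TCAM access time.
    """
    if not hits:
        raise ValueError("need at least one position")
    tcam_done = 0.0    # time at which the TCAM finishes the current position
    memory_free = 0.0  # time at which the memory stage becomes idle
    for hit in hits:
        tcam_done += 1.0  # one TCAM lookup per byte position
        if hit:
            # Memory work starts once the hit is known and the memory is free.
            start = max(memory_free, tcam_done)
            memory_free = start + lookups_per_hit * memory_ratio
    return max(tcam_done, memory_free) / tcam_done


burst_then_quiet = [True, True, True] + [False] * 7
print(scan_ratio(burst_then_quiet, memory_ratio=2.0))  # backlog absorbed by misses
print(scan_ratio([True] * 10, memory_ratio=2.0))       # memory stage is the bottleneck
```

In this model a burst of hits followed by misses finishes in pure TCAM time, while a packet that hits at every position is limited by the memory stage.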
Malicious Attacks?
• Correlated patterns can cause problems
  • Distance between sub-patterns can be larger than w: PHL size grows, memory lookups are backlogged, scan rate drops
  • Sub-patterns can be short: hit rate rises, PHL size grows, scan rate drops
• The probability of matching two patterns 1 byte apart is very small, but packing sub-patterns consecutively to form a long packet can create a large PHL
• Limit the maximum distance between sub-patterns
Simulation Results
• Rule sets:
  • ClamAV (v0.15) virus signature database
    • 1768 simple patterns
    • Average pattern length = 55 bytes
    • Pattern length: 6 ~ 2189 bytes
  • SNORT (v2.1.2)
    • 1039 simple patterns, 527 correlated patterns
    • Mostly 10 ~ 100 bytes, some 1 ~ 4 bytes long
• Packet traces:
  • Real: MIT trace (1M), Berkeley trace (6M)
  • Synthetic: randomly insert patterns in packet payload
ClamAV Pattern Set
• w = 128 bytes
• TCAM = 240 KB
• SRAM < 10 MB
[Figure: TCAM space (KB) and matching table size (MB), both on logarithmic axes, versus TCAM width (4 to 1024 bytes). Series: "TCAM Spaces Consumed" and "Memory Space for Mapping Table".]
ClamAV Pattern Set
PHL size for ClamAV pattern set with real traces
• Avg PHL: mean of the average PHL size over all packets
• AvgMax PHL: mean of the maximum PHL size over all packets
• Max: maximum PHL size in all packets
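
The three statistics can be computed directly from per-packet PHL traces. The sketch below assumes the PHL size is sampled once per scan position within each packet, which the slides do not state; `phl_metrics` and the sample numbers are made up for illustration.

```python
from statistics import mean


def phl_metrics(per_packet_sizes):
    """per_packet_sizes: for each packet, the PHL sizes observed while scanning it."""
    averages = [mean(sizes) for sizes in per_packet_sizes]
    maxima = [max(sizes) for sizes in per_packet_sizes]
    return {
        "avg": mean(averages),    # Avg PHL
        "avg_max": mean(maxima),  # AvgMax PHL
        "max": max(maxima),       # Max
    }


# Three hypothetical packets with their PHL size at each sampled position.
print(phl_metrics([[0, 0, 2, 2], [1, 3], [0, 0, 0, 0]]))
```

AvgMax is the figure that matters for provisioning per-packet PHL storage, while Max bounds the worst case seen in the trace.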
ClamAV Pattern Set
PHL size for ClamAV pattern set with synthetic traces
[Figure: two panels versus TCAM width (16 to 1024 bytes). Left: average PHL size (0 to 0.3). Right: AvgMax PHL size (0 to 5). Each panel has one series for 1, 10, and 100 patterns per packet.]
• SRAM lookup can catch up with the TCAM lookup
• Scan rate = 2 Gbps
SNORT Pattern Set
PHL size for SNORT pattern set with real traces
Window Size   MIT Dump                      Berkeley Dump
              Avg      AvgMax   Max         Avg      AvgMax   Max
20            0.5523   2.7683   8           0.4702   1.5765   12
40            0.9881   3.5376   14          0.6500   1.8661   18
60            1.3151   3.9960   14          0.7313   1.9652   23
80            1.5491   4.2158   16          0.7587   2.0373   24
100           1.6867   4.3485   18          0.7661   2.0740   25
120           1.7725   4.4475   18          0.7669   2.0768   25
140           1.8308   4.5722   19          0.7669   2.0768   25
160           1.8800   4.6643   19          0.7669   2.0768   25
180           1.9244   4.7386   19          0.7669   2.0768   25
200           1.9662   4.8079   20          0.7669   2.0768   25
• w = 128, TCAM size = 295KB
SNORT Pattern Set
• Scan Ratio = Total scan time / Total TCAM lookup time
• Memory Ratio = SRAM access time / TCAM access time
[Figure: Effects of memory ratio on scan ratio. Scan ratio (1 to 1.9) versus % of packets (ticks 0.6 to 1), one curve per memory ratio: 0.2, 0.4, 0.6, 0.8, 1.]
• Scan rate > 1 Gbps
Conclusion
• A simple multi-pattern matching algorithm using TCAM
• Supports thousands of patterns with variable lengths
• Supports long patterns, correlated patterns, and patterns with negation and wildcards
• Achieves multi-gigabit rates on the ClamAV and SNORT pattern sets