hyperscan: a fast multi-pattern regex matcher for modern …...graph-based regex decomposition...

Hyperscan: A Fast Multi-pattern Regex Matcher for Modern CPUs

Xiang Wang1, Yang Hong1, Harry Chang1, KyoungSoo Park2, Geoff Langdale3, Jiayu Hu1 and Heqing Zhu1

1 Intel Corporation; 2 KAIST; 3 branchfree.org

Network Platforms Group


Networking Applications with Regex Matching

• Deep packet inspection (DPI) – key functionality of L7 traffic monitoring

• Regular expression (regex) matching – core element of DPI

• Big problem – regex matching is SLOW

Example applications: IPS/IDS, WAF, Application Identification

Image sources:
https://www.ukhost4u.com/blog/protecting-your-server-against-malware-with-modsecurity/
https://camo.githubusercontent.com/b6c715375794b8db30cff8ef07ea8844c4fcd976/68747470733a2f2f7777772e6e62732d73797374656d2e636f6d2f77702d636f6e74656e742f75706c6f6164732f6e62732d6c6f676f2d6e61787369312e706e67

Current Best Practice: Prefilter-based Pattern Matching

Rule 0:
content:"SEARCH";
pcre:"/\sSEARCH\s\w+\s\{\d+\}[\r]?\n[^\n]*?%/smi"

…

Rule N:
content:"UNSUBSCRIBE";
pcre:"/^\w+\s+UNSUBSCRIBE\s[^\n]{100}/smi";

Two-stage Pattern Matching

• Stage 1, multi-string matching: scans the input for the content keywords of all rules (SEARCH, UNSUBSCRIBE, …)

• Stage 2, single regex matching: for each rule whose keyword was found, runs that rule's regex and reports Match! or No Match
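The two-stage flow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `prefilter_match` is hypothetical, a plain substring test stands in for a real multi-string matcher, the keyword test is made case-insensitive because both regexes carry the `i` flag, and the `/smi` flags are mapped to Python's `re.S | re.M | re.I`.

```python
import re

# The two rules shown on the slide, as (keyword, regex) pairs.
FLAGS = re.S | re.M | re.I  # assumed equivalent of the /smi flags
RULES = [
    ("SEARCH", re.compile(r"\sSEARCH\s\w+\s\{\d+\}[\r]?\n[^\n]*?%", FLAGS)),
    ("UNSUBSCRIBE", re.compile(r"^\w+\s+UNSUBSCRIBE\s[^\n]{100}", FLAGS)),
]

def prefilter_match(payload):
    """Return the indices of rules whose keyword and regex both match."""
    lowered = payload.lower()
    hits = []
    for rule_id, (keyword, regex) in enumerate(RULES):
        # Stage 1: cheap keyword test (stand-in for multi-string matching).
        if keyword.lower() not in lowered:
            continue
        # Stage 2: full regex scan, only for rules that passed stage 1.
        if regex.search(payload):
            hits.append(rule_id)
    return hits
```

Note that stage 2 rescans the payload from the beginning, including the bytes that stage 1 already examined.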


Problems with Prefilter-based Pattern Matching

Pattern: .*foo[^x]barY+

Input: XfoZbarY

• String matching for "bar" scans X f o Z b a r

• Regex matching then scans X f o Z b a r Y

Example rule:
content:"/";
pcre:"/(?=[defghilmnoqrstwz])(m(ookflolfctm\x2fnmot\.fmu|clvompycem\x2fcen\.vcn)"

• Manual choice of improper string keywords

• Duplicate matching of the string keywords

• Complex regexes lead to slow NFA


Contributions

Problems with current best practices
• Manual choice of improper string keywords
• Duplicate matching of the string keywords
• Complex regexes lead to slow NFA

Issues
• Slow multi-string matching
• Slow NFA matching
• Suboptimal matching performance

Solutions
• Novel regex decomposition
• SIMD-based pattern matching
  − Efficient multi-string matching
  − Fast bit-based NFA

Outcome
• Snort: 8.7x Speedup
• Multi-string matching: 3.2x Speedup over DFC
• Multi-regex matching: 13.5x Speedup over RE2


Wide Adoption of Hyperscan

• Successfully deployed by over 40 commercial projects globally

• In production use by tens of thousands of cloud servers in data centers

• Integrated into 37 open-source projects


Regex Decomposition


Decomposition-based Matching

Pattern: .*foo[^x]barY+

Components: FA2 = .*, STR2 = foo, FA1 = [^x], STR1 = bar, FA0 = Y+

Input: XfoZbarY

• Decomposes a pattern into string (STR) and subregex (FA) components

• String matching is the entrance

• All components have to be matched in order

• On this input: FA1 is dead! Don't trigger FA0

Benefits:

• No duplicate string keyword matching

• Smaller FAs with fast DFA matching

• Facilitates multi-regex matching
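A minimal Python sketch of in-order component matching for the pattern .*foo[^x]barY+ follows. It is not Hyperscan's runtime: the helper names `find_all` and `decomposed_match` are hypothetical, `str.find` stands in for the string matcher, and Python's `re` module stands in for the small FAs.

```python
import re

# Components of .*foo[^x]barY+ as named on the slide. FA2 (.*) accepts any
# prefix, so it needs no explicit check in this sketch.
STR2, STR1 = "foo", "bar"
FA1 = re.compile(r"[^x]")
FA0 = re.compile(r"Y+")

def find_all(haystack, needle):
    """Start offsets of every occurrence of needle (stand-in for string matching)."""
    offsets, i = [], haystack.find(needle)
    while i != -1:
        offsets.append(i)
        i = haystack.find(needle, i + 1)
    return offsets

def decomposed_match(data):
    """True if data contains a match of .*foo[^x]barY+ (hypothetical helper)."""
    # Entrance: string matching. Without "foo" no FA is ever started.
    foo_ends = [i + len(STR2) for i in find_all(data, STR2)]
    if not foo_ends:
        return False
    for bar_start in find_all(data, STR1):
        # FA1 is alive only if it consumes exactly the gap after some "foo".
        fa1_alive = any(
            end <= bar_start and FA1.fullmatch(data, end, bar_start)
            for end in foo_ends
        )
        if not fa1_alive:
            continue  # FA1 is dead: do not trigger FA0
        # FA0 runs on the bytes right after "bar".
        if FA0.match(data, bar_start + len(STR1)):
            return True
    return False
```

For the slide's input XfoZbarY the string "foo" never occurs, so no FA runs at all even though "bar" is present.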


Key Issues with Regex Decomposition

• How to automatically decompose a regex?

• How many real-world regexes can be decomposed?


Graph-based Regex Decomposition

Regex: (abc|def).*ghi

• Textual regex decomposition is often tricky, e.g. /b[il1]l\s{0,10}/

• Graph structure delivers more insights

Glushkov NFA states (state number and symbol): 0 ., 1 a, 2 b, 3 c, 4 d, 5 e, 6 f, 7 ., 8 g, 9 h, 10 i

Graph-based Decomposition
1) Dominant Path Analysis
2) Dominant Region Analysis
3) Network Flow Analysis

Resulting components: the strings abc, def and ghi (STR1 to STR3) and the subregex .* (FA1)
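The dominator idea behind dominant path analysis can be sketched as follows: a state dominates the accept state if every path from the start state passes through it. The state numbers and symbols below are those of the (abc|def).*ghi example; the edge list is an assumption derived from the regex, since the figure's edges are not available, and the function names are hypothetical. The slides do not spell out Hyperscan's exact procedure, so this is an illustration only.

```python
# Hypothetical Glushkov-style graph for (abc|def).*ghi.
SYMBOL = {0: ".", 1: "a", 2: "b", 3: "c", 4: "d", 5: "e",
          6: "f", 7: ".", 8: "g", 9: "h", 10: "i"}
EDGES = {
    0: [0, 1, 4],          # unanchored start: self-loop, then 'a' or 'd'
    1: [2], 2: [3],        # a -> b -> c
    4: [5], 5: [6],        # d -> e -> f
    3: [7, 8], 6: [7, 8],  # enter .* or skip it
    7: [7, 8],             # .* loops, then 'g'
    8: [9], 9: [10],       # g -> h -> i
    10: [],                # accepting state
}
START, ACCEPT = 0, 10

def dominators(edges, start):
    """Iterative dataflow: dom(v) = {v} | intersection of dom(p) over preds p."""
    nodes = set(edges)
    preds = {v: set() for v in nodes}
    for u, targets in edges.items():
        for v in targets:
            preds[v].add(u)
    dom = {v: set(nodes) for v in nodes}
    dom[start] = {start}
    changed = True
    while changed:
        changed = False
        for v in nodes - {start}:
            if not preds[v]:
                continue  # unreachable state, nothing to refine
            new = {v} | set.intersection(*(dom[p] for p in preds[v]))
            if new != dom[v]:
                dom[v] = new
                changed = True
    return dom

def dominant_string(edges, symbol, start, accept):
    """States on every accepting path (minus the start) and their symbols."""
    dom = dominators(edges, start)
    # Dominators of a state form a chain; sorting by set size orders them
    # from the start state towards the accept state.
    chain = sorted(dom[accept] - {start}, key=lambda v: len(dom[v]))
    return chain, "".join(symbol[v] for v in chain)
```

On this graph the chain is states 8, 9, 10, which spells ghi. A fuller implementation would also have to check that the chain is contiguous and made of literal characters.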


Graph-based String Extraction

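Dominant Path Analysis

[Figure: NFA graph with states 0 ., 1 [^a], 2 ., 3 [^a], 4 [^a], 5 d, 6 a, 7 b, 8 c, 9 [^e], 10 f, 11 c; edges not recoverable]

Dominant Region Analysis

[Figure: NFA graph with states 0 ., 1 [^a], 2 ., 3 [^a], 4 d, 5 f, 6 b, 7 a, 8 o, 9 a, 10 b, 11 o, 12 r, 13 c, 14 [^e], 15 c, where state 3 [^a] is drawn twice; edges not recoverable]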


Graph-based String Extraction

Network Flow Analysis

• For each edge, finds a string (or multiple strings) that ends at the edge

• Assigns the edge a score inversely proportional to the length of the string(s) ending at the edge

• Runs a "max-flow min-cut" algorithm [1] to find a minimum cut-set

[Figure: NFA graph with states 0 ., 1 [^a], 2 [^a], 3 ., 4 ., 5 [^a], 6 [^a], 7 e, 8 ., 9 f, 10 f, 11 a, 12 o, 13 g, 14 b, 15 o, 16 h, 17 c, 18 [^e], 19 [^c], 20 [^m], 21 c, 22 ., 23 g; edges not recoverable]

[1] Jack Edmonds and Richard M. Karp. Theoretical improvements in algorithmic efficiency for network flow problems. Journal of the ACM, 19(2):248–264, 1972.
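A small Python sketch of the scoring and cut step, using the Edmonds-Karp algorithm from [1]. The four-edge graph, its string lengths and the function name `min_cut` are made up for illustration; only the idea of "score = 1 / string length, then take a minimum cut" comes from the slide.

```python
from collections import deque
from fractions import Fraction

def min_cut(capacity, source, sink):
    """Edmonds-Karp max-flow; returns (flow value, sorted list of cut edges)."""
    # Residual capacities, including zero-capacity reverse edges.
    residual = {}
    for (u, v), c in capacity.items():
        residual.setdefault(u, {})
        residual.setdefault(v, {})
        residual[u][v] = residual[u].get(v, 0) + c
        residual[v].setdefault(u, 0)

    def bfs():
        """Breadth-first search over edges that still have spare capacity."""
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = bfs()
        if sink not in parent:
            break
        # Walk back from the sink to collect the shortest augmenting path.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck

    # Cut edges go from the source side of the last search to the sink side.
    reachable = set(parent)
    cut = sorted(e for e in capacity if e[0] in reachable and e[1] not in reachable)
    return flow, cut

# Hypothetical example: length of the string ending at each edge.
STRING_LENGTH = {("s", "a"): 2, ("a", "t"): 6, ("s", "b"): 1, ("b", "t"): 4}
# Score inversely proportional to string length, so long strings are cheap to cut.
SCORES = {edge: Fraction(1, n) for edge, n in STRING_LENGTH.items()}
```

Here the minimum cut selects the edges ("a", "t") and ("b", "t"), which carry the longest strings on their respective paths.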


Effectiveness of Graph Analysis on Real-world Rules

Ruleset                  Total   All Graph Analyses   Dominant Path   Dominant Region   Network Flow
Snort Talos (May 2015)   1663    94.0%                93.3%           1.9%              1.0%
Snort ET-open 2.9.0      7564    89.3%                86.9%           1.3%              2.7%
Suricata 4.0.4           7430    87.5%                85.0%           1.3%              2.7%

Majority of Regex Rules are Decomposable

Dominant Path Analysis is Effective


Quality of Automatically Extracted Keywords

[Figure: two panels comparing Prefilter-based matching with Hyperscan, plus the resulting Reduction]

• Snort Talos*: horizontal axis # of patterns (700, 850, 1000, 1150, 1300); data labels 16, 38.5, 391.2, 524.5, 520.6, 697.7

• Snort ET-Open*: horizontal axis # of patterns (500, 1000, 1500, 2000, 2500); data labels 4.6, 131, 139.2, 179.1, 182.4

* Left vertical axis: # of regex matching process invocations (logarithmic scale, base 10)
* Right vertical axis: reduction achieved by Hyperscan


SIMD-based Pattern Matching


How to Accelerate Pattern Matching Algorithms?

• Modern CPUs support SIMD (Single Instruction, Multiple Data) to exploit data-level parallelism

• SIMD instructions can boost database pattern matching by 2x [1]

• Goal: accelerate both multi-string and FA matching with SIMD

[1] E. Sitaridi, O. Polychroniou, and K. A. Ross. SIMD-accelerated regular expression matching. In Proceedings of the Workshop on Data Management on New Hardware (DaMoN), 2016

SIMD Register X: X3, X2, X1, X0
SIMD Register Y: Y3, Y2, Y1, Y0
SIMD Register Z: X3 OP Y3, X2 OP Y2, X1 OP Y1, X0 OP Y0 (one OP applied to all lanes in parallel)


Multi-string Pattern Matching Overview

• Extended shift-or matching
  − Finds candidate input strings that are likely to match some string patterns

• Verification
  − Filters false positives with hashing
  − Confirms an exact match with string patterns with the same hash value

Pipeline: Input Traffic → Multi-string Shift-or Matching → Candidate Matching Input → Verification (Hashing, then Exact Matching against the String Patterns)
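The verification step can be pictured with the sketch below. The hashing scheme (Python's built-in `hash` over the last two characters) and the names `build_table` and `verify` are assumptions chosen for brevity, not Hyperscan's actual scheme; every pattern is assumed to be at least `suffix_len` characters long.

```python
from collections import defaultdict

def build_table(patterns, suffix_len=2):
    """Hash table keyed by the hash of each pattern's last suffix_len characters."""
    table = defaultdict(list)
    for p in patterns:
        table[hash(p[-suffix_len:])].append(p)
    return table

def verify(data, end, table, suffix_len=2):
    """Return the patterns that really end at offset `end` (exclusive) of data."""
    # Hashing filters out most false candidates with a single lookup.
    key = hash(data[max(0, end - suffix_len):end])
    # Exact comparison, only against patterns that share the hash value.
    return [p for p in table.get(key, []) if data.endswith(p, 0, end)]
```

A candidate position reported by the shift-or stage is passed as `end`; an empty result means the candidate was a false positive.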


Shift-or String Matching

String pattern: aphp
Input: aphp…
(masks are written with the high bit on the left and the low bit on the right)

sh-mask('a') = 11111110
sh-mask('p') = 11110101
sh-mask('h') = 11111011
st-mask      = 11111111

m1 = (st-mask << 1) OR sh-mask('a') = 11111110 OR 11111110 = 11111110
m2 = (m1 << 1) OR sh-mask('p') = 11111100 OR 11110101 = 11111101
m3 = (m2 << 1) OR sh-mask('h') = 11111010 OR 11111011 = 11111011
m4 = (m3 << 1) OR sh-mask('p') = 11110110 OR 11110101 = 11110111 → Match!

Limitations of classic shift-or matching [1]:
− Single string pattern matching only
− Cannot benefit from SIMD instructions

[1] Ricardo A. Baeza-Yates and Gaston H. Gonnet. A new approach to text searching. Communications of the ACM (CACM), 35(10):74–82, 1992
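The classic single-pattern shift-or algorithm [1] fits in a few lines of Python. The function names are illustrative; an 8-bit state is assumed to mirror the slide, so the pattern may be at most 8 characters long. A match is reported when the bit at index len(pattern) - 1 of the state is 0.

```python
def build_sh_masks(pattern, width=8):
    """sh-mask(c) has a 0 bit at every position where c occurs in pattern."""
    full = (1 << width) - 1
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, full) & ~(1 << i) & full
    return masks, full

def shift_or_search(pattern, text, width=8):
    """Return the end offset (inclusive) of every occurrence of pattern in text."""
    assert 0 < len(pattern) <= width
    masks, full = build_sh_masks(pattern, width)
    match_bit = 1 << (len(pattern) - 1)
    st_mask, ends = full, []
    for pos, ch in enumerate(text):
        # Characters that are absent from the pattern use an all-ones mask.
        st_mask = ((st_mask << 1) | masks.get(ch, full)) & full
        if (st_mask & match_bit) == 0:
            ends.append(pos)
    return ends
```

For the pattern aphp this produces the same sh-masks and the same sequence of states as the worked example, with the match reported at input position 3.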



• Pattern grouping: groups the patterns into N buckets (Bucket 0, Bucket 1, Bucket 2, …, Bucket N)

• SIMD acceleration: uses 128-bit sh-masks with 128-bit SIMD instructions (e.g., pslldq for "left shift" and por for "or")

Example: patterns ab and cd, both in Bucket 0
(masks are written high to low; the leading "… 11111110" bytes are padding bytes)

sh-mask('a') = … 11111110 11111110 11111111
sh-mask('b') = … 11111110 11111111 11111110
sh-mask('c') = … 11111110 11111110 11111111
sh-mask('d') = … 11111110 11111111 11111110



Patterns: hp (Bucket 0), aphp (Bucket 4)
Input: aphp…
(masks are written high to low; one byte per position, one bit per bucket)

sh-mask('a') = … 11101110 11111110 11111111 11111111
sh-mask('h') = … 11111110 11111110 11101110 11111111
sh-mask('p') = … 11111110 11101110 11111111 11101110

Operands that are ORed together for the input aphp:

st-mask            = … 00000000 00000000 00000000 11111111
sh-mask('a')       = … 11101110 11111110 11111111 11111111
sh-mask('p') << 8  = … 11101110 11111111 11101110 00000000
sh-mask('h') << 16 = … 11101110 11111111 00000000 00000000
sh-mask('p') << 24 = … 11101110 00000000 00000000 00000000

Result byte at position 3: 11101110

Match! (bucket = 0, position = 3)
Match! (bucket = 4, position = 3)

• Pre-shifting the sh-masks increases instructions per cycle (IPC)!

• 128-bit SIMD operations increase throughput!
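The bucketed variant can be emulated with Python integers in place of 128-bit registers. The sketch makes several simplifying assumptions that are not on the slide: one pattern per bucket (so no verification stage is needed), the whole input treated as a single block, and an initial st-mask that blocks matches ending before a pattern could fit. The bucket assignment used in the test (hp in bucket 0, aphp in bucket 4) is inferred from the mask values on the slide; `build_sh_masks` and `scan` are hypothetical names.

```python
NUM_BUCKETS = 8   # one bit per bucket inside each byte
WIDTH_BYTES = 16  # 128-bit sh-masks
FULL = (1 << (8 * WIDTH_BYTES)) - 1

def build_sh_masks(buckets):
    """buckets: dict bucket_id -> pattern. Returns a function char -> sh-mask.

    Byte k of a mask describes the character k positions before the end of a
    match; bit b of that byte is 0 when bucket b accepts the character there.
    Positions beyond a pattern's length are padding and accept anything.
    """
    default = FULL
    for b, pattern in buckets.items():
        assert 0 <= b < NUM_BUCKETS and 0 < len(pattern) <= WIDTH_BYTES
        for k in range(len(pattern), WIDTH_BYTES):
            default &= ~(1 << (8 * k + b))  # padding bytes
    masks = {}
    for b, pattern in buckets.items():
        for k, ch in enumerate(reversed(pattern)):
            masks[ch] = masks.get(ch, default) & ~(1 << (8 * k + b))
    return lambda ch: masks.get(ch, default)

def scan(data, buckets):
    """Return (bucket, end position) pairs for every match in data."""
    sh = build_sh_masks(buckets)
    # Initial st-mask: a pattern of length n cannot end before position n - 1.
    st_mask = 0
    for b, pattern in buckets.items():
        for p in range(len(pattern) - 1):
            st_mask |= 1 << (8 * p + b)
    # OR in one pre-shifted sh-mask per input byte (shift by whole bytes).
    for i, ch in enumerate(data):
        st_mask |= sh(ch) << (8 * i)
    # A zero bit b in byte p reports bucket b ending at input position p.
    hits = []
    for p in range(len(data)):
        byte = (st_mask >> (8 * p)) & 0xFF
        for b in buckets:
            if not (byte >> b) & 1:
                hits.append((b, p))
    return hits
```

Because each shifted mask depends only on its own input byte, the shifts are independent of one another and only the final ORs form a dependency chain, which is the property the slide credits for the higher IPC.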


Bit-based NFA Matching

• Uses DFA as much as possible – but often impossible

• Classic NFA is slow - O(m) memory lookups per input character (m = # of current states)

• Represents each state with one bit in a state bit-vector

• Exploits parallel bit operations of SIMD to compute the next states

[Figure: example NFA with states 0 ., 1 A, 2 B, 3 C, 4 D, 5 A, 6 F, 7 F and numeric edge labels; edges not recoverable]

Other Subsystems

Small string-set (<80) matching

NFA and DFA cyclic state acceleration

Small-size DFA matching

Anchored pattern matching

Suppression of futile FA matching

…


Evaluation


Evaluation of Hyperscan

• Primary evaluation points:

1. Performance of string matching and regex matching vs. state-of-the-art solutions

2. Application-level performance improvement with Hyperscan

• Experiment setup:

– Machine: Intel Xeon Platinum 8180 CPU @ 2.50GHz (48 GB of RAM)

Runs with a single core

GCC 5.4

– Ruleset: Snort Talos (May 2015), Snort ET-Open 2.9.0, Suricata rulesets 4.0.4

– Workload: random traffic, real-world web traffic



Multi-String Matching Performance with Snort ET-Open

[Figure: two panels. Horizontal axis: number of string patterns (1k, 5k, 10k, 26k). Vertical axis: throughput (Gbps).]

• Panel 1, random workload: data labels 3.2, 1.3, 1.2, 1.1

• Panel 2, real web traffic trace: data labels 2.5, 2.1, 1.7, 1.5


Regex Matching Performance

[Figure: two panels. Vertical axis: speed-up by Hyperscan. Categories: Talos, ET-Open. Legend: PCRE, RE2, PCRE2.]

• Multiple Regex Matching*: data labels 183.3, 6.9, 13.5, 8.4

• Single Regex Matching*: data labels 40.1, 24.8, 10.3, 9.1, 2.3, 1.8

* Test with Snort Talos (1,300 regexes) and ET-Open (2,800 regexes) rulesets under real Web traffic trace.


Real-world DPI Application - Snort

• Stock Snort (ST-Snort) employs
  − AC for multi-string matching
  − PCRE for regex matching
  − Boyer-Moore algorithm for single-string matching

• Hyperscan-ported Snort (HS-Snort) replaced all the algorithms with Hyperscan

• Snort Talos (May 2015) with real-world web traffic

[Figure: Snort Performance. Vertical axis: throughput (Mbps). ST-Snort: 113, HS-Snort: 986, i.e. about 8.7x (986 / 113).]


Conclusion

• Regex matching is at the core of DPI applications

• Hyperscan’s performance advantage is boosted by:

− Novel regex decomposition

− Efficient multi-string matching and bit-based NFA implementation

• Hyperscan achieves significant performance boosts

− 3.2x compared to DFC in multi-string matching

− 13.5x compared to RE2 in regex matching

• Hyperscan accelerates the DPI application Snort by 8.7x (113 Mbps to 986 Mbps)


Thank You

• Thanks to Matt Barr, Alex Coyte and Justin Viiret for their development contributions

• Source code at https://github.com/intel/hyperscan
