
© 2013

Liu Yang

ALL RIGHTS RESERVED

NEW PATTERN MATCHING ALGORITHMS FOR NETWORK SECURITY APPLICATIONS

BY LIU YANG

A dissertation submitted to the

Graduate School—New Brunswick

Rutgers, The State University of New Jersey

in partial fulfillment of the requirements

for the degree of

Doctor of Philosophy

Graduate Program in Computer Science

Written under the direction of

Vinod Ganapathy

and approved by

New Brunswick, New Jersey

May, 2013

ABSTRACT OF THE DISSERTATION

New Pattern Matching Algorithms for Network Security

Applications

by Liu Yang

Dissertation Director: Vinod Ganapathy

Modern network security applications, such as network-based intrusion detection sys-

tems (NIDS) and firewalls, routinely employ deep packet inspection to identify malicious

traffic. In deep packet inspection, the contents of network packets are matched against

patterns of malicious traffic to identify attack-carrying packets. The pattern matching

algorithms employed for deep packet inspection must satisfy two requirements. First,

the algorithms must be fast. Network security applications are often implemented as

middleboxes that reside on high-speed Gbps links, and the algorithms are expected to

perform at such speeds. Second, the algorithms must be space-efficient. The

middleboxes that perform pattern matching are often implemented as hardware components

that employ fast but expensive SRAM technology to ensure good performance.

Unfortunately, existing pattern matching algorithms suffer from a fundamental time-

space tradeoff. The large majority of patterns are regular expressions, and there are

three prior approaches for matching such patterns: deterministic finite automata

(DFAs), non-deterministic finite automata (NFAs), and recursive backtracking-based

approaches. DFAs are fast to operate, but are space-inefficient. NFAs are space effi-

cient, but are slow to operate. Recursive backtracking is fast for benign packets but is

vulnerable to attack-carrying packets that can induce algorithmic complexity attacks.


This dissertation proposes novel algorithms for time- and space-efficient pattern

matching that also resist known algorithmic complexity attacks. It presents three con-

tributions. First, it introduces NFA-OBDDs, a new data structure that allows time-

and space-efficient matching of regular expressions. Second, it presents an extension to

NFA-OBDDs that allows them to model submatch extraction, an important feature in

real-world patterns used by network security applications. Finally, it presents a tech-

nique to efficiently match a non-regular pattern language: regular expressions extended

with back-references. This dissertation presents experimental results demonstrating that

the new algorithms can beat the performance of existing, widely-deployed algorithms

(such as Google’s RE2 and PCRE) by several orders of magnitude.


Acknowledgements

I would like to express my thanks to:

• my advisor Prof. Vinod Ganapathy, for his insightful advice and support, for

providing me an excellent research environment at Rutgers University.

• Prof. Liviu Iftode, for his co-advising of my research work in mobile security.

• Dr. Markus Jakobsson, for being my first research navigator in computer security

and privacy, for providing me valuable advice to my life and studies in USA.

• Dr. Pratyusa Manadhata, Dr. William Horne, Dr. Prasad Rao, Dr. Randy

Smith, Rezwana Karim, Nader Boushehrinejadmoradi, Pallab Roy, and all my

collaborators for contributing their expertise in my research work.

• Prof. Jie Hu, for her kind help and encouragement during my studies in the USA.

• my colleagues and other members of the Disco-Lab in the Computer Science

Department at Rutgers University, for their helpful discussions and feedback on my

research projects and presentations.

• my friends over the years, for sharing my happiness and concerns.

Thank you so much.

My biggest source of strength and motivation originates from my parents. My

progress in studies and career is due to their endless love, unconditional support, sac-

rifice, and encouragement. My success is their success. Without their affection and

guidance, I would not be able to receive my degrees.

Funding. My graduate studies were funded by NSF grants CNS-0831268, CNS-

0915394, CNS-0931992, CNS-0952128 and CNS-1117711. The Cloud and Security


Group at HP Laboratories (Princeton, NJ) and the Security and Cryptography Group

at Microsoft Research (Redmond, WA) employed me as a research intern in Summer

2011 and Summer 2012, respectively. Their support is gratefully acknowledged.


Dedication

I dedicate this dissertation to my wonderful family. Particularly to my wife, Weiwei,

who has provided me many years of support and understanding to my research work,

and to our lovely son Andy, who is the joy of our life. I must thank my mother-in-law,

who has helped us so much in baby-sitting, and my father-in-law, who has supported

us both financially and emotionally.


Table of Contents

Abstract

Acknowledgements

Dedication

List of Tables

List of Figures

1. Introduction

1.1. Pattern Matching in Network Security Applications

1.2. Existing Approaches for Pattern Matching

1.2.1. Finite Automata

Non-deterministic Finite Automata

Deterministic Finite Automata

1.2.2. Pattern Matching Algorithms

DFA-based Matching

Thompson’s NFA-based Matching

Recursive Backtracking-based Matching

1.3. Challenges in Pattern Matching

1.3.1. State Blow-up of DFAs

1.3.2. Growth of Pattern Sets

1.3.3. Slow Operation of NFAs

1.3.4. Algorithmic Complexity Attacks

1.4. Summary of Contributions

1.5. Contributors to the Dissertation

1.6. Dissertation Organization

2. Background: Ordered Binary Decision Diagrams

2.1. Definition of OBDD

2.2. Operations in OBDDs

2.3. Representing Relations and Sets

3. Improving NFA-based Pattern Matching using OBDDs

3.1. Introduction

3.2. Representing and Operating NFAs and NFA-OBDDs

3.2.1. NFA Operation using Boolean Function Manipulation

3.2.2. NFA-OBDDs

3.3. Experimental Apparatus and Data Sets

3.3.1. Signature Sets and Network Traffic

3.3.2. Experimental Setup

3.4. Experimental Evaluation

3.4.1. NFA-OBDDs: Construction and Performance

3.4.2. Comparison with NFAs

3.4.3. Comparison with the PCRE Package

3.4.4. Comparison with DFA Variants

Multiple DFAs

Hybrid Finite Automata

3.4.5. Deconstructing NFA-OBDD Performance

3.4.6. Impact of Variable Ordering on NFA-OBDD Performance

3.5. Matching Multiple Input Symbols

3.5.1. Adapting to the Streaming Model

3.5.2. Reducing Space Consumption using Alphabet Compression

3.5.3. Performance of k-stride NFA-OBDDs

3.6. Related Work

3.7. Summary

4. Fast Submatch Extraction using OBDDs

4.1. Introduction

4.2. Design and Implementation

4.2.1. Solution Overview

4.2.2. Tagging NFAs for Submatch

4.2.3. Operations on Tagged NFAs

Match Test

Submatch Extraction

4.2.4. Boolean Function Representation

Match Test

Submatch Extraction

4.2.5. Submatch-OBDD

4.2.6. Implementation

4.3. Evaluation

4.3.1. Data Sets

Snort-2009

Snort-2012

Firewall-504

4.3.2. Experimental Setup

4.3.3. Performance Results

Snort-2009

Snort-2012

Firewall-504

4.3.4. Discussion

4.4. Related Work

4.5. Summary

5. A New Algorithm for Patterns with Back References

5.1. Introduction

5.2. Design of Algorithm

5.2.1. Pattern Compilation

5.2.2. Execution

Frontier Derivation

Acceptance Checking

5.3. Implementation

5.4. Evaluation

5.4.1. Data Sets

Patho-01

Snort-46

5.4.2. Performance

5.5. Related Work

5.6. Summary

6. Conclusion and Future Directions

6.1. Conclusions

6.1.1. NFA-OBDD

6.1.2. Submatch-OBDD

6.1.3. NFA-backref

6.2. Future Directions

6.2.1. Hardware-based NFA-OBDD and Submatch-OBDD

6.2.2. Security in Software Defined Networking

Security Applications in SDN

Digital Forensics in SDN

Configuration Validation in SDN

References


List of Tables

4.1. Transition table of an example tagged NFA

4.2. Boolean encoding of transitions in Table 4.1

4.3. Execution time and memory consumption for the Snort-2009 data set

4.4. Execution time and memory consumption for the Snort-2012 data set

4.5. Execution time and memory consumption for the Firewall data set

5.1. Transition table of an example tagged NFA


List of Figures

1.1. An example signature from Snort 2012

1.2. A simplified network-based intrusion detection system

1.3. An example NFA

1.4. A DFA equivalent to the NFA in Figure 1.3

1.5. An NFA-based pattern matching example

1.6. An example of using backtracking to do match test

1.7. The time-space tradeoff of different pattern matching approaches

1.8. An example of DFA combination

1.9. An example of DFA state blow-up

1.10. The growth trend of Snort signatures

1.11. An example path tree traversed by the recursive backtracking algorithm

2.1. An example of a Boolean formula and OBDDs

2.2. An example of Apply and Restrict operations in OBDDs

3.1. NFA for (0|1)∗1

3.2. Components of our software-based implementation of NFA-OBDDs

3.3. Statistics of the HTTP traces used in our experiments

3.4. Statistics of the FTP traces used in our experiments

3.5. NFA-OBDD construction results

3.6. Performance data of different implementations

3.7. Raw performance numbers for the charts shown in Figure 3.6

3.8. Fraction of time spent performing OBDD operations

3.9. Impact of OBDD variable ordering on the performance of NFA-OBDDs

3.10. 2-stride NFA for Figure 3.1

3.11. The NFA in Figure 3.10 adapted for streaming

3.12. Memory versus throughput for 1-stride and 2-stride NFA-OBDDs

4.1. Basic elements of tagged NFAs

4.2. The union, concatenation, and closure constructs of tagged NFAs

4.3. An example tagged ε-NFA

4.4. An example ε-free tagged NFA

4.5. Example of frontier derivation in a tagged NFA

4.6. The ordered binary decision diagram of a Boolean function

5.1. The tagged NFA constructed from “(a*)aa(a*)”

5.2. The toolchain of our back reference implementation

5.3. Performance of different implementations for the Patho-01 data set

5.4. Performance of different implementations for the Snort-46 pattern set



Chapter 1

Introduction

Network security applications, e.g., network-based intrusion detection systems (NIDS),

employ pattern matching to identify data of interest. An ideal pattern matching algo-

rithm in network security applications must satisfy two requirements: time efficiency

and space efficiency. Since network security applications are often deployed over high-

speed network links, time efficiency requires the time spent processing each byte of

data to be small enough to keep up with Gbps packet processing speeds. Space efficiency

requires that the representation of a large number of patterns be small enough to fit into the

main memory of a system. In this dissertation, we first show that existing approaches

for pattern matching suffer from a time-space tradeoff. We then present several pattern

matching algorithms that overcome the weaknesses of existing solutions. Our experi-

mental results have shown that the new algorithms outperform existing approaches by

one to three orders of magnitude while avoiding memory blow-up and resisting known

algorithmic attacks.

In this chapter, we first briefly describe how network security applications employ

pattern matching to identify data of interest in Section 1.1, followed by the description

of existing pattern matching algorithms in Section 1.2. We describe the main chal-

lenges of pattern matching in Section 1.3 and summarize the main contributions of this

dissertation to address those challenges in Section 1.4. We list the contributors to this

dissertation in Section 1.5. Finally, we describe the organization of this dissertation in

Section 1.6.


1.1 Pattern Matching in Network Security Applications

Modern computer networks rely on intrusion detection systems (IDS) for security. Intru-

sion detection systems can be broadly categorized as host-based IDS [92] and network-

based IDS (NIDS). NIDS can be further categorized into anomaly-based detection sys-

tems [91, 89, 63] and signature-based detection systems [93, 61, 17, 5]. Signature-based

NIDS employ patterns to describe the features of malicious data (called signatures).

Figure 1.1 shows an example signature from Snort [66, 5], a commercial

signature-based NIDS¹. The “tcp $EXTERNAL_NET any -> $HTTP_SERVERS $HTTP_PORTS” part

in the signature specifies that this signature applies to TCP packets coming from an

outside network address with any port number to an HTTP server at the HTTP ports.

The pcre part shows a pattern that is used to match the payload of a packet. In

particular, pattern

"/username=[^&\x3b\r\n]{255}/si"

will match network packets that contain a string of the form “username=”

followed by 255 characters, none of which is ‘&’, ‘;’, a carriage return, or a newline.

alert tcp $EXTERNAL_NET any -> $HTTP_SERVERS $HTTP_PORTS (msg:"...,

pcre:"/username=[^&\x3b\r\n]{255}/si"; metadata:service http; ...

classtype:web-application-attack; sid:2702; rev:9;)

Figure 1.1: An example signature from Snort 2012.
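
As an illustration of what the pcre part of this signature accepts, the following minimal sketch applies the same pattern with Python's re module. It assumes that the /s and /i modifiers correspond to re.DOTALL and re.IGNORECASE; the function name payload_matches and the sample payloads are hypothetical and are not taken from Snort.

```python
import re

# The pcre pattern from Figure 1.1. The /s and /i modifiers are mapped to
# re.DOTALL and re.IGNORECASE (an assumption about equivalent flags).
SIGNATURE = re.compile(r"username=[^&\x3b\r\n]{255}", re.DOTALL | re.IGNORECASE)


def payload_matches(payload):
    """Return True if the signature occurs anywhere in the payload."""
    return SIGNATURE.search(payload) is not None


# Hypothetical payloads: 255 allowed characters versus only 254 before '&'.
long_value = "GET /login?username=" + "A" * 255
short_value = "GET /login?username=" + "A" * 254 + "&pw=x"

print(payload_matches(long_value))
print(payload_matches(short_value))
```

Only the first payload supplies 255 consecutive characters outside the excluded set, so only it triggers the signature.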

Figure 1.2 shows how a NIDS performs deep packet inspection by matching incoming

packets against known patterns of malicious traffic. In this figure, the pattern set

contains one toy pattern "evil". Packets that contain “evil” in their payloads are

considered malicious and trigger alerts. All other packets, which do

not contain “evil” in their payloads, are considered innocent and are allowed to pass the

NIDS. Patterns in real systems are often more complex, like the one shown in Figure 1.1.

¹ For simplicity, we use the term NIDS to refer to a signature-based NIDS in the following descriptions.


Figure 1.2: A simplified network-based intrusion detection system.
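
The toy inspection step of Figure 1.2 can be sketched in a few lines. This is only an illustration under simplifying assumptions: patterns are plain byte strings rather than regular expressions, and the function name inspect and the sample packets are hypothetical.

```python
def inspect(payload, patterns):
    """Return the patterns found in the payload; an empty list means 'pass'."""
    return [p for p in patterns if p in payload]


PATTERNS = [b"evil"]  # the toy pattern set of Figure 1.2

# Hypothetical packet payloads.
for pkt in (b"GET /index.html", b"POST /upload?cmd=evil"):
    hits = inspect(pkt, PATTERNS)
    print("ALERT" if hits else "pass", pkt)
```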

1.2 Existing Approaches for Pattern Matching

In the past, attack patterns were keywords that could be efficiently matched using

string matching algorithms, e.g., KMP [44], Boyer-Moore [16], Wu-Manber [96], and

Aho-Corasick [7] algorithms. The increasing complexity of attacks has led the research

community to investigate more expressive pattern representations, which require the

full power of regular expressions with some extended features, such as submatch ex-

traction and back references. Existing approaches for pattern matching can be broadly

categorized as finite automaton-based and recursive backtracking-based techniques.

1.2.1 Finite Automata

The majority of patterns used in network security applications are regular expressions

(i.e., they describe regular languages). Finite automata are natural representations

for regular expressions. It is known that regular expressions, deterministic finite au-

tomata (DFAs), and non-deterministic finite automata (NFAs) are equivalent in terms

of expressive power. Therefore, regular expression matching can be performed by op-

erating the corresponding NFAs or DFAs. Tools such as GNU grep [4], Awk [3], and

Tcl [59] implement regular expression matching using NFA and DFA-based approaches.

Figure 1.3: An NFA constructed from regular expression “a*aa”. [Diagram: states 1, 2, and 3, with transitions on ‘a’ from 1 to 1, from 1 to 2, and from 2 to 3.]

Non-deterministic Finite Automata

An NFA can be represented using a 5-tuple: (Q, Σ, ∆, q0, Fin), where Q is a finite set

of states, Σ is a finite set of input symbols (the alphabet), ∆: Q × (Σ ∪ {ε}) → 2^Q is a

transition function, q0 ∈ Q is a start state, and Fin ⊆ Q is a set of accepting (or final)

states. The transition function ∆(s, i) = T describes the set of all states t ∈ T such

that there is a transition labeled i from s to t. Note that ∆ can also be expressed as a

relation δ ⊆ Q × Σ × Q, so that (s, i, t) ∈ δ for all t ∈ T, where ∆(s, i) = T.

Given a regular expression, we can use Thompson’s algorithm [84] to construct
an ε-NFA that recognizes the same language as the given regular expression. An
ε-NFA can be converted to an ε-free NFA using the ε-closure algorithm. For example,
Figure 1.3 shows an ε-free NFA constructed from the regular expression “a*aa”, where the
start state is numbered 1 and the final state is numbered 3. For simplicity, we
use the term NFA to refer to an ε-free NFA in the following descriptions.
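
To make the 5-tuple concrete, the following sketch writes the NFA of Figure 1.3 as plain Python data. The transitions are inferred from the worked examples later in this chapter, and the names Q, SIGMA, Q0, FIN, DELTA_FN, and DELTA_REL are chosen here for illustration only. The last statement derives the relation δ from the function ∆.

```python
# The epsilon-free NFA of Figure 1.3 for the regular expression a*aa.
Q = {1, 2, 3}      # states
SIGMA = {"a"}      # alphabet
Q0 = 1             # start state
FIN = {3}          # accepting states

# Transition function: (state, symbol) -> set of successor states.
DELTA_FN = {
    (1, "a"): {1, 2},
    (2, "a"): {3},
}

# The same transitions as a relation: a set of (s, i, t) triples.
DELTA_REL = {(s, i, t) for (s, i), targets in DELTA_FN.items() for t in targets}

print(sorted(DELTA_REL))  # [(1, 'a', 1), (1, 'a', 2), (2, 'a', 3)]
```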

Deterministic Finite Automata

An NFA (Q, Σ, ∆, q0, Fin) can be converted to a DFA that recognizes the same
language using the subset construction algorithm [38]. The corresponding DFA has
states that are subsets of Q. The initial state of the DFA is {q0}. For a state S in
the DFA, the transition function is defined as T(S, i) = ∪{∆(q, i) | q ∈ S}, where i is an
input symbol. In other words, the transition function maps a state S (a subset of Q)
and an input symbol i to the set of all states that can be reached by an i-transition from a
state in S. A state S of a DFA is an accept state if and only if at least one member
of S is an accept state of the corresponding NFA. Figure 1.4 shows a DFA constructed
from the example NFA in Figure 1.3 using the subset construction algorithm. The start
state of the DFA is {1}, and the accept state is {1, 2, 3}.

Figure 1.4: The DFA recognizing the same language as the NFA in Figure 1.3. [Diagram: states {1}, {1,2}, and {1,2,3}, with transitions on ‘a’ from {1} to {1,2}, from {1,2} to {1,2,3}, and from {1,2,3} to itself.]
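
The subset construction can be sketched as a worklist algorithm. The code below is a minimal illustration, not the implementation used in this dissertation: the function name subset_construction and the dictionary encoding of ∆ are assumptions, and only DFA states reachable from {q0} are generated.

```python
from collections import deque


def subset_construction(delta, q0, fin, alphabet):
    """Convert an epsilon-free NFA into an equivalent DFA.

    delta maps (state, symbol) -> set of NFA states. DFA states are
    frozensets of NFA states; only reachable subsets are generated.
    """
    start = frozenset({q0})
    dfa_trans, accepting = {}, set()
    seen, work = {start}, deque([start])
    while work:
        S = work.popleft()
        if S & fin:                      # at least one NFA accept state in S
            accepting.add(S)
        for i in alphabet:
            # T(S, i) is the union of delta(q, i) over all q in S.
            T = frozenset().union(*(delta.get((q, i), set()) for q in S))
            if not T:
                continue                 # no transition (treated as "null")
            dfa_trans[(S, i)] = T
            if T not in seen:
                seen.add(T)
                work.append(T)
    return start, dfa_trans, accepting


# The NFA of Figure 1.3 (transitions inferred from the text).
NFA_DELTA = {(1, "a"): {1, 2}, (2, "a"): {3}}
start, trans, accepting = subset_construction(NFA_DELTA, 1, {3}, {"a"})
for (S, i), T in trans.items():
    print(sorted(S), i, "->", sorted(T))
```

Applied to the NFA of Figure 1.3, the sketch yields the three states and three transitions of the DFA in Figure 1.4.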

1.2.2 Pattern Matching Algorithms

DFA-based Matching

An important feature of DFAs is that at any state, each possible input symbol leads to

at most one new state. A DFA-based matching algorithm is described by Algorithm 1.

The input of the algorithm is an input string str and a DFA with start state S0 and final

set of states F . At line 1, the algorithm assigns the start state S0 as current state. The

loop from line 2 to line 8 performs match test. For the ith input symbol, the transition

function T returns the next state by looking up the transition relation of the DFA. If

the returned next state is null, then the algorithm returns false which means that the

input string does not match the DFA. If an input symbol is the last character of the

input string, line 6 checks whether the next state is one of the states in the final state

set. If so, the algorithm returns true which means that the input string is matched by

the DFA. Line 8 renames the next state as the current state and the loop continues.

Finally, the algorithm returns false if the DFA is not in a final state after consuming

the last symbol of the input string.

    Algorithm: DFA-MATCH-TEST
    Input: an input string str to be matched, and a DFA with start state S0 and final state set F.
    Output: true or false
    1  current_state = S0;
    2  for i = 1 to strlen(str) do
    3      next_state = T(current_state, str[i]);
    4      if next_state is null then
    5          return false;
    6      if i == strlen(str) and next_state ∈ F then
    7          return true;
    8      current_state = next_state;
    9  return false;

Algorithm 1: DFA-based matching algorithm.

For the example DFA in Figure 1.4 with input string “aaaa”, the match test
proceeds as follows according to Algorithm 1. Starting from the start state
{1}, the first input symbol ‘a’ takes the DFA to state {1, 2}. With {1, 2}
as the current state, the second input symbol ‘a’ leads to the next state {1, 2, 3}. With
{1, 2, 3} as the current state, the third input symbol ‘a’ leaves the DFA in state
{1, 2, 3}. After consuming the fourth input symbol ‘a’, the DFA stays in state {1, 2, 3},
which is an accept state. Therefore, the DFA matches the input string “aaaa”.

For an input string of length n, the running time of Algorithm 1 is O(n) since for

each input symbol, at most one transition lookup is performed. Thus, a DFA-based

matching algorithm is time efficient.
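
A minimal Python rendering of the DFA-based match test is given below as a sketch. The function name dfa_match, the string labels for DFA states, and the dictionary encoding of T are assumptions made for illustration. Unlike Algorithm 1, the sketch tests acceptance once after the loop, so an empty input is accepted exactly when the start state is accepting.

```python
def dfa_match(s, start, trans, accepting):
    """DFA-based match test: one transition lookup per input symbol."""
    current = start
    for ch in s:
        nxt = trans.get((current, ch))   # None plays the role of "null"
        if nxt is None:
            return False
        current = nxt
    return current in accepting


# The DFA of Figure 1.4, with states written as strings.
DFA_START = "{1}"
DFA_ACCEPT = {"{1,2,3}"}
DFA_TRANS = {
    ("{1}", "a"): "{1,2}",
    ("{1,2}", "a"): "{1,2,3}",
    ("{1,2,3}", "a"): "{1,2,3}",
}

print(dfa_match("aaaa", DFA_START, DFA_TRANS, DFA_ACCEPT))  # True
print(dfa_match("a", DFA_START, DFA_TRANS, DFA_ACCEPT))     # False
```

The loop body performs a single dictionary lookup per symbol, which is the source of the O(n) running time.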

Thompson’s NFA-based Matching

Unlike a DFA, an NFA can have multiple choices for the next state after reading an
input symbol in a given state. An NFA matches an input string if there is some way for it
to read the input string and reach a final state at the end of the input. An
NFA-based match test is described in Algorithm 2. Line 1 initializes the current set of
states to the start state. Starting with the first input symbol, the
loop in lines 2 to 8 processes the transitions for each input symbol. Given a set
of current states, the next set of states (also called the frontier) is the union of the states that
are reachable from any state s in the current set, as shown in the loop in lines 4
to 5. After consuming the last input symbol, line 6 checks whether any state
in the next set of states is also a final state. If so, the algorithm returns true, which
means that the NFA matches the input string. For an NFA that has m states,
the size of the frontier set (current_states) is O(m), so the running
time of Algorithm 2 is O(m × n) for a string of length n. Compared with DFA-based
matching, NFA-based matching is time-inefficient.

    Algorithm: Thompson’s algorithm
    Input: an input string str to be matched, and an NFA (Q, Σ, ∆, q0, Fin).
    Output: true or false
    1  current_states = {q0};
    2  for i = 1 to strlen(str) do
    3      next_states = ∅;
    4      foreach s ∈ current_states do
    5          next_states = next_states ∪ δ(s, str[i]);
    6      if (i == strlen(str)) and (next_states ∩ Fin ≠ ∅) then
    7          return true;
    8      current_states = next_states;
    9  return false;

Algorithm 2: Thompson’s algorithm for NFA-based match test.

For the example NFA in Figure 1.3 with input string “aaaa”, NFA-based matching
proceeds as follows according to Algorithm 2. Starting from the start
state set {1}, the first input symbol ‘a’ yields the next set of states {1, 2}. With
{1, 2} as the current set of states, the second input symbol ‘a’ yields the next frontier set
{1, 2, 3}. The third and fourth input symbols are processed in the same way.
At the end, the frontier set is {1, 2, 3}. Since state 3 is an accept state,
the NFA matches the input string “aaaa”. Figure 1.5 demonstrates the process
of matching “aaaa” with the example NFA in Figure 1.3. It can be observed that
processing each input symbol involves multiple transition lookups.
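
The frontier computation in this example can be reproduced with a short sketch. The function names nfa_frontiers and nfa_match and the dictionary encoding of the transitions are assumptions made for illustration; the transitions of the NFA in Figure 1.3 are inferred from the text.

```python
def nfa_frontiers(s, delta, q0):
    """Return the frontier (set of active states) after each input symbol."""
    current, trace = {q0}, []
    for ch in s:
        nxt = set()
        for state in current:            # up to m lookups per input symbol
            nxt |= delta.get((state, ch), set())
        current = nxt
        trace.append(current)
    return trace


def nfa_match(s, delta, q0, fin):
    """Accept if the last frontier contains a final state."""
    trace = nfa_frontiers(s, delta, q0)
    return bool(trace) and bool(trace[-1] & fin)


NFA_DELTA = {(1, "a"): {1, 2}, (2, "a"): {3}}

print(nfa_frontiers("aaaa", NFA_DELTA, 1))
print(nfa_match("aaaa", NFA_DELTA, 1, {3}))  # True
```

The inner loop over the current frontier is what makes each input symbol cost up to m lookups, giving the O(m × n) bound stated above.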

Figure 1.5: Matching input string “aaaa” with the example NFA in Figure 1.3 using Thompson’s algorithm.

Recursive Backtracking-based Matching

Another way to simulate an NFA is to use recursive backtracking, as shown in
Algorithm 3 and Algorithm 4. Line 1 of Algorithm 3 initializes a Boolean variable
matched to false. Line 2 calls BT-MATCH to perform a recursive backtracking-based
match test on the input string str. BT-MATCH may change the value of the matched
variable during its execution. Algorithm 4 is the core of the match test. It
operates in a depth-first manner. For a current state s and the ith input symbol,
the algorithm processes all states in δ(s, str[i]) depth-first (lines 2 to
6). Lines 3 to 5 perform an acceptance check if the current symbol is the last character
of str. If the current symbol is the last one and a state t ∈ δ(s, str[i]) is an accept state, then matched is set to
true and the procedure returns; otherwise the algorithm recursively calls BT-MATCH,
passing t and i+1 as the new parameters. A recursive backtracking algorithm may have to
scan an input string multiple times before it finds a match. Tools like PCRE and the
regular expression libraries of some high-level languages, such as Java, Perl, and Python,
implement pattern matching using recursive backtracking.

    Algorithm: BT-MATCH-TEST
    Input: an input string str and an NFA (Q, Σ, ∆, q0, Fin).
    Output: true or false
    1  matched ← false;
    2  BT-MATCH(str, q0, 0);
    3  return matched;

Algorithm 3: A recursive backtracking-based matching algorithm.

    Algorithm: BT-MATCH(str, s, i)
    Input: str is a string to be matched, s is the current state, and i is the offset of an input symbol
    1  if i < strlen(str) then
    2      foreach t ∈ δ(s, str[i]) do
    3          if (i == strlen(str) − 1) and (t ∈ Fin) then
    4              matched ← true;
    5              return;
    6          BT-MATCH(str, t, i + 1);

Algorithm 4: The body of the recursive backtracking-based matching algorithm.

For the example NFA in Figure 1.3 with input string “aaaa”, Figure 1.6 shows
how a backtracking approach finds a match after trying three paths. The backtracking
algorithm first tries the path 1 -> 1 -> 1 -> 1 -> 1 and fails. It then backtracks one
step and tries the path 1 -> 1 -> 1 -> 1 -> 2 and fails again. After that, it backtracks
two steps and tries 1 -> 1 -> 1 -> 2 -> 3 and succeeds (state 3 is an accept state).
In this example, the last input character was scanned three times, and the second-to-last
character was scanned twice, before the algorithm found that the input string matches
the pattern.
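
The three attempts described above can be reproduced with the following sketch. It is an illustration under stated assumptions rather than a transcription of Algorithms 3 and 4: it returns a Boolean instead of updating a shared matched variable, it stops at the first accepting path, and it tries successor states in ascending order, which is assumed here in order to obtain the path order given in the text. The name bt_match is hypothetical.

```python
def bt_match(s, delta, q0, fin):
    """Recursive backtracking match; also returns the complete paths tried."""
    paths = []

    def explore(state, i, path):
        # Successors are tried in ascending order (an assumption that
        # reproduces the path order described in the text).
        for t in sorted(delta.get((state, s[i]), set())):
            if i == len(s) - 1:
                paths.append(path + [t])     # a full-length path was tried
                if t in fin:
                    return True
            elif explore(t, i + 1, path + [t]):
                return True
        return False                         # dead end: backtrack

    matched = bool(s) and explore(q0, 0, [q0])
    return matched, paths


NFA_DELTA = {(1, "a"): {1, 2}, (2, "a"): {3}}
matched, paths = bt_match("aaaa", NFA_DELTA, 1, {3})
print(matched)
for p in paths:
    print(" -> ".join(map(str, p)))
```

For the input “aaaa”, the sketch records the paths 1 -> 1 -> 1 -> 1 -> 1, 1 -> 1 -> 1 -> 1 -> 2, and 1 -> 1 -> 1 -> 2 -> 3, in that order.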

Figure 1.6: An example of using backtracking to do match test.

1.3 Challenges in Pattern Matching

Ideally, pattern matching in network security applications should satisfy two
requirements: time efficiency and space efficiency. Since network applications are often
deployed over high-speed network links, time efficiency requires the time spent
processing each byte of data to be small enough to keep up with Gbps packet processing speeds.
Space efficiency requires that the representation of a large number of patterns be small
enough to fit into the main memory of a system. In this section, we show that pattern
matching in practice suffers from a time-space tradeoff. DFA-based approaches are fast,
but suffer from state blow-up due to the subset construction mechanism. NFA-based
approaches are space efficient, but are slow in operation because an NFA can be in
multiple states simultaneously at any instant. Recursive backtracking-based approaches
are fast in general, but are vulnerable to algorithmic complexity attacks [75]. The
performance of a NIDS implemented using recursive backtracking can be slowed down by
several orders of magnitude under such attacks [75]. The time-space
tradeoff of the different approaches is shown in Figure 1.7, where the x-axis denotes
space and the y-axis denotes the time spent processing a byte of data. An ideal solution
should be close to the origin of the coordinate system. As we can see, none of the
existing approaches can be considered ideal. The goal of this dissertation is to propose
new algorithms that are close to the ideal solution.

1.3.1 State Blow-up of DFAs

From Section 1.2.2, we know that for an input string of length n, the running time

of a DFA-based matching algorithm is O(n). Thus, DFA-based algorithms are time

efficient, making DFAs attractive candidates for pattern matching. However, DFA-

based algorithms have a potential disadvantage: state blow-up. Real-world pattern

matching often needs to match an input stream against a set of patterns. A DFA-based

solution then requires constructing one DFA that is combined from the individual DFAs

of all patterns in the pattern set. In this way, the combined DFA recognizes a language

that is the union of languages described by the individual patterns in the pattern set.

According to the subset construction mechanism, combining two DFAs can result in a multiplicative increase in the number of states. That is, if the sizes of two DFAs are m and n, then the size of the combined DFA is O(m × n). Smith et al. [77] formally characterize this blow-up using the notion of ambiguity. Figure 1.8 shows an example of DFA combination. The left side shows the two DFAs of patterns “.*ab.*cd” and “.*ef.*gh”. The right side shows the DFA combined from the two DFAs on the left. The combined DFA recognizes any input string that matches either “.*ab.*cd” or “.*ef.*gh”. The individual DFAs have five states each, while the combined DFA has sixteen states. The problem becomes more pronounced when hundreds of patterns are combined into one DFA. As a result, DFA representations for large sets of regular expressions often consume several gigabytes of memory, and do not fit within the main memory of most NIDS.

Figure 1.7: The time-space tradeoff of different pattern matching approaches.
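The state counts quoted for “.*ab.*cd” and “.*ef.*gh” can be checked with a small product construction. The sketch below is illustrative only: the five-state DFAs are hand-built under the assumption that a pattern must match the whole input (so the accepting state means "the input so far ends in the second keyword, after the first one was seen"), and the helper names `two_part_dfa` and `reachable` are hypothetical.

```python
from collections import deque

def two_part_dfa(x, y, z, w):
    """Transition function of a 5-state DFA for ".*xy.*zw".

    States: 0 start, 1 just saw x, 2 saw "xy", 3 just saw z after "xy",
    4 accepting (just saw "zw" after "xy").
    """
    def step(state, ch):
        if state in (0, 1):
            if ch == x:
                return 1
            return 2 if (state == 1 and ch == y) else 0
        if ch == z:
            return 3
        return 4 if (state == 3 and ch == w) else 2
    return step

def reachable(step, start, alphabet):
    """Breadth-first search over the states reachable from `start`."""
    seen, queue = {start}, deque([start])
    while queue:
        s = queue.popleft()
        for ch in alphabet:
            t = step(s, ch)
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen

ALPHABET = "abcdefgh?"          # '?' stands for every other input byte
d1 = two_part_dfa("a", "b", "c", "d")
d2 = two_part_dfa("e", "f", "g", "h")

def product_step(pair, ch):
    # The combined DFA tracks one state of each component DFA.
    return (d1(pair[0], ch), d2(pair[1], ch))

n1 = len(reachable(d1, 0, ALPHABET))
n2 = len(reachable(d2, 0, ALPHABET))
n12 = len(reachable(product_step, (0, 0), ALPHABET))
print(n1, n2, n12)              # prints: 5 5 16
```

Only 16 of the 25 possible state pairs are reachable here because the two patterns share no characters; patterns that overlap more can come closer to the full m × n bound.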

State blow-up can also happen for a single pattern that has certain constructs. For example, if a pattern contains both the wildcard “.*” and a counting quantifier “{n}” at different locations, the size of the resulting DFA can be exponential in the value of the quantifier. Figure 1.9 shows the DFA of an example pattern “.*1[0|1]{3}”. This pattern matches any binary string whose fourth-to-last symbol is 1. The size of the DFA is 16. In fact, for patterns of this type with quantifier n, the size of the corresponding DFA is $O(2^n)$ [38]. In our experiments, we have encountered patterns from the Snort rule set [78] where the DFA of a single pattern consumes more than 2.5GB of memory.

Figure 1.8: An example of DFA combination resulting in a multiplicative increase in the number of states (Picture courtesy: [76]).

Figure 1.9: The DFA of pattern “.*1[0|1]{3}”. The size of this DFA is exponential in the value of the quantifier, 3.
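The exponential growth can be observed directly by running the subset construction on the natural NFA for this family of patterns. The following is a minimal sketch, assuming an NFA with states 0 to n+1 in which state 0 loops on both symbols and guesses where the distinguished 1 occurs; the function name `dfa_size` is hypothetical.

```python
def dfa_size(n):
    """Count the DFA states that subset construction yields for .*1[01]{n}."""
    # NFA states: 0 loops on '0' and '1' and moves to 1 on '1';
    # states 1..n advance on any symbol; state n+1 is accepting.
    def step(subset, bit):
        nxt = set()
        for q in subset:
            if q == 0:
                nxt.add(0)
                if bit == "1":
                    nxt.add(1)
            elif q <= n:
                nxt.add(q + 1)
        return frozenset(nxt)

    start = frozenset({0})
    seen, work = {start}, [start]
    while work:
        s = work.pop()
        for bit in "01":
            t = step(s, bit)
            if t not in seen:
                seen.add(t)
                work.append(t)
    return len(seen)

for n in range(1, 6):
    print(n, dfa_size(n))
```

For n = 3 the sketch reports 16 subsets, matching the size stated for Figure 1.9, and each increment of n doubles the count.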

1.3.2 Growth of Pattern Sets

The increasing diversity of network attacks has led to a quick growth in the number

of attack signatures used by NIDS. A common challenge faced by NIDS is how to

adapt to the quick growth of signature databases. Figure 1.10 shows the trend of

growth of the Snort rule set from 2005 to 2012. It can be observed that the number


of signatures increased by around seven times in the past eight years. This upward

trend will likely accelerate in the future as NIDS vendors begin to employ automated

and semi-automated methods for signature generation [42, 58, 19, 73]. Space-efficiency

mandates that the size of the representation should grow proportionally (e.g., linearly)

with the number of attack signatures.

Figure 1.10: The number of signatures in Snort increased roughly sevenfold in eight years (x-axis: year, 2005 to 2012; y-axis: number of signatures, 0 to 30000).

1.3.3 Slow Operation of NFAs

Unlike DFAs, combining NFAs leads only to an additive increase in the number of states [38]. For example, to combine two NFAs, we only need to create a new start state and a new accept state, and then add two ε-transitions connecting the new start state to the old start states and two ε-transitions connecting the old accept states to the new accept state. However, an NFA-based matching algorithm is often time inefficient. For an NFA with m states, the time complexity of matching an input string of length n is O(m × n) (see the description in Section 1.2.2). The main reason for this inefficiency is that each frontier update (computing the next set of states for an input symbol) can require O(m) transition lookups. If we can reduce the cost of frontier updates, the time efficiency of NFA-based matching can be improved. In this dissertation, we propose techniques to achieve that.
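To make the cost of a frontier update concrete, here is a minimal sketch of frontier-based NFA simulation that also counts transition lookups. The NFA is a hypothetical example (binary strings whose fourth-to-last symbol is 1, with states 0 to 4), and the names `nfa_match` and `DELTA` are illustrative rather than taken from the dissertation.

```python
def nfa_match(delta, start, accepting, text):
    """Simulate an epsilon-free NFA by maintaining its frontier of states.

    Returns (matched, lookups), where `lookups` counts transition-table
    lookups: one per frontier state per input symbol, i.e. up to m per symbol.
    """
    frontier = {start}
    lookups = 0
    for symbol in text:
        next_frontier = set()
        for state in frontier:
            lookups += 1
            next_frontier |= delta.get((state, symbol), set())
        frontier = next_frontier
    return bool(frontier & accepting), lookups


# Hypothetical NFA for .*1[01]{3}: state 0 loops, states 1..3 advance.
DELTA = {(0, "0"): {0}, (0, "1"): {0, 1}}
for q in (1, 2, 3):
    DELTA[(q, "0")] = {q + 1}
    DELTA[(q, "1")] = {q + 1}

print(nfa_match(DELTA, 0, {4}, "1000"))
print(nfa_match(DELTA, 0, {4}, "1111"))
```

The lookup count grows with the size of the frontier, which is the per-symbol cost that the rest of the dissertation aims to reduce.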


1.3.4 Algorithmic Complexity Attacks

As described in Section 1.2.2, recursive backtracking has been widely adopted in pattern matching implementations. There are two main reasons for this. First, recursive backtracking is fast in general and easy to implement. Second, patterns used in practice often contain features that cannot be described by regular languages. One important such feature is the back reference, and recursive backtracking is the de facto way to implement back references. However, a recursive backtracking matching algorithm can be very slow in certain cases, as the example below shows.

Figure 1.11 shows how a recursive backtracking algorithm matches the pattern “host.*com.*uuid=.*wv=.*cargo” against the following string, which has 45 characters:

"hostcomhostcomhostcomuuid=uuid=uuid=wv=wv=wv="

We denote the five parts separated by “.*” in the pattern by P1, P2, P3, P4, and P5 respectively, i.e., P1=“host”, P2=“com”, etc. A number on an edge between two nodes in the figure denotes the offset at which a subexpression Pi (i = 1 . . . 5) is matched in the input string. For example, the leftmost edge between P1 and P2 is labeled 3, which means that “host” is matched in the input string at offset 3. The pattern is matched by an input string if and only if P1, P2, P3, P4, and P5 are matched in sequence by the input string. It can be observed that a backtracking approach needs to try 45 paths before it can conclude that the example input string does not match the example pattern. In general, for a pattern that has k parts separated by the wildcard “.*”, the running time of a backtracking algorithm can be close to $O(n^k)$ [75], where n is the length of the input string. Supplying input that forces a backtracking algorithm to exhaustively try all execution paths is called an algorithmic complexity attack. Researchers have demonstrated that the throughput of a NIDS employing recursive backtracking for pattern matching can be slowed down by several orders of magnitude [75]. Therefore, recursive backtracking is not a good choice for pattern matching.
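The offsets used as edge labels in this example can be recomputed with a few lines of Python. This sketch only locates the literal parts P1 to P5 in the example string; it is not a backtracking matcher, and the helper `end_offsets` is a hypothetical name.

```python
def end_offsets(text, part):
    """Return every (0-based) offset at which an occurrence of `part` ends."""
    offsets, start = [], text.find(part)
    while start != -1:
        offsets.append(start + len(part) - 1)
        start = text.find(part, start + 1)
    return offsets

TEXT = "hostcomhostcomhostcomuuid=uuid=uuid=wv=wv=wv="
PARTS = ["host", "com", "uuid=", "wv=", "cargo"]

table = {part: end_offsets(TEXT, part) for part in PARTS}
for part in PARTS:
    print(part, table[part])
```

Each of P1 to P4 occurs three times, while P5 (“cargo”) never occurs, so every combination of earlier choices is a dead end that a backtracking matcher discovers only after exploring it.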

Figure 1.11: An example path tree traversed by the recursive backtracking algorithm. Nodes are the subexpressions P1 to P5; edges are labeled with the offsets at which the parent subexpression is matched (3, 10, 17 below P1; 6, 13, 20 below P2; 25, 30, 35 below P3; 38, 41, 44 below P4).

1.4 Summary of Contributions

This dissertation focuses on addressing the main challenges described in Section 1.3.

The problem statement of this dissertation is: Pattern matching algorithms employed

by network security applications typically demonstrate a time-space tradeoff: NFA-

based matching is slow but memory-efficient, while DFA-based matching is fast but

memory-intensive. Many practical implementations avoid the tradeoff by eliding the

use of NFAs and DFAs, but are vulnerable to algorithmic complexity attacks when

scanning malicious network traffic.

The thesis statement of this dissertation is:

Using ordered binary decision diagrams, it is possible to design pattern match-

ing algorithms that are up to three orders of magnitude faster than tradi-

tional NFA-based pattern matching algorithms, retain the memory-efficiency

of NFAs, and are immune to known algorithmic complexity attacks.

In this dissertation, we propose several new pattern matching algorithms for network

security applications. In particular, we make three contributions.

First, we investigate patterns that can be described by regular languages, i.e., regular expressions. We propose NFA-OBDD, a time- and space-efficient data structure for regular expression matching. We evaluate the performance of NFA-OBDD using real-world patterns and network traffic traces. Our experimental results show that NFA-OBDD has three benefits: (1) It is three orders of magnitude faster than a traditional NFA implementation, while retaining the space efficiency of NFAs. (2) It is faster than, or at least competitive with, PCRE, a widely used pattern matching tool implemented using recursive backtracking. (3) Its time efficiency is comparable to that of MDFA, a variant of the DFA-based pattern matching approach, while it consumes much less memory than MDFA.

Second, we investigate a more general case, regular expressions extended with sub-

match extraction, which is an important feature in real-world patterns used by network

security applications. We propose an extension of NFA-OBDD to model submatch ex-

traction. Our experimental results show that our submatch extraction approach (called

Submatch-OBDD) is one order of magnitude faster than PCRE and Google’s RE2 (a

pattern matching tool that supports regular expression matching and submatch extrac-

tion).

Third, we study an even more general case, patterns containing back references, which describe non-regular languages. We propose NFA-Backref, an efficient pattern matching approach for patterns with back references. Our experimental results show that NFA-Backref resists known algorithmic complexity attacks, and outperforms PCRE by at least three orders of magnitude for certain types of patterns. For benign patterns, NFA-Backref is one order of magnitude slower than PCRE.

1.5 Contributors to the Dissertation

This section lists the co-authors of the papers from which material is used in this dissertation. The NFA-OBDD model in Chapter 3 is joint work with my advisor Professor Vinod Ganapathy, my colleagues Rezwana Karim, and Randy Smith from the University of Wisconsin-Madison. Vinod Ganapathy motivated the NFA-OBDD model and directed the project. Rezwana Karim contributed by implementing the one-gram NFA-OBDD construction and execution. Randy Smith contributed by working on the regular expression parsing. The Submatch-OBDD model presented in Chapter 4 is joint work with Vinod Ganapathy and researchers from HP Laboratories: Pratyusa Manadhata, William Horne, and Prasad Rao. Vinod Ganapathy contributed by validating the correctness of the Submatch-OBDD model. Pratyusa Manadhata contributed by discussing the correctness of the submatch extraction algorithm and partially designing the experimental evaluation. William Horne contributed by proposing a tagging approach for capturing groups. Prasad Rao contributed by writing tools to generate synthetic traces. The NFA-Backref algorithm presented in Chapter 5 is joint work with Vinod Ganapathy and Pratyusa Manadhata. Vinod Ganapathy contributed by brainstorming and discussing ideas for pattern matching with back references. Pratyusa Manadhata contributed by validating the correctness of the new back reference algorithm.

1.6 Dissertation Organization

This dissertation is organized as follows. Chapter 2 describes the background of or-

dered binary decision diagrams (OBDDs). Chapter 3 presents NFA-OBDD, a time and

space efficient pattern matching approach for regular expressions. Chapter 4 presents

Submatch-OBDD, an extension of NFA-OBDD to model submatch extraction. Chap-

ter 5 presents an efficient algorithm for patterns containing back references. Finally,

Chapter 6 concludes the thesis and discusses directions of future study.


Chapter 2

Background: Ordered Binary Decision Diagrams

An ordered binary decision diagram (OBDD) is a data structure that can represent

arbitrary Boolean formulae. OBDDs transform Boolean function manipulation into

efficient graph transformations, and have found wide use in a number of application

domains. For example, OBDDs are used extensively by model checkers to improve

the efficiency of state-space exploration algorithms [21]. OBDDs and their variants

have also been used in the analysis and design of intrusion detection systems and

firewalls [23, 37, 104, 103, 34, 33].

2.1 Definition of OBDD

Formally, an OBDD represents a Boolean function f(x1, x2, . . . , xn) as a rooted, di-

rected acyclic graph (DAG) that has two kinds of nodes: non-terminals and up to two

terminals, which are labeled 0 and 1. Terminal nodes do not have outgoing edges. Each

non-terminal node v is associated with a label var(v) ∈ {x1, x2, . . ., xn}, and has two

successors low(v) and high(v). The edges to these successors are labeled 0 and 1,

respectively. An OBDD is ordered in the sense that node labels are associated with a

total order <. Node labels along all paths in the OBDD from the root to the terminal

nodes follow this total order. An OBDD must also satisfy two additional properties:

• there are no two non-terminal nodes u and v such that var(u) = var(v), low(u)

= low(v), and high(u) = high(v); and

• there is no non-terminal u with low(u) = high(u).

In his seminal article, Bryant [20] introduced algorithms to construct OBDDs for

Boolean formulae and showed that for a given total order of the variables of a Boolean

x  i  y  f(x, i, y)
0  0  0  1
0  0  1  0
0  1  0  1
0  1  1  1
1  0  0  1
1  0  1  0
1  1  0  0
1  1  1  1

(a) A Boolean function f(x, i, y). (b) OBDD(f) with x < i < y. (c) OBDD(f) with i < x < y. (d) OBDD(I(i)), the identity function.

Figure 2.1: An example of a Boolean formula and OBDDs with different variable orderings. Solid edges are labeled 1, dotted edges are labeled 0.

formula, the OBDD representation of that formula is canonical, i.e., for a given ordering,

two OBDDs for a Boolean formula are isomorphic. Figure 2.1(b) depicts an example of

an OBDD for the Boolean formula f(x, i, y) shown in Figure 2.1(a). In this figure, the

variable ordering is x < i < y. To evaluate the Boolean formula for a given variable

assignment, say {x ← 1, i ← 0, y ← 1}, it suffices to traverse the appropriately

labeled edges from the root to the terminal nodes; in this case f(1, 0, 1) evaluates to 0.

Figure 2.1(c) depicts the OBDD for f with the variable ordering i < x < y. Although not

evident from this example, the size of OBDDs is sensitive to the total order imposed

on the Boolean variables; it is NP-hard to choose a total order that yields the most

compact OBDD for a Boolean function [20].
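The definition and the evaluation procedure can be illustrated with a small reduced-OBDD builder. This is a sketch for exposition, not Bryant's construction algorithm: it enumerates every assignment (exponential in the number of variables) and relies on a unique table to enforce the two reduction properties. The names `build_obdd` and `evaluate` are hypothetical, and the truth table is the one from Figure 2.1(a).

```python
def build_obdd(f, order):
    """Build a reduced OBDD for f (called with keyword arguments) under `order`.

    Returns (root, nodes). Terminals are the ints 0 and 1; every other node
    id maps to a triple (var, low, high) in `nodes`.
    """
    unique = {}                       # (var, low, high) -> node id
    nodes = {}

    def make(var, low, high):
        if low == high:               # no node with identical successors
            return low
        key = (var, low, high)
        if key not in unique:         # no two nodes with the same triple
            node_id = len(unique) + 2
            unique[key] = node_id
            nodes[node_id] = key
        return unique[key]

    def build(level, assignment):
        if level == len(order):
            return int(bool(f(**assignment)))
        var = order[level]
        low = build(level + 1, {**assignment, var: 0})
        high = build(level + 1, {**assignment, var: 1})
        return make(var, low, high)

    return build(0, {}), nodes

def evaluate(root, nodes, assignment):
    """Follow the edges selected by `assignment` from the root to a terminal."""
    node = root
    while node not in (0, 1):
        var, low, high = nodes[node]
        node = high if assignment[var] else low
    return node

TRUTH = {(0, 0, 0): 1, (0, 0, 1): 0, (0, 1, 0): 1, (0, 1, 1): 1,
         (1, 0, 0): 1, (1, 0, 1): 0, (1, 1, 0): 0, (1, 1, 1): 1}

def f(x, i, y):
    return TRUTH[(x, i, y)]

root_xiy, nodes_xiy = build_obdd(f, ["x", "i", "y"])
root_ixy, nodes_ixy = build_obdd(f, ["i", "x", "y"])
print(len(nodes_xiy), len(nodes_ixy))
print(evaluate(root_xiy, nodes_xiy, {"x": 1, "i": 0, "y": 1}))
```

With this table the ordering x < i < y yields five non-terminal nodes and i < x < y yields four, and both diagrams evaluate f(1, 0, 1) to 0.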

An OBDD representation of a Boolean formula offers several advantages. First,

OBDDs are often more compact than other representations of Boolean formulae, such

as decision trees, conjunctive normal form (CNF) and disjunctive normal form (DNF).

Intuitively, this is because an OBDD captures and eliminates redundant nodes in the

decision tree representation of a Boolean formula. Second, OBDDs allow properties

of Boolean functions to be checked efficiently. For example, to determine whether

a Boolean function is satisfiable (or unsatisfiable), it suffices to check whether the

terminal node labeled 1 (respectively, 0) is reachable from the root node. Because

OBDD construction and manipulation algorithms eliminate nodes that are unreachable

from the root, checking (un)satisfiability is a constant-time operation.

Figure 2.2: Result of the Apply and Restrict operations on the OBDD in Figure 2.1(b). (a) Apply(∧, OBDD(f), OBDD(I(i))). (b) Restrict(OBDD(f), i ← 1).

2.2 Operations in OBDDs

OBDDs allow Boolean functions to be manipulated efficiently. Bryant [20] describes two

operations, Apply and Restrict, which allow OBDDs to be combined and modified

with a number of Boolean operators. These two operations are implemented as a

series of graph transformations and reductions to the input OBDDs, and have efficient

implementations; their time complexity is polynomial in the size of the input OBDDs.

We describe Apply and Restrict informally below, and refer the reader to Bryant’s

article [20] for details of these algorithms.

Apply allows binary Boolean operators, such as ∧ and ∨, to be applied to a pair of

OBDDs. The two input OBDDs, OBDD(f) and OBDD(g), must have the same variable

ordering. Apply(<op>, OBDD(f), OBDD(g)) computes OBDD(f <op> g), which

has the same variable ordering as the input OBDDs. Figure 2.2(a) presents the OBDD

obtained by combining the OBDD in Figure 2.1(b) with OBDD(I(i)) (Figure 2.1(d)),

where I is the identity function. Intuitively, Apply is implemented as a simple recursive

algorithm that processes the DAG representing the OBDD in layers, with each recursive

step processing a subgraph of the previous step, and finally reducing the resulting DAG

so that it satisfies the properties of an OBDD (e.g., deleting unreachable nodes, and

suitably merging nodes).

The Restrict operation is unary, and produces as output an OBDD in which

the values of some of the variables of the input OBDD have been fixed to a certain

value. That is, Restrict(OBDD(f), x←k) = OBDD(f |(x←k)), where f |(x←k) denotes

that x is assigned the value k in f . In this case, the output OBDD does not have


any nodes with the label x. Figure 2.2(b) shows the OBDD obtained as the output

of Restrict(OBDD(f), i ← 1), where OBDD(f) is the OBDD of Figure 2.1(b). In-

tuitively, the Restrict operation is implemented by eliminating the nodes labeled i,

suitably redirecting edges from i’s predecessors to point to i’s successors and removing

unreachable nodes.

Finally, Apply and Restrict can be used to implement existential quantification, which is used in a key way in the operation of NFA-OBDDs, as described in Section 3.2. In particular, $\exists x_i.\, f(x_1, \ldots, x_n) = f(x_1, \ldots, x_n)|_{(x_i \leftarrow 0)} \vee f(x_1, \ldots, x_n)|_{(x_i \leftarrow 1)}$. Therefore, we have: OBDD($\exists x_i.\, f(x_1, \ldots, x_n)$) = Apply(∨, Restrict(OBDD(f), $x_i$ ← 1), Restrict(OBDD(f), $x_i$ ← 0)). Note that OBDD($\exists x_i.\, f(x_1, \ldots, x_n)$) will not have a node labeled $x_i$.
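The identity for existential quantification can be checked on the function of Figure 2.1(a). The sketch below works on plain Python functions rather than on OBDDs, so it illustrates the semantics of Restrict and of the quantifier, not their graph-based implementation; `restrict` and `exists` are hypothetical helper names.

```python
TRUTH = {(0, 0, 0): 1, (0, 0, 1): 0, (0, 1, 0): 1, (0, 1, 1): 1,
         (1, 0, 0): 1, (1, 0, 1): 0, (1, 1, 0): 0, (1, 1, 1): 1}

def f(x, i, y):
    return TRUTH[(x, i, y)]

def restrict(func, var, value):
    """f|(var <- value): fix one keyword argument of a Boolean function."""
    return lambda **kw: func(**{**kw, var: value})

def exists(func, var):
    """Existential quantification as the OR of the two restrictions."""
    f0, f1 = restrict(func, var, 0), restrict(func, var, 1)
    return lambda **kw: f0(**kw) | f1(**kw)

g = restrict(f, "i", 1)      # depends on x and y only, as in Figure 2.2(b)
h = exists(f, "x")           # depends on i and y only

for a in (0, 1):
    for b in (0, 1):
        print(a, b, g(x=a, y=b), h(i=a, y=b))
```

For this f, the restriction to i ← 1 behaves like ¬x ∨ y, and quantifying out x leaves a function of i and y that is 0 only for i = 0, y = 1.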

2.3 Representing Relations and Sets

OBDDs can be used to represent relations of arbitrary arity. If R is an n-ary relation over the domain {0, 1}, then we define its characteristic function $f_R$ as follows: $f_R(x_1, \ldots, x_n) = 1$ if and only if $R(x_1, \ldots, x_n)$. For example, the characteristic function of the 3-ary relation R = {(1, 0, 1), (1, 1, 0)} is $f_R(x_1, x_2, x_3) = (x_1 \wedge \neg x_2 \wedge x_3) \vee (x_1 \wedge x_2 \wedge \neg x_3)$. $f_R$ is a Boolean function and can therefore be expressed using an OBDD.

An n-ary relation Q over an arbitrary domain D can be similarly expressed using OBDDs by bit-blasting each of its elements. That is, if the domain D has m elements, we map each of its elements uniquely to a bit-string containing $\lceil \lg m \rceil$ bits (call this mapping φ). We then define a new relation $R(\phi(x_1), \ldots, \phi(x_n)) = Q(x_1, \ldots, x_n)$. R is an $(n \times \lceil \lg m \rceil)$-ary relation over {0, 1}, and can be converted into an OBDD using its characteristic function.

A set of elements over an arbitrary domain D can also be expressed as an OBDD because sets are unary relations, i.e., if S is a set of elements over a domain D, then we can define a relation $R_S$ such that $R_S(s)$ holds if and only if s ∈ S. Operations on sets can then be expressed as Boolean operations and performed on the OBDDs representing these sets. For example, S ⊆ T can be checked by testing whether OBDD(S) → OBDD(T) (logical implication) is valid, while isEmpty(S ∩ T) holds exactly when OBDD(S) ∧ OBDD(T) is unsatisfiable. The conversion of relations and sets into OBDDs is used in a key way in the construction and operation of NFA-OBDDs, which we describe next.
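The encoding of sets as Boolean functions can be sketched as follows. The code uses plain Python predicates over bit tuples in place of OBDDs and checks the two set operations by enumerating all bit vectors, so it shows the logic only; the helper names (`bit_blast`, `characteristic`, `is_subset`, `is_disjoint`) and the five-element domain are assumptions made for illustration.

```python
from itertools import product
from math import ceil, log2

def bit_blast(domain):
    """Map each element of `domain` to a unique bit tuple of width ceil(lg m)."""
    width = max(1, ceil(log2(len(domain))))
    phi = {elem: tuple((idx >> b) & 1 for b in reversed(range(width)))
           for idx, elem in enumerate(domain)}
    return phi, width

def characteristic(subset, phi):
    """Characteristic function of `subset` over bit-blasted elements."""
    codes = {phi[e] for e in subset}
    return lambda bits: bits in codes

def is_subset(fs, ft, width):
    # S is a subset of T iff the implication fs -> ft holds for every vector.
    return all((not fs(b)) or ft(b) for b in product((0, 1), repeat=width))

def is_disjoint(fs, ft, width):
    # S and T are disjoint iff fs AND ft is unsatisfiable.
    return not any(fs(b) and ft(b) for b in product((0, 1), repeat=width))

phi, width = bit_blast(["A", "B", "C", "D", "E"])
f_s = characteristic({"A", "B"}, phi)
f_t = characteristic({"A", "B", "C"}, phi)
f_u = characteristic({"D"}, phi)
print(width, is_subset(f_s, f_t, width), is_disjoint(f_s, f_u, width))
```

Bit vectors that encode no domain element belong to neither set, so they satisfy the implication vacuously and never witness a non-empty intersection.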


Chapter 3

Improving NFA-based Pattern Matching using OBDDs

3.1 Introduction

Deep packet inspection allows network intrusion detection systems (NIDS) to accu-

rately identify malicious traffic by matching the contents of network packets against

attack signatures. In the past, attack signatures were keywords that could efficiently

be matched using string matching algorithms [7, 16, 44, 96, 55, 9, 10, 86, 90, 30]. How-

ever, the increasing complexity of network attacks has led the research community to

investigate richer signature representations (e.g., [80, 101, 88]), many of which require

the full power of regular expressions. Because NIDS are often deployed over high-speed

network links, algorithms to match such rich signatures must also be efficient enough to

provide high-throughput intrusion detection on large volumes of network traffic. This

problem has spurred much recent research, and in particular has led to the investiga-

tion of new representations of regular expressions that allow for efficient inspection of

network traffic (e.g., [102, 46, 76, 30, 13, 11]).

As we described in Chapter 1, to be useful for deep packet inspection in a NIDS,

any representation of regular expressions must satisfy two key requirements: time-

efficiency and space-efficiency. Finite automata are a natural representation for regular

expressions, but offer a tradeoff between time- and space-efficiency. This time/space

tradeoff has motivated much recent research, primarily with a focus on improving the

space-efficiency of DFAs. These include heuristics to compress DFA transition tables

(e.g., [46, 13]), techniques to combine regular expressions into multiple DFAs [102], and

variable extended finite automata (XFAs) [76], which offer compact DFA representa-

tions and guarantee an additive increase in states when signatures are combined, pro-

vided that the regular expressions satisfy certain conditions. These techniques trade


time for space, and though the resulting representations fit in main memory, their

matching algorithms are slower than those for traditional DFAs.

In this chapter, we take an alternative approach and instead focus on improving

the time-efficiency of NFAs. NFAs are not currently in common use for deep packet

inspection, and understandably so—their performance can be several orders of magni-

tude slower than DFAs. Nevertheless, NFAs offer a number of advantages over DFAs,

and we believe that further research on improving their time-efficiency can make them

a viable alternative to DFAs. Our position is supported in part by these observations:

• NFAs are more compact than DFAs. Determinizing an NFA involves a subset

construction algorithm, which can result in a DFA with exponentially more states

than an equivalent NFA [38].

• NFA combination is space-efficient. As pointed out in Chapter 1, combining

two NFAs only results in an additive increase in number of states. This feature

of NFAs is particularly important, given that the diversity of network attacks has

pushed NIDS vendors to deploy an ever increasing number of signatures.

• NFAs can readily be parallelized. An NFA may contain multiple outgoing

transitions for a single input symbol from each state, all of which must be followed

when that input symbol is encountered. An NFA simulator can easily parallelize

these operations as shown in prior work [71, 25].

Motivated by these advantages, we develop a new approach to improve the time-

efficiency of NFAs. Our core insight is that a technique to efficiently apply an NFA’s

transition relation to a set of states can greatly improve the time-efficiency of NFAs.

Such a technique would apply the transition relation to all states in the frontier in a

single operation to produce a new frontier. We develop an approach that uses ordered

binary decision diagrams [20] (OBDDs) to implement such a technique. Our use of

OBDDs to process NFA frontiers is inspired by symbolic model checking, where the use

of OBDDs allows the verification of systems that contain an astronomical number of

states [21].


To evaluate the feasibility of our approach, we constructed NFAs in software using

HTTP and FTP signatures from Snort. We operated these NFAs using OBDDs and

evaluated their time-efficiency and space-efficiency using traces of real HTTP and FTP

traffic. Our experiments showed that NFAs that use OBDDs (NFA-OBDDs) outperform

traditional NFAs by approximately three orders of magnitude. Our experiments also

showed that NFA-OBDDs retain the space-efficiency of NFAs. In contrast, our machine

ran out of memory when trying to construct DFAs (or their variants) from our signature

sets.

In addition to improving the time-efficiency of NFAs, our approach has a number of

advantages. First, construction of NFA-OBDDs from regular expressions is fully auto-

mated and does not change signature semantics. In contrast, prior work on improved

signature representations has required manual analysis of regular expressions (e.g., to

identify and eliminate ambiguity [77]) or requires the semantics of signatures to be

modified (e.g., [102]). Second, it uniformly handles all regular expressions. Prior tech-

niques, especially those that convert regular expressions into DFAs (or variants), often

require manual intervention when regular expressions have certain kinds of constructs

(e.g., counters; see [76, 102]). Last, NFA-OBDDs may be amenable to a hardware im-

plementation. Both NFAs (e.g., [71, 25, 32]) and OBDDs [74, 104] have individually

been implemented in hardware. It may be possible to combine ideas from prior work

to construct NFA-OBDDs in hardware.

Our main contributions in this chapter are as follows:

• Design of NFA-OBDDs. We develop a novel technique that uses OBDDs to

improve the time-efficiency of NFAs (Section 3.2). We also describe how NFA-

OBDDs can be used to improve the time and space-efficiency of NFA-based multi-

byte matching (Section 3.5).

• Comprehensive evaluation using Snort signatures. We evaluated NFA-

OBDDs using Snort’s HTTP and FTP signature sets and observed a speedup

of about three orders of magnitude over traditional NFAs. We also compared

the performance of NFA-OBDDs against a variety of automata implementations,


including the PCRE package and a variant of DFAs (Section 3.4).

The main benefit of NFA-OBDDs is in improving the performance (i.e., time and

space-efficiency) of deep packet inspection by NIDS, independent of its effectiveness

at detecting attacks. We acknowledge that matching network traffic against regular

expressions is no longer sufficient to detect a large fraction of attacks, and that addi-

tional security mechanisms and advanced forms of signatures (e.g., vulnerability signa-

tures [88, 19]) are necessary. Nevertheless, real deployments use layered defenses, and

NIDS will remain a cornerstone of network security for the foreseeable future. Advanced

signature matching techniques also employ regular expression matching (e.g., see [69])

and we expect that NFA-OBDDs will benefit them as well.

The rest of this chapter is organized as follows: Section 3.2 describes the construction

and operation of NFA-OBDDs; Section 3.3 describes the experimental setup and data

sets used in our evaluation, while Section 3.4 compares the performance of NFA-OBDDs

against other techniques to match regular expressions. Section 3.5 extends NFA-OBDDs

to multi-stride automata and presents experimental evaluation of multi-stride NFA-

OBDDs. We discuss related work in Section 3.6 and conclude in Section 3.7.

3.2 Representing and Operating NFAs and NFA-OBDDs

As in Chapter 1, we represent an NFA using a 5-tuple: (Q, Σ, ∆, q0, Fin), where Q is a finite set of states, Σ is a finite set of input symbols (the alphabet), $\Delta: Q \times (\Sigma \cup \{\varepsilon\}) \rightarrow 2^Q$ is a transition function, q0 ∈ Q is a start state, and Fin ⊆ Q is a set of accepting (or final) states. The transition function ∆(s, i) = T describes the set of all states t ∈ T such that there is a transition labeled i from s to t. Note that ∆ can also be expressed as a relation δ ⊆ Q × Σ × Q, so that (s, i, t) ∈ δ for all t ∈ T such that ∆(s, i) = T.

An NFA may have multiple outgoing transitions with the same input symbol from

each state. Hence, it maintains a frontier F of states that it can be in at each step

during execution. The frontier is initially the singleton set {q0} but may include any

subset of Q during the operation of the NFA. For each symbol in the input string, the


NFA must process all of the states in F and find a new set of states by applying the

transition relation.

While non-determinism leads to frontiers of size O(|Q|) in NFAs, it also makes them space-efficient in two ways. First, NFAs for certain regular expressions are exponentially smaller than the corresponding DFAs. For example, an NFA for $(0|1)^{*}1(0|1)^{n}$ has O(n) states, while the corresponding DFA has $O(2^n)$ states [38]. Second, and perhaps more significantly from the perspective of NIDS, NFAs can be combined space-efficiently while DFAs often cannot. To combine a pair of NFAs, NFA1 and NFA2, it suffices to create a new state qnew, add ε transitions from qnew to the start states of NFA1 and NFA2, and designate qnew to be the start state of the combined NFA. This leads to an NFA with O(|Q1| + |Q2|) states. In contrast, combining two DFAs, DFA1 and DFA2, can sometimes result in a multiplicative increase in the number of states because the combined DFA must have a state corresponding to s × t for each pair of states s and t in DFA1 and DFA2, respectively. The number of states in the DFA can possibly be reduced using minimization, but this does not always help.

3.2.1 NFA Operation using Boolean Function Manipulation

We now describe how the process of applying an NFA’s transition relation to a frontier of

states can be expressed as a sequence of Boolean function manipulations. NFA-OBDDs

implement Boolean functions and operate on them using OBDDs. For the discussion

below and in the rest of this chapter, we assume NFAs in which ε transitions have been

eliminated (using standard techniques [38]). This is mainly for ease of exposition; NFAs

with ε transitions can also be expressed using NFA-OBDDs. Note that ε elimination

may increase the total number of transitions in the NFA, but does not increase the

number of states.

We define four Boolean functions for an NFA (Q, Σ, δ, q0, Fin). These functions use three vectors of Boolean variables: $\vec{x}$, $\vec{y}$, and $\vec{i}$. The vectors $\vec{x}$ and $\vec{y}$ are used to denote states in Q, and therefore contain $\lceil \lg |Q| \rceil$ variables each. The vector $\vec{i}$ denotes symbols in Σ, and contains $\lceil \lg |\Sigma| \rceil$ variables. As an example, for the NFA in Figure 3.1, these vectors contain one Boolean variable each; we denote them as x, y, and i.

Figure 3.1: NFA for (0|1)*1.

• T (~x, ~i, ~y) denotes the NFA’s transition relation δ. Recall that δ is a set of triples

(s, i, t), such that there is a transition labeled i from state s to state t. It can

therefore be represented as a Boolean function as described in Section 2.3. For

example, consider the NFA in Figure 3.1. Using 0 to denote state A and 1 to

denote state B, T (x, i, y) is the function shown in Figure 2.1(a).

• Iσ(~i) is defined for each σ ∈ Σ, and denotes a Boolean representation of that

symbol. For the NFA in Figure 3.1, I0(i) = ¬i (i.e., i = 0) and I1(i) = i.

• F(~x) denotes the current set of frontier states of the NFA. It is thus a Boolean

representation of the set F at any instant during the operation of the NFA. For

our running example, if F = {A}, F(x) = ¬x, while if F = {A, B}, then F(x) =

x ∨ ¬x.

• A(~x) is a Boolean representation of Fin, and denotes the accepting states. In

Figure 3.1, A(x) = x.

Note that T (~x, ~i, ~y), Iσ(~i) and A(~x) can be computed automatically from any

representation of NFAs. The initial frontier F = {q0} can also be represented as a

Boolean formula.

Suppose that the frontier at some instant during the operation of the NFA is F(~x),

and that the next symbol in the input is σ. The following Boolean formula, G(~y),

symbolically denotes the new frontier of states in the NFA after σ has been processed.

G(~y) = ∃~x. ∃~i. [T (~x, ~i, ~y) ∧ Iσ(~i) ∧ F(~x)]

To see why G(~y) is the new frontier, consider the truth table of the Boolean function

T (~x, ~i, ~y). By construction, this function evaluates to 1 only for those values of ~x,

~i, and ~y for which (~x, ~i, ~y) is a transition in the automaton. Similarly, the function

F(~x) evaluates to 1 only for the values of ~x that denote states in the current frontier of

the NFA. Thus, the conjunction of T (~x, ~i, ~y) with F(~x) and Iσ(~i) only “selects” those

rows in the truth table of T (~x, ~i, ~y) that correspond to the outgoing transitions from

states in the frontier labeled with the symbol σ. However, the resulting conjunction is a

Boolean formula in ~x,~i and ~y. To find the new frontier of states, we are only interested

in the values of ~y (i.e., the target states of the transitions) for which the conjunction

has a satisfying assignment. We achieve this by existentially quantifying ~x and ~i to

obtain G(~y). To express the new frontier in terms of the Boolean variables in ~x, we

rename the variables in ~y with their counterparts in ~x.

We illustrate this idea using the example in Figure 3.1. Suppose that the current

frontier of the NFA is F = {A, B}, and that the next input symbol is a 0, which

causes the new frontier to become {A}. In this case, T (x, i, y) is the function shown

in Figure 2.1(a), I0(i) = ¬i and F(x) = x ∨ ¬x. We have T (x, i, y) ∧ I0(i) ∧ F(x) =

(¬x ∧ ¬i ∧ ¬y). Existentially quantifying x and i from the result of this conjunction, we get

G(y) = ¬y. Renaming the variable y to x, we get F(x) = ¬x, which is a Boolean formula

that denotes {A}, the new frontier.
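The worked example can be checked mechanically by enumerating truth tables. The sketch below is only an illustration of the formula for G; it assumes the encoding A = 0, B = 1 and assumes that the ε-free NFA of Figure 3.1 has exactly the transitions A -0-> A, A -1-> A and A -1-> B. The function names are hypothetical.

```python
from itertools import product

# Assumed transitions (s, i, t) of the NFA for (0|1)*1, with A = 0 and B = 1.
TRANSITIONS = {(0, 0, 0), (0, 1, 0), (0, 1, 1)}


def T(x, i, y):
    """Transition relation: true iff (x, i, y) is a transition."""
    return (x, i, y) in TRANSITIONS


def I(sigma):
    """Return the Boolean function I_sigma(i), true iff i encodes sigma."""
    return lambda i: i == sigma


def frontier_fn(states):
    """Return F(x), true iff x encodes a state in the given frontier."""
    return lambda x: x in states


def next_frontier(F, sigma):
    """Evaluate G(y) = exists x, i. [T(x, i, y) and I_sigma(i) and F(x)].

    The satisfying values of y are returned as a set of states, which plays
    the role of renaming y back to x.
    """
    I_sigma = I(sigma)
    return {y
            for x, i, y in product((0, 1), repeat=3)
            if T(x, i, y) and I_sigma(i) and F(x)}


print(next_frontier(frontier_fn({0, 1}), 0))  # {0}, i.e. the frontier {A}
print(next_frontier(frontier_fn({0}), 1))     # {0, 1}, i.e. the frontier {A, B}
```

Enumerating all assignments is exponential in the number of Boolean variables, so this only serves to confirm the semantics on a one-bit example; the point of using OBDDs is to avoid such enumeration.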

To determine whether the NFA accepts an input string, it suffices to check that

F ∩ Fin ≠ ∅. Using the Boolean notation, this translates to checking whether F(~x)

∧ A(~x) has a satisfying assignment. In the example above with F = {A}, F(x) = ¬x

and A(x) = x, so F(x) ∧ A(x) is unsatisfiable and the NFA is not in an accepting configuration. Recall that checking

satisfiability of a Boolean function is an O(1) operation if the function is represented

as an OBDD.

3.2.2 NFA-OBDDs

The main idea behind NFA-OBDDs is to represent and manipulate the Boolean func-

tions discussed above using OBDDs. Formally, an NFA-OBDD for an NFA (Q, Σ, δ, q0,

Fin) is a 7-tuple (~x, ~i, ~y, OBDD(T ), {OBDD(Iσ) | σ ∈ Σ}, OBDD(Fq0), OBDD(A)),

where ~x, ~i, ~y are vectors of Boolean variables, and T , Iσ, and A are the Boolean for-

mulae discussed in Section 3.2.1. Fq0 denotes the Boolean function that denotes the

frontier {q0}. For each input symbol σ, the NFA-OBDD obtains a new frontier as

discussed earlier. The main difference is that the Boolean operations are performed as

operations on OBDDs.

The use of OBDDs allows NFA-OBDDs to be more time-efficient than NFAs. In

an NFA, the transition table must be consulted for each state in the frontier, lead-

ing to O(|δ| × |F |) operations per input symbol. In contrast, the complexity of

OBDD operations to obtain a new frontier is approximately O(sizeof(OBDD(T )) ×

sizeof(OBDD(F))). This is because the complexity of obtaining a new frontier is

dominated by the cost of an Apply operation on OBDD(T ) and OBDD(F), which

costs O(sizeof(OBDD(T )) × sizeof(OBDD(F))) [20]. Since OBDDs are a compact

representation of the frontier F and the transition relation δ, NFA-OBDDs are more

time-efficient than NFAs. The improved performance of NFA-OBDDs is particularly

pronounced when the transition table of the NFA is sparse or the NFA has large fron-

tiers, because OBDDs can effectively remove redundancy in the representations of δ

and F .

The reason that NFA-OBDDs retain the space-efficiency of NFAs is that NFA-

OBDDs can be combined using the same algorithms that are used to combine NFAs.

Although the use of OBDDs may lead NFA-OBDDs to consume more memory than

NFAs, our experiments show that the increase is marginal. In particular, the cost is

dominated by OBDD(T ), which has a total of 2 × ⌈lg |Q|⌉ + ⌈lg |Σ|⌉ Boolean variables.

Even in the worst case, this OBDD consumes only O(|Q|^2 × |Σ|) space, which is

comparable to the worst-case memory consumption of the transition table of a traditional

NFA. However, in practice, the memory consumption of NFA-OBDDs is much smaller

than this asymptotic limit.

3.3 Experimental Apparatus and Data Sets

We evaluated the feasibility of our approach using a software-based implementation of

NFA-OBDDs. As depicted in Figure 3.2, the experimental apparatus consists of two

offline components and an online component.

Figure 3.2: Components of our software-based implementation of NFA-OBDDs.

The offline components are executed once for each set of regular expressions, and

consist of re2nfa and nfa2obdd. The re2nfa component accepts a set of regular expres-

sions as input, and produces an ε-free NFA as output. To do so, it first constructs NFAs

for each of the regular expressions using Thompson’s construction [84, 38], combines

these NFAs into a single NFA, and eliminates ε transitions. The nfa2obdd component

analyzes this NFA to determine the number of Boolean variables needed (i.e., the sizes

of the ~x, ~i and ~y vectors), and constructs OBDD(T ), OBDD(A), OBDD(Iσ) for each

σ ∈ Σ, and OBDD(Fq0).

As discussed in Chapter 2, the size of an OBDD for a Boolean formula is sensitive

to the total order imposed on its variables. Variable ordering also impacts the structure

of OBDDs, and therefore the performance of NFA-OBDDs. We empirically determined

that an ordering of variables of the form ~i < ~x < ~y yields high-performance NFA-

OBDDs. Our implementation of nfa2obdd therefore uses this ordering for ~i, ~x and ~y.

Within each vector, nfa2obdd orders variables in increasing order from most significant

bit to least significant bit. Section 3.4.6 presents a detailed evaluation of the impact of

variable ordering on the performance of NFA-OBDDs.
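To make the ordering concrete, the following sketch lists the variables in the order ~i < ~x < ~y, with each vector running from most significant bit to least significant bit. The variable names (`i1`, `x2`, and so on) and the small vector widths are hypothetical and chosen only for readability.

```python
def vector(name, width):
    """Variables of one vector, most significant bit first (index width - 1)."""
    return [f"{name}{bit}" for bit in range(width - 1, -1, -1)]


def concatenated_order(*vectors):
    """Place whole vectors one after another, e.g. i < x < y."""
    return [var for vec in vectors for var in vec]


i_vec, x_vec, y_vec = vector("i", 2), vector("x", 3), vector("y", 3)
order = concatenated_order(i_vec, x_vec, y_vec)
print(order)
# ['i1', 'i0', 'x2', 'x1', 'x0', 'y2', 'y1', 'y0']

# Position of each variable in the total order (smaller means closer to the root).
rank = {var: position for position, var in enumerate(order)}
print(rank["i1"] < rank["x2"] < rank["y2"])  # True
```

An OBDD package would be given such a list (or the equivalent ranks) when its variables are created; other orders can be produced by passing the vectors in a different sequence.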

The online component, exec_nfaobdd, begins execution by reading these OBDDs into

memory and then processes a stream of network packets. It matches the contents of these

network packets against the regular expressions using the NFA-OBDD. To manipulate

OBDDs and produce a new frontier for each input symbol processed, this component

interfaces with Cudd, a popular C++-based OBDD library [79]. It checks whether each

frontier F produced during the operation of the NFA-OBDD contains an accepting

state. If so, it emits a warning with the offset of the character in the input stream that

triggered a match, as well as the regular expression(s) that matched the input.¹ Note

that in a NIDS setting, it is important to check whether the frontier F obtained after

processing each input symbol contains an accepting state (rather than after processing

the entire input string, which is the traditional operating model for finite automata).

This is because any byte in the network input may cause a transition in the NFA that

triggers a match with a regular expression. We call this the streaming model because

the NFA continuously processes input symbols from a network stream. This model is

equivalent to using regular expressions to find all matching substrings within a string,

as the characters in the string are presented to the matching algorithm one at a time.
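The streaming model can be illustrated with a plain set-based NFA simulation that checks for accepting states after every input symbol. This is a minimal sketch, not the exec_nfaobdd implementation: the function name `stream_matches` and the dictionary encoding of the transition relation are assumptions, and the example NFA assumes the ε-free automaton for (0|1)*1 with start state A and accepting state B.

```python
def stream_matches(delta, start, accepting, data):
    """Yield (offset, accepting states reached) after each input symbol.

    `delta` maps (state, symbol) to a set of target states. The frontier is
    checked for accepting states after every symbol, not only at the end.
    """
    frontier = {start}
    for offset, symbol in enumerate(data):
        frontier = {t for s in frontier for t in delta.get((s, symbol), ())}
        hits = frontier & accepting
        if hits:
            yield offset, hits


# Assumed epsilon-free NFA for (0|1)*1.
DELTA = {("A", "0"): {"A"}, ("A", "1"): {"A", "B"}}

for offset, hits in stream_matches(DELTA, "A", {"B"}, "0110"):
    print(offset, sorted(hits))
# 1 ['B']
# 2 ['B']
```

Each reported offset is the position of the symbol that triggered a match, and the set of accepting states reached identifies which pattern matched when several patterns are combined into one NFA.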

3.3.1 Signature Sets and Network Traffic

Signature Sets

We evaluated our implementation of NFA-OBDDs with three sets of regular expressions

(we have made these signatures available for download [98]). The first set was obtained

from the authors of the XFA paper [76], and contains 1503 regular expressions that

were synthesized from the March 2007 snapshot of the Snort HTTP signature set. The

second and third sets, numbering 2612 and 98 regular expressions, were obtained from

the October 2009 snapshot of the Snort HTTP and FTP signature sets, respectively.

About 50% of these regular expressions were taken from the uricontent fields of the

signatures, while the rest were extracted from the pcre fields. Although extracting just

pcre fields from individual Snort rules only captures a portion of the corresponding

rules, it suffices for our experiments as our primary goal is to evaluate the performance

of NFA-OBDDs against other regular-expression based techniques. All three sets of reg-

ular expressions include client-side and server-side signatures. For all sets, we excluded

Snort signatures that contained non-regular constructs, such as back-references and subroutines

(which are allowed by the Perl-compatible regular expression (PCRE) package),

as these constructs cannot be implemented in NFA-based

models. In all, we excluded 1837 HTTP and 41 FTP signatures due to non-regular constructs.

¹ Multiple regular expressions may trigger a match on an input symbol; these regular expressions can be identified using the set of states that appear in the conjunction F(~x) ∧ A(~x).

        | Number of HTTP commands                  | Matches triggered: Total number (# Distinct sigs.)
Trace   | GET       | POST    | HEAD    | PUT     | HTTP/1503        | HTTP/2612
Rutgers | 653,670   | 137,737 | 3,504   | 1,576   | 1,816,410 (47)   | 17,107,588 (120)
DARPA   | 1,333,469 | 36,386  | 450,480 | 126,824 | 37,952,078 (121) | 190,662,579 (205)

Figure 3.3: Statistics characterizing various aspects of the HTTP traces used in our experiments. The “Matches triggered” columns show the total number of signature matches triggered by the traces as well as the number of distinct signatures that matched.

HTTP traffic

We evaluated the performance of HTTP signatures by feeding two sets of HTTP traffic

traces to exec_nfaobdd:

(1) Rutgers traces. We recorded HTTP traffic at the Web server of the Rutgers Computer

Science Department for a one-week period in August 2009. This traffic was

collected using tcpdump, and includes whole packets of port 80 traffic from the Web

server. The traffic observed during this period consisted largely of Web traffic typi-

cally observed at an academic department’s main Web server; most of the traffic was

to view and query Web pages hosted by the department. Overall, this week-long trace

contained connections from 18,618 distinct source IP addresses. It contained a total

of 1.24GB worth of data, with the payloads in the network packets ranging in size

from 1 byte to 1,460 bytes, with an average of 126 bytes (standard deviation of 271).

Figure 3.3 presents statistics that characterize various other aspects of the trace. The

total number of matches triggered shown in Figure 3.3 is not indicative of the number

of alerts produced by Snort because our signature sets only contain patterns from the

pcre and uricontent fields of the Snort rules. The number of matches is large because

signatures contain patterns that are common in HTTP packets.

(2) DARPA traces. We used publicly available traces from the 1999 DARPA intrusion

detection evaluation data sets [48]. Privacy concerns preclude us from releasing the

network traces collected in our department. We therefore report experimental results

with the DARPA traces to ensure that our experiments can be repeated independently

by other researchers.

Command             | CWD    | LIST  | MDTM  | MKD   | PASS   | PORT   | PWD
Number of Instances | 62,561 | 3,098 | 613   | 89    | 14,701 | 232    | 453

Command             | QUIT   | RETR  | SIZE  | STOR  | TYPE   | USER
Number of Instances | 12,244 | 7,676 | 1,110 | 1,401 | 12,201 | 14,834

Figure 3.4: Statistics showing the number of commands observed in the FTP traces used in our experiments.

We acknowledge that the DARPA traces are no longer in popular use for intrusion

detection research. Indeed, researchers have even argued that they are inadequate for

the purpose for which they were originally developed (to test the effectiveness of intrusion

detection systems at detecting attacks; e.g., see [81, 51]). Nevertheless, they suffice as an

independent data point for our experiments as our goal is to measure the performance

of regular expression matching, and not to test their effectiveness at detecting real

attacks.

We used traces from weeks two, four and five of the DARPA data set (only the

traffic from these weeks contains actual instances of attacks). These traces contain

about 11.7GB worth of data, and contain connections from 8,331 distinct source IP

addresses. The payloads in the network packets ranged in size from 2 bytes to 1,460

bytes, with an average size of 351 bytes (standard deviation of 576).

FTP traffic

We evaluated the FTP signatures using two traces of live FTP traffic (from the command

channel), obtained over a two-week period in March 2010 from our department’s

FTP server; these FTP traces contained 19.4MB and 24.7MB worth of data. The traffic

consisted of FTP requests to fetch and update technical reports hosted by our depart-

ment. We observed traffic from 528 distinct source IP addresses during this period.

Statistics on various FTP commands observed during this period appear in Figure 3.4

(commands that were not observed are not reported). This traffic triggered 9,656 and

15,976 matches in the FTP/98 signature set, corresponding to matches on 6 and 5 dis-

tinct signatures, respectively. The payload sizes of packets ranged from 2 to 402 bytes

with an average of 40 bytes (standard deviation of 44).

Since our primary goal is to study the performance of NFA-OBDDs, we assume that

the HTTP and FTP traces have been processed using standard NIDS operations, such

as defragmentation and normalization. We fed these traces, which were in tcpdump

format, to exec_nfaobdd.

3.3.2 Experimental Setup

All our experiments were performed on an Intel Core2 Duo E7500 machine running Linux 2.6.27,

clocked at 2.93GHz with 2GB of memory (however, our programs are single-threaded,

and only used one of the available cores). We used the Linux /proc file system to

measure the memory consumption of nfa2obdd and the Cudd_ReadMemoryInUse utility

to obtain the memory consumption of exec_nfaobdd. We instrumented both these

programs to report their execution time using processor performance counters. We

report the performance of exec_nfaobdd as the number of CPU cycles to process each

byte of network traffic (cycles/byte), i.e., fewer processing cycles/byte imply greater

time-efficiency. All our implementations were in C++; we used the GNU g++ compiler

suite (v4.3.2) with the -O6 optimization level to produce the executables used for

experimentation.
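For readers who prefer throughput figures, cycles/byte can be converted to bytes per second by dividing the clock rate by the cycle count. The sketch below assumes that all cycles of one 2.93GHz core are spent on matching and ignores I/O and other overheads; the 1,000 cycles/byte input is a hypothetical value chosen for round numbers, not a measurement.

```python
CLOCK_HZ = 2.93e9  # clock rate of the test machine


def bytes_per_second(cycles_per_byte, clock_hz=CLOCK_HZ):
    """Upper bound on throughput if every cycle is spent on matching."""
    return clock_hz / cycles_per_byte


def megabits_per_second(cycles_per_byte, clock_hz=CLOCK_HZ):
    """The same bound expressed in Mbit/s (10**6 bits per second)."""
    return bytes_per_second(cycles_per_byte, clock_hz) * 8 / 1e6


print(bytes_per_second(1000))      # 2930000.0
print(megabits_per_second(1000))   # 23.44
```

Because the relationship is a simple reciprocal, halving the cycles/byte figure doubles the achievable throughput on the same machine.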

3.4 Experimental Evaluation

This section reports the performance of NFA-OBDDs, and compares them against the

performance of NFAs, the PCRE package, which is a popular library for regular ex-

pression matching, and variants of DFAs. Our experiments show that NFA-OBDDs:

1. outperform traditional NFAs by up to three orders of magnitude while retaining

their space-efficiency (Section 3.4.2);

2. outperform or are competitive in performance with the PCRE package (Sec-

tion 3.4.3);

3. are competitive in performance with variants of DFAs while being drastically less

memory-intensive (Section 3.4.4).

Signature Set       | #Reg. Exps. | Input NFA #States | Input NFA #Transitions | OBDD(T) size (#Nodes) | Construction Time/Memory
HTTP (March 2007)   | 1503        | 159,734           | 3,986,769              | 659,981               | 305sec/176MB
HTTP (October 2009) | 2612        | 239,890           | 5,833,911              | 989,236               | 453sec/176MB
FTP (October 2009)  | 98          | 26,536            | 5,927,465              | 69,619                | 246sec/134MB

Figure 3.5: NFA-OBDD construction results.

We also present a detailed performance breakdown of NFA-OBDDs in terms of

OBDD operations (Section 3.4.5) and the impact of OBDD variable ordering on NFA-

OBDD performance (Section 3.4.6).

3.4.1 NFA-OBDDs: Construction and Performance

We used nfa2obdd to construct NFA-OBDDs from ε-free NFAs of the regular expression

sets. Figure 3.5 presents statistics on the sizes of the input NFAs, the size of the largest

of the four OBDDs in the NFA-OBDD (OBDD(T )), and the time taken and memory

consumed by nfa2obdd. For the NFA-OBDDs corresponding to the HTTP signature

sets, the vectors ~x and ~y had 18 Boolean variables each, while the vector ~i had 8

Boolean variables to denote the 256 possible ASCII characters. For the NFA-OBDD

corresponding to the FTP signature set, the vectors ~x and ~y had 15 Boolean variables

each. We also tried to determinize these NFAs to produce DFAs, but the determinizer

ran out of memory in all three cases.
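The variable counts quoted above follow directly from the state counts in Figure 3.5 and the formula ⌈lg |Q|⌉ from Section 3.2.1. The short check below is an illustrative sketch; the helper name `bits_needed` is hypothetical.

```python
def bits_needed(n):
    """ceil(lg n): Boolean variables needed to encode n distinct values (n >= 2)."""
    return (n - 1).bit_length()


STATE_COUNTS = {
    "HTTP (March 2007)": 159_734,
    "HTTP (October 2009)": 239_890,
    "FTP (October 2009)": 26_536,
}
SYMBOL_BITS = bits_needed(256)  # 8 variables for the vector i

for name, states in STATE_COUNTS.items():
    state_bits = bits_needed(states)
    total = 2 * state_bits + SYMBOL_BITS  # variables of OBDD(T): x, y and i
    print(name, state_bits, total)
# HTTP (March 2007) 18 44
# HTTP (October 2009) 18 44
# FTP (October 2009) 15 38
```

The totals in the last column are the number of Boolean variables over which OBDD(T) is defined for each signature set.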

Figure 3.6 depicts the performance of NFA-OBDDs. Figure 3.6(a) and Figure 3.6(b)

show the performance for the Rutgers and DARPA HTTP traces, while Figure 3.6(c)

shows the performance for both FTP traces. Figure 3.7 presents the raw throughput and

memory consumption of NFA-OBDDs observed for each signature set. The throughput

and memory consumption of NFA-OBDDs vary slightly across different traces for each

signature set. This difference was most pronounced for the HTTP/2612 signature set,

where the Rutgers trace was processed almost 1.8× faster than the DARPA trace. The

variance in performance can be attributed to the sizes and shapes of OBDD(F) (the

OBDD of the NFA’s frontier) observed during execution.

[Figure 3.6 plots omitted. Panels: (a) HTTP/1503 signature set, (b) HTTP/2612 signature set, (c) FTP/98 signature set. Each panel plots memory usage (MB) against processing time (cycles/byte) for NFA, NFA-OBDD, MDFA and PCRE; in panels (b) and (c) the MDFA series are labeled MDFA-2604-sigs and MDFA-95-sigs, respectively.]

Figure 3.6: Comparing memory versus processing time of NFA-OBDDs, traditional NFAs, the PCRE package, and different MDFAs for the Snort HTTP and FTP signature sets. The x-axis is in log-scale. Note that Figure 3.6(b) and (c) only report the performance of MDFAs with 2604 and 95 regular expressions, respectively.

3.4.2 Comparison with NFAs

We compared the performance of NFA-OBDDs with an implementation of NFAs that

uses Thompson’s algorithm. This algorithm maintains a frontier F , and operates as

follows: For each state s in the frontier F , fetch the set of targets Ts of the transitions

labeled σ (the input symbol), and compute the new frontier as F′ = ⋃_{s ∈ F} T_s. The

performance and memory consumption of our NFA implementation (as well as those of the PCRE

package and DFA variants in Section 3.4.3 and Section 3.4.4) were relatively stable across

all the traces for each signature set. Figure 3.6 therefore reports only the averages across

these traces.
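The frontier update described above can be sketched as follows. The nested-dictionary transition table and the small example automaton are hypothetical; the lookup counter is included only to show that the work per input symbol grows with the size of the frontier.

```python
def thompson_step(table, frontier, symbol):
    """One step of frontier-based NFA simulation.

    Returns the new frontier (the union of the target sets T_s over all
    states s in the frontier) and the number of table lookups performed.
    """
    new_frontier = set()
    lookups = 0
    for state in frontier:
        lookups += 1
        new_frontier |= table.get(state, {}).get(symbol, set())
    return new_frontier, lookups


# Hypothetical transition table: state -> symbol -> set of target states.
TABLE = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"a": {2}, "b": {2}},
    2: {"a": {3}},
}

print(thompson_step(TABLE, {0}, "a"))        # ({0, 1}, 1)
print(thompson_step(TABLE, {0, 1, 2}, "a"))  # ({0, 1, 2, 3}, 3)
```

With a frontier of three states the step performs three lookups, which is why large frontiers make this style of simulation slow.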

Signature Set | Processing time                   | Memory
NFA-OBDDs
HTTP/1503     | 6,844–7,582 cycles/byte           | 58MB
HTTP/2612     | 22,968–41,588 cycles/byte         | 61MB
FTP/98        | 5,095 cycles/byte                 | 8MB
NFAs
HTTP/1503     | 1.3 × 10^7 cycles/byte            | 53MB
HTTP/2612     | 2.1 × 10^7 cycles/byte            | 73MB
FTP/98        | 5.6 × 10^5 cycles/byte            | 29MB
PCRE
HTTP/1503     | 2.1 × 10^5–6.2 × 10^5 cycles/byte | 3.6MB
HTTP/2612     | 1.3 × 10^7–2.8 × 10^7 cycles/byte | 3.9MB
FTP/98        | 2,210–6,185 cycles/byte           | 5.9–6.2MB
MDFA (partial signature sets in Figure 3.6(b) and (c))
HTTP/1503     | 1,000–15,951 cycles/byte          | 71–232MB
HTTP/2604     | 15,891–49,296 cycles/byte         | 335–426MB
FTP/95        | 1,160–1,386 cycles/byte           | 54–82MB

Figure 3.7: Raw performance numbers for the charts shown in Figure 3.6.

As Figure 3.6 shows, NFA-OBDDs outperform NFAs for all three sets of signatures

by approximately three orders of magnitude for the HTTP signatures, and two orders

of magnitude for the FTP signatures. In Figure 3.6(a), for example, NFA-OBDDs

are approximately 1600×–1800× faster than NFAs while consuming almost the same

amount of memory. The difference in the performance gap between NFA-OBDDs and

NFAs for the HTTP and FTP signatures can be attributed to the number and structure

of these signatures. As discussed in Section 3.2.2, the benefits of NFA-OBDDs are more

pronounced if larger frontiers are to be processed. Since there are a larger number of

HTTP signatures, the frontiers of the corresponding NFAs are larger. As a result,

the speedup of NFA-OBDDs over the corresponding NFAs is much greater for HTTP signatures than

for FTP signatures. Nevertheless, these results clearly demonstrate that OBDDs can

improve the time-efficiency of NFAs without compromising their space-efficiency.

3.4.3 Comparison with the PCRE Package

We compared the performance of NFA-OBDDs with that of the PCRE package, which

is a popular library for regular expression matching implemented by recursive back-

tracking. Figure 3.6 reports three numbers for the performance of the PCRE package,

corresponding to different values of configuration parameters of the package (these

parameters determine whether PCRE must process input in the ASCII or Unicode

formats, and whether the matching algorithm must terminate after finding the first

matching substring or all matching substrings). In both Figure 3.6(a) and (b), NFA-

OBDDs outperform the PCRE package. The throughput of NFA-OBDDs is about

an order of magnitude better than the fastest configuration of the PCRE package for

the set HTTP/1503. The difference in performance is more pronounced for the set

HTTP/2612, where NFA-OBDDs outperform the most time-efficient PCRE configura-

tion by approximately 300×–500×. The poorer throughput of the PCRE package for

the second set of signatures is likely because the backtracking algorithm that it employs

degrades in performance as the number of paths to be explored in the NFA increases.

However, in both cases, the PCRE package is more space-efficient than NFA-OBDDs,

and consumes about 4MB memory.

For the FTP signatures (Figure 3.6(c)), NFA-OBDDs are about 2.5× slower than the

fastest PCRE configuration. However, unlike NFA-OBDDs which report all substrings

of an input packet that match signatures, this PCRE configuration only reports the

first matching substring. The performance of the PCRE configurations that report all

matching substrings is comparable to that of NFA-OBDDs.

Note that in all cases, the PCRE package outperforms our NFA implementation,

which uses Thompson’s algorithm [84] to parse input strings. Despite this gap in performance,

Cox [28] shows that Thompson’s algorithm performs more consistently than the

backtracking approach employed by PCRE. For example, the backtracking approach

is vulnerable to algorithmic complexity attacks, where a maliciously-crafted input can

trigger the worst-case performance of the algorithm [75].

3.4.4 Comparison with DFA Variants

Multiple DFAs

We compared the performance of NFA-OBDDs with a variant of DFAs, called multiple

DFAs (MDFAs), produced by set-splitting [102]. We were unable to compare the per-

formance of NFA-OBDDs against DFAs because DFA construction ran out of memory.

However, prior work [77] estimates that DFAs may offer throughputs of about 50 cy-

cles/byte. An MDFA is a collection of DFAs representing a set of regular expressions.

Each DFA represents a disjoint subset of the regular expressions. To match an input

string against an MDFA, each constituent DFA is executed against the input string to

determine whether there is a match. MDFAs are more compact than DFAs because

they result in a less than multiplicative increase in the number of states. However, MDFAs

are also slower than DFAs because all the constituent DFAs must be

matched against the input string. An MDFA that has a larger number of constituent

DFAs will be more compact, but will also have lower time-efficiency than an MDFA

with fewer DFAs.
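Matching against an MDFA amounts to running every constituent DFA over the same input, as the following sketch shows. The two toy DFAs over the alphabet {a, b}, their names, and the dictionary representation are hypothetical and stand in for DFAs built from disjoint groups of signatures.

```python
def run_dfa(dfa, data):
    """Return the offsets at which one DFA is in an accepting state."""
    state = dfa["start"]
    offsets = []
    for offset, symbol in enumerate(data):
        state = dfa["delta"][(state, symbol)]
        if state in dfa["accept"]:
            offsets.append(offset)
    return offsets


def run_mdfa(dfas, data):
    """Run every constituent DFA on the input; cost grows with len(dfas)."""
    return {name: run_dfa(dfa, data) for name, dfa in dfas.items()}


# Toy DFA: accepts whenever the input seen so far ends in "ab".
ENDS_AB = {"start": 0, "accept": {2}, "delta": {
    (0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 2, (2, "a"): 1, (2, "b"): 0}}
# Toy DFA: accepts whenever the input seen so far ends in "bb".
ENDS_BB = {"start": 0, "accept": {2}, "delta": {
    (0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 2, (2, "a"): 0, (2, "b"): 2}}

print(run_mdfa({"ends_ab": ENDS_AB, "ends_bb": ENDS_BB}, "abba"))
# {'ends_ab': [1], 'ends_bb': [2]}
```

Every input symbol triggers one table lookup per constituent DFA, which is the source of the time penalty relative to a single combined DFA.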

Using Yu et al.’s algorithms [102], we produced several MDFAs by combining the

Snort signatures in several ways, each with different space/time utilization. Each point

in Figure 3.6 denotes the performance of one MDFA (again, averaged over all the input

traces), which in turn consists of a collection of DFAs, as described above.

Producing MDFAs for the HTTP/2612 and FTP/98 signature sets was more chal-

lenging, primarily because these sets contained several structurally-complex regular

expressions that were difficult to determinize efficiently. For example, they contained

several signatures with large counters (i.e., sequences of repeating patterns) often used

in combination with the choice (i.e., re1|re2) operator. Our determinizer frequently ran

out of memory when attempting to construct MDFAs for such regular expressions. As

an example, consider the following regular expression in HTTP/2612:

/.*\x2FCSuserCGI\x2Eexe\x3FLogout\x2B[^\s]{96}/i

Our determinizer consumed 1.6GB of memory for this regular expression alone, before

aborting. Producing a DFA for such regular expressions may require more sophisticated

techniques, such as on-the-fly determinization [80] that are not currently implemented

in our prototype. We therefore decided to exclude problematic regular expressions,

and constructed MDFAs with the remaining ones (2604 for HTTP/2612 and 95 for

FTP/98). Note that the MDFAs for these smaller sets of regular expressions may be

more time-efficient and much more space-efficient than corresponding MDFAs for the

entire set of regular expressions.
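The memory blowup caused by counters can be reproduced on a much smaller, textbook-style pattern. The sketch below applies subset construction to the NFA for (a|b)*a(a|b){n} over a two-letter alphabet and counts the reachable DFA states. This pattern is a simplified stand-in chosen for illustration, not the Snort signature above, and the code is not the determinizer used in our experiments.

```python
def count_dfa_states(n):
    """Count DFA states reachable by subset construction for (a|b)*a(a|b){n}.

    The NFA has n + 2 states: state 0 loops on both symbols and moves to
    state 1 on 'a'; states 1..n advance on either symbol; state n + 1 is
    accepting and has no outgoing transitions.
    """
    def step(subset, symbol):
        targets = set()
        for s in subset:
            if s == 0:
                targets.add(0)
                if symbol == "a":
                    targets.add(1)
            elif s <= n:
                targets.add(s + 1)
        return frozenset(targets)

    start = frozenset({0})
    seen = {start}
    worklist = [start]
    while worklist:
        subset = worklist.pop()
        for symbol in "ab":
            nxt = step(subset, symbol)
            if nxt not in seen:
                seen.add(nxt)
                worklist.append(nxt)
    return len(seen)


for n in (1, 2, 4, 8):
    print(n, n + 2, count_dfa_states(n))
# 1 3 4
# 2 4 8
# 4 6 32
# 8 10 512
```

The NFA grows linearly with the counter value while the DFA doubles with each increment, because the DFA must remember which of the last n + 1 symbols were an 'a'.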

Figure 3.6 shows that in many cases NFA-OBDDs can provide throughputs com-

parable to those offered by MDFAs while utilizing much less memory. For example,

the fastest MDFA in Figure 3.6(b) (constructed for a subset of 2604 signatures) offered

about 50% more throughput than NFA-OBDDs, but consumed 7× more memory. The

remaining MDFAs for this signature set had throughputs comparable to those of NFA-

OBDDs, but consumed 270MB more memory than NFA-OBDDs. The performance

gap between NFA-OBDDs and MDFAs was largest for the FTP signature set, where the

MDFAs (for a subset of 95 signatures) were about 4× faster than the NFA-OBDD;

however, the MDFAs consumed 46MB-74MB more memory.

These results are significant for two reasons. First, conventional wisdom has long

held that traditional NFAs operate much slower than their deterministic counterparts.

This is also supported by our experiments, which show that NFAs are

three to four orders of magnitude slower than MDFAs. However, our results

show that OBDDs can drastically improve the performance of NFAs and even make

them competitive with MDFAs, which are a deterministic variant of finite automata. We

believe that further enhancements to improve the time-efficiency of NFA-OBDDs can

make them operate even faster than MDFAs (e.g., by relaxing the OBDD data struc-

ture, and thereby eliminating several graph operations in the Apply and Restrict

operations).

Second, NFA-OBDDs were produced automatically from regular expressions. In

contrast, processing the set of regular expressions to produce compact yet performant

MDFAs is a non-trivial exercise, often requiring time-consuming partitioning heuristics

to be applied [102]. Some of the partitioning heuristics described by Yu et al. also re-

quire modifications to the set of regular expressions, thereby changing their semantics.

Our own experience in attempting to construct MDFAs for HTTP/2612 and FTP/98

shows that this process is often challenging, especially if the regular expressions con-

tain complex structural patterns. In contrast, NFA-OBDDs can be constructed in a

straightforward manner from regular expressions, including those with counters and

other complex structural patterns, and are yet competitive in performance and more

compact than MDFAs.

Operation        | Fraction
AndAbstract      | 50%
And              | 39%
Map              | 4%
Acceptance check | 7%

Figure 3.8: Fraction of time spent performing OBDD operations.

Hybrid Finite Automata

Finally, we also attempted to compare the performance of NFA-OBDDs with a variant

of DFAs, called hybrid finite automata (HFA) [12]. HFAs are a hybrid of NFAs and

DFAs, and are constructed by interrupting the determinization algorithm when it en-

counters structurally-complex patterns (e.g., large counters and .* patterns) that are

known to cause memory blowups when determinized. We used Becchi and Crowley’s

implementation [12] in our experiments, but found that it ran out of memory when

trying to construct HFAs from our signature sets. For example, the HFA construc-

tion process exhausted the available memory on our machine after processing just 106

regular expressions in the HTTP/1503 set. It may be possible to construct a collec-

tion of HFAs in a manner akin to MDFAs, but we did not consider this design in our

experiments.

3.4.5 Deconstructing NFA-OBDD Performance

We further analyzed the performance of NFA-OBDDs to understand the time consump-

tion of each OBDD operation. The results of this analysis can motivate techniques to

optimize OBDD packages to further improve the time-efficiency of NFA-OBDDs. The

results reported in this section are based upon the HTTP/1503 signature set; the results

with the other signature sets were similar.

Figure 3.8 shows the fraction of time that exec_nfaobdd spends performing various

OBDD operations as it processes a single input symbol. These include the operations

needed to compute a new frontier and those needed to check if the frontier contains an

accepting configuration.

As discussed earlier, exec_nfaobdd uses the Cudd package to manipulate OBDDs.

Although Cudd implements the OBDD operations described in Chapter 2, it also imple-

ments composite operations that combine multiple Boolean operations; the composite

operations are often more efficient than performing the individual operations separately.

AndAbstract is one such operation, which allows two OBDDs to be combined using

an And operation followed by an existential quantification. AndAbstract takes a list

of Boolean variables to be quantified, and performs the OBDD transformations needed

to eliminate all these variables. The Map operation allows variables in an OBDD to be

renamed, e.g., it can be used to rename the ~y variables in G(~y) to ~x variables instead.

We implemented the Boolean operations required to obtain a new frontier (described

in Section 3.2.1) using one set of And, AndAbstract and Map operations. Each

AndAbstract step existentially quantifies 26 Boolean variables (the ~x and ~i variables).

To check whether a frontier should be accepted, we used another And operation to

combine OBDD(F) and OBDD(A); the cost of an acceptance check appears in the last

row of Figure 3.8.
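The sequence of operations can be mimicked on explicit truth tables to show what each step contributes. The sketch below is not Cudd code: a Boolean function is represented as a pair (support, set of satisfying rows), the helper names `make`, `conj`, `and_abstract` and `rename` are hypothetical, and the arrangement shown (And of the frontier with the symbol, then AndAbstract with the transition relation, then Map) is one plausible ordering of the three operations. The example again assumes the one-bit encoding of the NFA in Figure 3.1.

```python
from itertools import product


def make(support, predicate):
    """Build a truth-table function over `support` from a Python predicate."""
    rows = {bits for bits in product((0, 1), repeat=len(support))
            if predicate(dict(zip(support, bits)))}
    return tuple(support), rows


def conj(f, g):
    """And: conjunction over the union of the two supports."""
    (fs, frows), (gs, grows) = f, g
    support = fs + tuple(v for v in gs if v not in fs)

    def holds(env):
        return (tuple(env[v] for v in fs) in frows
                and tuple(env[v] for v in gs) in grows)

    return make(support, holds)


def and_abstract(f, g, quantified):
    """AndAbstract: conjunction, then existentially quantify `quantified`."""
    support, rows = conj(f, g)
    kept = tuple(v for v in support if v not in quantified)
    index = [support.index(v) for v in kept]
    return kept, {tuple(row[k] for k in index) for row in rows}


def rename(f, mapping):
    """Map: rename variables according to `mapping`."""
    support, rows = f
    return tuple(mapping.get(v, v) for v in support), rows


# Assumed transitions of the NFA for (0|1)*1 with A = 0 and B = 1.
T = make(("x", "i", "y"),
         lambda e: (e["x"], e["i"], e["y"]) in {(0, 0, 0), (0, 1, 0), (0, 1, 1)})
I0 = make(("i",), lambda e: e["i"] == 0)   # input symbol 0
F = make(("x",), lambda e: True)           # frontier {A, B}
A = make(("x",), lambda e: e["x"] == 1)    # accepting state B

selected = conj(F, I0)                         # And
G = and_abstract(T, selected, {"x", "i"})      # AndAbstract
F_next = rename(G, {"y": "x"})                 # Map
accepted = bool(conj(F_next, A)[1])            # acceptance check (one more And)

print(F_next)    # (('x',), {(0,)})
print(accepted)  # False
```

The result reproduces the worked example of Section 3.2.1: the new frontier contains only state A, and the acceptance check finds no satisfying assignment.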

Figure 3.8 shows that the cost of processing an input symbol is dominated by the

cost of the AndAbstract and And operations to compute a new frontier. This is

because the sizes of the OBDDs to be combined for frontier computation are bigger

than the OBDDs that must be combined to check acceptance. Moreover, computing

new frontiers involves several applications of Apply and Restrict, as opposed to an

acceptance check, which requires only one Apply, thereby causing frontier computation

to dominate the cost of processing an input symbol.

These results suggest that an OBDD implementation that optimizes the AndAb-

stract and And operations (or a relaxed variant of OBDDs that allows for more

efficient AndAbstract and And operations) can further improve the performance of

NFA-OBDDs.

3.4.6 Impact of Variable Ordering on NFA-OBDD Performance

As mentioned in Chapter 2, the size of an OBDD is sensitive to the total order imposed

on its variables. Bryant [20] showed that it is NP-hard to determine whether a particular

variable ordering minimizes the size of an OBDD for a Boolean function. Variable order also impacts the structure of OBDDs, and in our experience, the order of the variables in the vectors ~i, ~x and ~y influences the performance of NFA-OBDDs.

[Figure 3.9 plots omitted: panels (a) HTTP/1503 signature set, (b) HTTP/2612 signature set and (c) FTP/98 signature set, each showing memory usage (MB) and execution speed (cycles/byte) for the seven variable orders listed below.]

Figure 3.9: Impact of OBDD variable ordering on the performance of NFA-OBDDs.

We experimented with various total orders to empirically determine their impact

on the size and throughput of NFA-OBDDs before settling on one of the total orders

that yielded the best performance (~i < ~x < ~y) for the numbers reported earlier in this

section.

Figure 3.9 compares the performance of NFA-OBDDs constructed using seven total

orders (all four constituent OBDDs of each NFA-OBDD use the same total order):

1. ~i < ~x < ~y: the variables within each vector are arranged in increasing order from

most significant bit (MSB) to least significant bit (LSB);

2. ~x < ~i < ~y: in-vector order is similar to the case above;

3. ~y < ~x < ~i: in-vector order is similar to the case above;


4. ~i < Interleave[~x < ~y]: variables in the vector ~i appear before the variables in ~x

and ~y. The variables in ~x and ~y appear interleaved, with a variable in ~x appearing

before the corresponding variable in ~y. The variables in each vector increase from

MSB to LSB;

5. ~i < Interleave[~y < ~x]: similar to the case above, except that variables in ~y

appear before their counterparts in ~x;

6. Interleave[~x < ~y] < ~i: as above, except that the interleaved variables of ~x and

~y appear in the total order before the variables of ~i;

7. Interleave[~y < ~x] < ~i: as above, with the variables in ~y preceding their coun-

terparts in ~x.
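The seven total orders can be written down mechanically. The following Python sketch generates them for small vectors; the variable naming (`i1`, `x0`, and so on, with the highest index denoting the most significant bit) and the function names are assumptions made for illustration and are not taken from the implementation evaluated here.

```python
def msb_to_lsb(name, width):
    """Variables of one vector, most significant bit first."""
    return [f"{name}{bit}" for bit in range(width - 1, -1, -1)]


def interleave(first, second):
    """Alternate two equally long vectors; each variable of `first`
    appears immediately before its counterpart in `second`."""
    order = []
    for a, b in zip(first, second):
        order += [a, b]
    return order


def seven_orders(input_bits, state_bits):
    """Return the seven total orders as lists, earliest variable first."""
    i = msb_to_lsb("i", input_bits)
    x = msb_to_lsb("x", state_bits)
    y = msb_to_lsb("y", state_bits)
    return {
        "i<x<y": i + x + y,
        "x<i<y": x + i + y,
        "y<x<i": y + x + i,
        "i<Interleave[x<y]": i + interleave(x, y),
        "i<Interleave[y<x]": i + interleave(y, x),
        "Interleave[x<y]<i": interleave(x, y) + i,
        "Interleave[y<x]<i": interleave(y, x) + i,
    }
```

For two input bits and two state bits, the order `i<Interleave[x<y]` is `i1, i0, x1, y1, x0, y0`. Each list could then be handed to a BDD package as its variable order.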

The above variable orders are only a tiny fraction of the set of possible total orders,

which is exponential in the number of variables. However, they provide insight into

which orders empirically provide high-performance NFA-OBDDs. We considered NFA-

OBDDs for HTTP/1503, HTTP/2612 and FTP/98 to determine whether the performance of NFA-OBDDs for different signature sets is sensitive to the order imposed on the

variables. As before, we used exec nfaobdd to feed network traces to these NFA-OBDDs.

For these experiments, we only used the network traces collected at Rutgers.

Figure 3.9 presents the results of these experiments, showing both the throughput and overall memory consumption of exec nfaobdd. It shows that the total order ~i < ~x < ~y performs consistently well across all three signature sets, but consumes more memory than the most compact implementation. The total order ~i < Interleave[~x < ~y] also provides competitive performance for the NFA-OBDDs of all three signature sets. However, the performance gap between the best total order and the worst one (~y < ~x < ~i) is almost four orders of magnitude. We used the order ~i < ~x < ~y for the experiments reported earlier in this section, though we could have used ~i < Interleave[~x < ~y] as well, with a slightly smaller memory consumption for the FTP/98 NFA-OBDD.

This figure also shows that there is no direct correlation between the size and per-

formance of exec nfaobdd for different variable orders. Although the time complexity of


algorithms such as Apply and Restrict asymptotically depends on the size of their

input OBDDs, factors such as the structure of the OBDDs also affect the number of

graph operations, and therefore the performance of the corresponding NFA-OBDDs.

These experiments lead us to conclude that the performance of NFA-OBDDs is

indeed sensitive to the total order imposed on its variables. The vast search space of

total orders diminishes hopes of a tractable algorithm to identify the total order that

would yield the best-performing NFA-OBDD for a given signature set. Nevertheless,

in practice, experiments with a few total orders (such as the ones in Figure 3.9) can

help empirically determine high-performance NFA-OBDDs. Future work could develop

heuristics that leverage the structure of the regular expressions in the input signature

set to determine “good” total orders.

3.5 Matching Multiple Input Symbols

The preceding sections assumed that only one input symbol is processed in each step. However, there is growing interest in developing techniques for multi-byte matching, i.e., matching multiple input symbols in one step. Prior work has shown that multi-byte

matching can improve the throughput of NFAs [18, 14]. In this section, we present one

such technique, k-stride NFAs [18], and show that OBDDs can further improve the

performance of k-stride NFAs.

A k-stride NFA matches k symbols of the input in a single step. Given a traditional (i.e., 1-stride) ε-free NFA (Q, Σ, ∆, q0, F), a k-stride NFA is a 5-tuple (Q, Σ^k, Γ, q0, F), whose input symbols are k-grams, i.e., elements of Σ^k. The set of states and accepting states of the k-stride NFA are the same as those for the 1-stride NFA. Intuitively, the transition function Γ of the k-stride NFA is computed as a k-step closure of ∆, i.e., (s, σ1σ2 . . . σk, t) ∈ Γ if and only if the state t is reachable from state s in the original NFA via transitions labeled σ1, σ2, . . ., σk. The algorithm to compute Γ from ∆ must

also consider cases where the length of the input string is not a multiple of k. This is

achieved by padding the input string with a new “do-not-care” symbol, and introducing

this symbol in the labels of selected transitions. We refer the interested reader to prior work [18, 14] for a detailed description of the construction.

Figure 3.10: 2-stride NFA for Figure 3.1.
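The k-step closure can be sketched in a few lines of Python. This is a minimal illustration and not the construction from prior work: the example NFA is hypothetical, and the padding rule used here (a tail of j < k symbols followed by k − j do-not-care symbols behaves like the j-step closure) is a simplifying assumption.

```python
from itertools import product

PAD = "•"  # do-not-care symbol used to pad the final k-gram


def reach(delta, sources, word):
    """States reachable from `sources` by reading `word` in the 1-stride NFA."""
    current = set(sources)
    for symbol in word:
        current = {t for (s, a, t) in delta if s in current and a == symbol}
    return current


def k_stride(states, alphabet, delta, k):
    """Return Gamma as a set of (state, k_gram, next_state) triples."""
    gamma = set()
    for s in states:
        # Full k-grams: the k-step closure of delta.
        for word in product(alphabet, repeat=k):
            for t in reach(delta, {s}, word):
                gamma.add((s, "".join(word), t))
        # Assumed padding rule: a shorter tail of length j is padded with
        # k - j do-not-care symbols and follows the j-step closure.
        for j in range(1, k):
            for word in product(alphabet, repeat=j):
                for t in reach(delta, {s}, word):
                    gamma.add((s, "".join(word) + PAD * (k - j), t))
    return gamma


# Hypothetical 1-stride NFA over {0, 1} accepting strings that end in 1.
DELTA = {("A", "0", "A"), ("A", "1", "A"), ("A", "1", "B")}
GAMMA = k_stride({"A", "B"}, "01", DELTA, k=2)
```

The sketch enumerates all |Σ|^k k-grams for every state, so it is practical only for tiny alphabets; it is meant to make the definition of Γ concrete, not to be efficient.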

Figure 3.10 presents an example of a 2-stride NFA corresponding to the NFA in

Figure 3.1. The do-not-care symbol is denoted by a “•”. Thus, for instance, an input

string 101 would be padded with • to become 101•. The 2-stride NFA processes digrams

in each step. Thus, the first step would result in a transition from state A to itself

(because of the transition labeled 10), followed by a transition from A to B when it

reads the second digram 1•, thereby accepting the input string.

A k-stride NFA (Q, Σ^k, Γ, q0, F) can readily be converted into a k-stride NFA-OBDD using the same approach described in Section 3.2. The main difference is that the input alphabet is Σ^k (plus a new symbol “•”); the vector ~i would therefore contain

k times as many Boolean variables. However, two additional details must be addressed

when applying k-stride NFAs (and the corresponding NFA-OBDDs) to the problem of

matching traffic patterns in a NIDS, namely, (i) adapting k-stride NFAs to work in the

streaming model; and (ii) reducing the space consumption of k-stride NFAs. These are

discussed next.

3.5.1 Adapting to the Streaming Model

When operating a 1-stride NFA to process a stream of inputs, the frontier of states must

be checked after processing each input symbol to determine whether the input triggered

a match. However, this technique does not suffice for k-stride NFAs in the streaming

model. To see why, consider how the NFAs in Figure 3.1 and Figure 3.10 would process

the input 10. The 1-stride NFA would trigger an alert after the first symbol has been

processed (because the frontier F = {A, B} contains an accepting state). In contrast, the 2-stride NFA would process the entire input 10 in one step, resulting in the frontier F = {A}, which does not contain the accepting state. Therefore, for k-stride NFAs, it does not suffice to simply check F to determine acceptance.

Figure 3.11: The NFA in Figure 3.10 adapted for streaming.

To address this problem, the algorithm to convert a 1-stride NFA into a k-stride

NFA must “remember” that an accepting state of the 1-stride NFA was encountered

when computing the k-step closure of the transition relation ∆ of the 1-stride NFA. One

way to compute k-stride NFAs that achieve this goal is by adding a new accepting state

to the k-stride NFA. This algorithm adds incoming transitions to the new accepting

state suitably from other states in the k-stride NFA to “remember” that a substring

of the input k-gram would have triggered a match in the 1-stride NFA. The resulting k-stride NFA (Q∗, Σ^k, Γ∗, q0, F∗) has the same semantics as the 1-stride NFA in the streaming model (i.e., they both accept the same set of strings), but has one more state than the 1-stride NFA. We refer the reader to prior work [18, 14] for details on this algorithm (see footnote 2).

This k-stride NFA can be converted into an NFA-OBDD using the same technique

presented in Section 3.2, and operated in the same way.
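One way to picture the extra accepting state is the Python sketch below. It is an illustration under stated assumptions, not the algorithm from prior work: the example NFA and the names `streaming_k_stride` and `extra` are hypothetical, padding is omitted, and the rule used (add a transition to the extra state whenever a proper prefix of the k-gram reaches an accepting state of the 1-stride NFA) is a simplification.

```python
from itertools import product


def step(relation, frontier, label):
    """One frontier update: follow every transition labeled `label`."""
    return {t for (s, a, t) in relation if s in frontier and a == label}


def reach(delta, source, word):
    """States of the 1-stride NFA reachable from `source` by reading `word`."""
    frontier = {source}
    for symbol in word:
        frontier = step(delta, frontier, symbol)
    return frontier


def streaming_k_stride(states, alphabet, delta, accepting, k, extra="C"):
    """k-step closure of delta plus an extra accepting state `extra`."""
    gamma = set()
    for s in states:
        for word in product(alphabet, repeat=k):
            label = "".join(word)
            for t in reach(delta, s, word):
                gamma.add((s, label, t))
            # A proper prefix of the k-gram already reached an accepting
            # state of the 1-stride NFA: remember that via `extra`.
            if any(reach(delta, s, word[:j]) & accepting for j in range(1, k)):
                gamma.add((s, label, extra))
    return gamma, accepting | {extra}


# Hypothetical 1-stride NFA over {0, 1} accepting strings that end in 1.
DELTA = {("A", "0", "A"), ("A", "1", "A"), ("A", "1", "B")}
GAMMA, ACCEPTING = streaming_k_stride({"A", "B"}, "01", DELTA, {"B"}, k=2)

frontier = step(GAMMA, {"A"}, "10")      # contains A and the extra state C
matched = bool(frontier & ACCEPTING)     # the match on the prefix "1" is kept
frontier = step(GAMMA, frontier, "00")   # C has no outgoing transitions
```

Because the extra state has no outgoing transitions, it drops out of the frontier on the next digram unless that digram triggers acceptance again.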

Figure 3.11 presents an example of this approach applied to the 2-stride NFA in

Figure 3.10. The new transition from state A to accepting state C on input 10 uses the

state C to remember that the substring 1 of 10 triggered a match in the corresponding

1-gram NFA. The state C will therefore be in the frontier when an input 10 is processed

at state A, thereby triggering a match when the OBDD operation to check acceptance

is performed. However, there are no outgoing transitions from C. Thus, this state is removed from the frontier when the next digram is processed by the 2-stride NFA (unless that digram also triggers acceptance).

Footnote 2: This algorithm can also be adapted easily to identify the regular expression that matched the input.

3.5.2 Reducing Space Consumption using Alphabet Compression

The transition table of a k-stride NFA can have O(|Q| × |Σ|^k) entries, each of which can be of size O(|Q|) (to store the set of “next” states), which can result in a memory utilization of O(|Q|^2 × |Σ|^k). However, this asymptotic limit is rarely reached in practice, and transition tables encountered in practice are generally sparse. In particular, there may be several transitions labeled with the same set of symbols from the alphabet Σ^k. That is, if two input symbols σ1 and σ2 satisfy Γ(s, σ1) = Γ(s, σ2) for every state s ∈ Q, then σ1 and σ2 can potentially be merged into an equivalence class. This idea is called alphabet compression [40, 8, 18, 13, 45].

The output of an alphabet compression algorithm is a partition of Σ^k into equivalence classes. Each equivalence class is assigned a symbol, thereby yielding a new alphabet E with fewer elements than Σ^k. An alphabet compression algorithm also outputs an encoding function m : Σ^k → E that translates elements in Σ^k to elements in E. In the above example, m(σ1) = m(σ2). The transitions of an alphabet-compressed

NFA would also be appropriately relabeled to use symbols from E instead. Similarly,

symbols in the input would also have to be appropriately translated using m before

they are passed to the NFA for matching. An alphabet-compressed NFA can also be

converted into an NFA-OBDD using the same techniques described in Section 3.2, and

operated in the same way.
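The idea can be made concrete with a direct, unoptimized Python sketch that groups symbols by their behavior from every state. It follows the definition above and is not Brodie et al.'s algorithm; the function name, the integer class labels, and the example transition relation are assumptions for illustration.

```python
from collections import defaultdict


def compress_alphabet(states, alphabet, gamma):
    """Group symbols whose transitions are identical from every state.

    Returns the encoding function m (as a dict) and the relabeled
    transition relation."""
    targets = defaultdict(set)
    for (s, a, t) in gamma:
        targets[(s, a)].add(t)

    classes = defaultdict(list)
    for a in alphabet:
        # Signature of a symbol: its set of next states from each state.
        signature = tuple(
            frozenset(targets.get((s, a), ())) for s in sorted(states)
        )
        classes[signature].append(a)

    # m maps each original symbol to the index of its equivalence class.
    m = {}
    for index, members in enumerate(classes.values()):
        for a in members:
            m[a] = index
    compressed = {(s, m[a], t) for (s, a, t) in gamma}
    return m, compressed


# Hypothetical 2-stride transition relation over the digrams of {0, 1}.
GAMMA = {("A", "00", "A"), ("A", "01", "A"), ("A", "01", "B"),
         ("A", "10", "A"), ("A", "11", "A"), ("A", "11", "B")}
M, COMPRESSED = compress_alphabet({"A", "B"}, ["00", "01", "10", "11"], GAMMA)
```

In this example the four digrams collapse into two classes (`00` with `10`, and `01` with `11`), and the six transitions collapse into three. Input digrams would be translated through `M` before being fed to the compressed automaton.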

We implemented the alphabet compression algorithm described by Brodie et al. [18] for 2-stride NFAs and empirically found that alphabet compression reduces the memory consumption of 2-stride NFAs. However, this alphabet compression algorithm itself is quite resource-intensive because it operates on the transition relation of the entire (2-stride) NFA, and it can therefore exhaust the available memory

on our machine. For example, we found that Brodie et al.’s algorithm [18] frequently

ran out of memory when processing 2-stride NFAs obtained by combining more than

200 regular expressions from the HTTP/1503 and HTTP/2612 signature sets.

Algorithm: Combine Compressed Alphabet (X, Y)
Input : X = {X1, . . ., Xp} and Y = {Y1, . . ., Yq}, the compressed alphabets of NFAX and NFAY
Output : Z, the compressed alphabet of NFAZ = NFAX ∪ NFAY
 1  Z = X ∪ Y;
 2  Z′ = ∅;
 3  foreach (A ∈ Z) do
 4      split = false;
 5      foreach (B ∈ Z such that B ≠ A) do
 6          if (A ∩ B ≠ ∅) then
 7              Z′ = Z′ ∪ (A ∩ B) ∪ (A − B) ∪ (B − A);
 8              split = true;
 9      if (split == false) then Z′ = Z′ ∪ A;
10  if (Z ≠ Z′) then
11      Z = Z′;
12      goto line 2;
13  return Z;

Algorithm 5: Combining the compressed alphabets of two NFAs.

We developed a scheme (Algorithm 5) that applies Brodie et al.’s algorithm to smaller NFAs (thereby limiting the algorithm’s memory consumption), and merges the

results to obtain a compressed alphabet for the combination of the smaller NFAs. Our

scheme is based upon the following fact: if two symbols σ1 and σ2 appear in the same

equivalence class of the compressed alphabet of each of two NFAs (say, NFAX and

NFAY ), then they will appear in the same equivalence class of the NFA (say NFAZ)

obtained by merging the set of states and transitions of NFAX and NFAY . Algorithm 5

uses this observation to combine the compressed alphabets X and Y of NFAX and NFAY

and produce the compressed alphabet Z of the NFAZ . It proceeds by combining X and

Y into a set Z, and iteratively refining Z (in line 7) so that if any two symbols σ1 and

σ2 appear in the same equivalence class in the output set Z, then they also appear in

the same equivalence classes in both X and Y .
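The property stated above characterizes the output as the coarsest partition that refines both X and Y. The Python sketch below computes that partition directly from pairwise intersections rather than transcribing the iterative loop of Algorithm 5; treating the two as equivalent is an assumption based on the stated property, and the function name is hypothetical.

```python
def combine_compressed_alphabets(X, Y):
    """Coarsest partition that refines both X and Y.

    X and Y are partitions of the same symbol set, given as iterables of
    sets. Two symbols share a class in the result exactly when they share
    a class in X and also share a class in Y."""
    Z = set()
    for A in X:
        for B in Y:
            common = frozenset(A) & frozenset(B)
            if common:  # skip empty intersections
                Z.add(common)
    return Z


# Hypothetical compressed alphabets of two small NFAs over {a, b, c, d}.
X = [{"a", "b", "c"}, {"d"}]
Y = [{"a", "b"}, {"c", "d"}]
Z = combine_compressed_alphabets(X, Y)
```

Here `a` and `b` stay together because both inputs keep them together, while `c` is separated from `a`, `b` (by Y) and from `d` (by X). Combining more than two alphabets can be done by applying the function pairwise.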

Our experiments confirm the scalability of Algorithm 5. For example, we were able to use this algorithm to compress the alphabet of the 2-stride NFA representing 2604 signatures from the HTTP/2612 set. We did so by first splitting these signatures into 61 smaller subsets, applying Brodie et al.’s alphabet compression to the 2-stride NFAs representing these subsets, and combining the compressed alphabets pairwise using Algorithm 5. The size of the compressed alphabet of the resulting 2-stride NFA was 11,119. In contrast, Brodie et al.’s algorithm ran out of memory when processing the 2-stride NFA representing the set of 2604 signatures in its entirety.

We excluded eight signatures from the HTTP/2612 signature set because alphabet compression ran out of memory for them. These eight signatures contained complex structural patterns that caused Brodie et al.’s compression algorithm [18] to run out of memory for the 2-stride NFAs representing each of these signatures individually (so Algorithm 5 was never invoked for them). Further research is needed to develop alphabet compression algorithms that can handle such complex signatures.

3.5.3 Performance of k-stride NFA-OBDDs

To evaluate the performance of k-stride NFAs and k-stride NFA-OBDDs, we used a

toolchain similar to the one discussed in Section 3.3, but additionally applied alphabet

compression. Although our implementation accepts k as an input parameter, we have

only conducted experiments for k = 2 because our alphabet compression algorithm ran

out of memory for larger values of k.

The setup that we used for the experiments reported below is identical to that

described in Section 3.3. However, we only used two sets of Snort signatures in our

measurements: (1) HTTP/2604, a subset of 2604 HTTP signatures from HTTP/2612

and (2) FTP/95, a subset of 95 FTP signatures from FTP/98 (we omitted three signatures because Brodie et al.’s compression algorithm [18] ran out of memory for 2-stride NFAs representing each of these signatures).

Figure 3.12(a) presents the size of the 1-stride and 2-stride NFA-OBDDs, and the

size of the compressed alphabet. In each case, the alphabet compression algorithm

took over a day to complete, and consumed about 1.6GB memory. Figure 3.12(b) and

(c) compare the performance of 1-stride NFA-OBDDs with the performance of 2-stride

NFA-OBDDs (using the traces described in Section 3.3). As expected, these figures

show that matching multiple bytes in the input stream improves the performance of NFA-OBDDs, roughly doubling the throughput in each case. In fact, the 2-stride NFA-OBDD of the HTTP/2612 signature set more than doubled (2.26×) the throughput of the 1-stride NFA-OBDD on the DARPA trace.

Signature Set | #States | #Transitions in NFA (1-stride) | #Transitions in NFA (2-stride) | #Alphabet Symbols
HTTP/2604     | 237,972 | 5,567,317                      | 136,212,770                    | 11,119
FTP/95        | 15,266  | 3,361,065                      | 5,136,420                      | 848

(a) 2-stride NFA-OBDD construction results

[Plots omitted: panels (b) HTTP/2604 signature set and (c) FTP/95 signature set, showing memory usage (MB) for 1-stride and 2-stride NFA-OBDDs; panel (c) also includes the 1-stride and 2-stride NFAs.]

Figure 3.12: Memory versus throughput for 1-stride and 2-stride NFA-OBDDs. Figure 3.12(c) also shows the performance of the corresponding 1-stride and 2-stride NFAs.

These experiments also demonstrate that the use of OBDDs allows 2-stride NFA-

OBDDs to be more space-efficient than NFAs. While we were able to create and operate

a 2-stride NFA-OBDD for the HTTP/2604 signature set, the 2-stride NFA for this

signature set exhausted the memory available on our machine. We were able to create

a 2-stride NFA for the FTP/95 signature set; Figure 3.12(c) depicts the performance

of both the 1-stride and 2-stride NFAs for this signature set. As this figure shows, the 2-stride NFA-OBDD uses about two orders of magnitude less memory than the 2-stride NFA, and is also about two orders of magnitude faster. These

results lead us to conclude that 2-stride NFA-OBDDs are drastically more efficient in

time and space than 2-stride NFAs.

3.6 Related Work

Early NIDS exclusively employed strings as attack signatures. String-based signatures

are space-efficient, because their size grows linearly with the number of signatures. They are also time-efficient, with matching algorithms that take O(1) time per input symbol (e.g., Aho-Corasick [7]).

They are ideally suited for wire-speed intrusion detection, and have been implemented

both in software and hardware [31, 57, 68, 43, 83, 26, 49, 82, 83, 86, 87, 72]. However,

prior work has shown that string-based signatures can easily be evaded by malware

using polymorphism, metamorphism and other mutations [36, 41, 65, 70]. The research

community has therefore been investigating sophisticated signature schemes, such as

session signatures [67, 80, 101] and vulnerability signatures [88, 19], that require the full

power of regular expressions. This, in turn, has spurred the research community to develop improved algorithms for regular expression matching, and has led NIDS vendors to increasingly deploy products that use regular expressions (e.g., Tipping Point [2], LSI Corporation [1] and Cisco [24]).

DFAs provide high-speed matching, but DFAs for large signature sets often consume

gigabytes of memory. Researchers have therefore investigated techniques to improve

the space-efficiency of DFAs. These include, for example, techniques to determinize

on-the-fly [80]; MDFAs, which combine signatures into multiple DFAs (as discussed

in Section 3.4) [102]; D2FAs [46], which reduce the memory footprint of DFAs via

edge compression; and XFAs [76, 77], which extend DFAs with scratch memory to

store auxiliary variables, such as bitmaps and counters, and associate transitions with

instructions to manipulate these variables. Some DFA variants (e.g., [46, 77, 18, 52])

also admit efficient hardware implementations.

These techniques use the time-efficiency of DFAs as a starting point, and seek to

reduce their memory footprint. In contrast, our work uses the space-efficiency of NFAs

as a starting point, and seeks to improve their time-efficiency. We believe that both

approaches are orthogonal and may be synergistic. For example, it may be possible to

use OBDDs to also improve the time-efficiency of MDFAs.

Our approach also provides advantages over several prior DFA-based techniques.

First, it produces NFA-OBDDs from regular expressions in a fully automated way.

This is in contrast to XFAs [76], which required a manual step of annotating regular

expressions. Second, our approach does not modify the semantics of regular expressions,

i.e., the NFA-OBDDs produced using the approach described in Section 3.2 accept the


same set of strings as the regular expressions that they were constructed from. MDFAs,

in contrast, employ heuristics that relax the semantics of regular expressions to improve

the space-efficiency of the resulting automata [102]. Last, because these techniques

operate with DFAs, they may sometimes encounter regular expressions that are hard

to determinize. For example, Smith et al. [76, Section 6.2] present a regular expression

from the Snort data set for which the XFA construction algorithm runs out of memory.

In contrast, our technique operates with NFAs and therefore does not encounter such

cases.

Research on NFAs for intrusion detection has typically focused on exploiting par-

allelism to improve performance [39, 54, 25, 71]. NFA operation can be parallelized

in many ways. For example, a separate thread could be used to simulate each state

in an NFA’s frontier. Alternatively, a set of regular expressions can be represented as a collection of NFAs, which can then be operated in parallel. FPGAs have been used to

exploit this parallelism to yield high-performance NFA-based intrusion detection sys-

tems [39, 54, 25, 71].

Although not explored in this chapter, OBDDs can potentially improve NFA per-

formance in parallel execution environments as well. For example, consider a NIDS

that performs signature matching by operating a collection of NFAs in parallel. The

performance of this NIDS can be improved by converting it to use a collection of NFA-

OBDDs instead; in this case, OBDDs improve the performance of each NFA, thereby

increasing the throughput of the NIDS as a whole. Finally, NFA-OBDDs may also

admit a hardware implementation. Prior work has developed techniques to implement

OBDDs in CAMs [104] and FPGAs [74]. Such an implementation of NFA-OBDDs can

be used to improve the performance of hardware-based NFAs as well.

3.7 Summary

Many recent algorithms for regular expression matching have focused on improving

the space-efficiency of DFAs. This chapter sought to take an alternative viewpoint,


and aimed to improve the time-efficiency of NFAs. To that end, we developed NFA-

OBDDs, a representation of regular expressions in which OBDDs are used to operate

NFAs. Our prototype software-based implementation with Snort signatures showed

that NFA-OBDDs can outperform NFAs by almost three orders of magnitude. We also

showed how OBDDs can enhance the performance of NFAs that match multiple input

symbols in a single step.

In summary, the main contribution of this chapter is in showing that the use of

OBDDs drastically improves NFA performance and brings them within the realm of

feasible use in intrusion detection systems. In the light of this contribution and the

space-efficiency of NFAs, we conclude with a call for further research on the use of

NFAs to represent signatures.


Chapter 4

Fast Submatch Extraction using OBDDs

4.1 Introduction

Pattern languages commonly used by NIDS are regular expressions extended with other features. One important feature is the capturing group, a syntactic construct used in modern regular expression implementations to mark a subexpression of a regular expression. Given a string that matches the regular expression, submatch extraction is the process of extracting the substrings corresponding to those subexpressions. In the Snort 2012 rule set [78], more than 10% of the pcre fields of the HTTP rules contain capturing groups. When a pattern containing a capturing group matches an input string, submatch extraction can identify parts of the input that are of interest to security administrators for analysis. For a regular expression like “username=(.*),hostname=(.*)” with an input string “username=Bob,hostname=Foo”, submatch extraction yields the two substrings “Bob” and “Foo” specified by the two capturing groups (the subexpressions wrapped by the two pairs of parentheses).

An important network security application that makes extensive use of submatch

extraction is Security Information and Event Management (SIEM) [94], which provides

real-time analysis of security alerts generated by network hardware and applications.

SIEM systems often collect data from a variety of hardware and software sensors, and

must therefore normalize this data into a common format by extracting common fields

from various data sources. SIEM systems use submatch extraction during data nor-

malization and alert reporting. In a typical SIEM system, more than 90% of regular

expressions used for data normalization contain capturing groups.

In both SIEM systems and NIDS, scalability of pattern matching and submatch extraction is key. NIDS are often deployed over high-speed network links, which requires that algorithms for pattern matching and submatch extraction be efficient enough to provide high-throughput intrusion detection on large volumes of network traffic. Similarly, a typical SIEM system collects logs from hundreds of devices and applications, and must process terabytes of logs every day in enterprise networks.

There is plenty of prior work on making pattern matching for regular expressions

time-efficient [12, 97, 22, 25, 52, 18, 46, 39, 71, 50] and space-efficient [102, 13, 12, 80, 76,

77, 53, 60]. However, most of these works only considered regular expressions containing

no capturing groups, i.e., they did not support submatch extraction. Existing solutions

for submatch extraction are based on non-deterministic finite automata (NFAs) [47,

29] or recursive backtracking [62]. While NFAs are space-efficient and can extract submatches with a compact memory footprint, they are not time-efficient because they maintain a frontier, i.e., the set of states in which an NFA can be at any instant, which can contain O(n) states, where n is the number of states of the NFA. This leads to an O(n) operation time for the NFA for each input symbol. Google’s RE2 package [29] uses a combination of DFAs and NFAs to improve the time efficiency of submatch extraction. RE2 constructs DFAs on demand (determinization on the fly), uses them to locate a pattern’s overall match in an input string, and then uses an NFA-based method to extract submatches. The time-efficiency of DFAs, however, often comes at the cost of state blow-up. RE2 can be very slow when the DFA construction fills up the limited state cache; it then has to empty the state cache and restart the DFA construction process. Moreover, the actual submatch extraction of RE2 is performed using an NFA-based method, which is space-efficient, but not time-efficient. Tools such

as PCRE and the regex libraries in Java, Perl, and Python use recursive backtracking

for regular expression matching. The execution time of backtracking, however, can be

exponential for certain types of regular expressions [28]; NIDS that employ backtracking

suffer from algorithmic complexity attacks [75].

In this chapter, we present a novel approach to perform submatch extraction for reg-

ular expression-like pattern languages. Our approach is an extension of the NFA-OBDD

model [97, 99] described in Chapter 3. While both works employ the ordered binary


decision diagram (OBDD) data structure, the NFA-OBDD approach did not consider

the submatch construct, making it inapplicable to the 90% of regular expressions in a

typical SIEM system. We extend the NFA-OBDD model in two ways: (1) we propose

an approach to annotate capturing groups in regular expressions, and (2) present a

new approach to perform submatch extraction. To demonstrate the feasibility of our approach, we evaluated it using patterns extracted from the Snort NIDS [5] and a commercial SIEM product. Our experiments show that our approach performs best when patterns are combined. In the best case, our approach is

faster than RE2 and PCRE by one to two orders of magnitude. In particular, we make

the following contributions:

• We propose a new approach to tag capturing groups in a regular expression,

and extend Thompson’s NFA construction approach [38] to convert a regular

expression with capturing groups to a tagged-NFA.

• We present a novel and time-efficient technique (henceforth called Submatch-

OBDD) to perform submatch extraction for regular expression-like pattern lan-

guages.

• We evaluated our approach’s time efficiency and space efficiency by matching

the patterns from the Snort system [5] and a commercial SIEM system with

network traces, synthetic traces, and enterprise event logs, and then compared

our performance with two popular regular expression engines: RE2 and PCRE.

The remainder of the chapter is organized as follows. We present our design and

implementation of Submatch-OBDD in Section 4.2, followed by our experimental eval-

uation in Section 4.3. We discuss related work in Section 4.4 and summarize Submatch-

OBDD in Section 4.5.

4.2 Design and Implementation

We first give an overview of our approach before describing the technical details.


4.2.1 Solution Overview

A key observation underlying our approach is that adding a capturing group to a

regular expression does not change the language defined by the regular expression. It

is known that every language defined by a regular expression is also defined by a finite

automaton [38]. However, traditional automata do not support capturing groups. We

present an approach to annotate capturing groups in regular expressions and extend

Thompson’s approach to convert a regular expression with capturing groups to an NFA-like machine where transitions within capturing groups are tagged. We then present a

novel approach to do submatch extraction using the tagged-NFAs. To improve the time

efficiency of submatch extraction, we represent tagged-NFAs with symbolic Boolean

functions, and manipulate the Boolean functions using ordered binary decision diagrams

(OBDDs).

4.2.2 Tagging NFAs for Submatch

The syntax of regular expressions with capturing groups on an alphabet Σ is

E ::= ε ∪ a ∪ EE ∪ E|E ∪ E∗ ∪ (E) ∪ [E]

where a stands for an element of Σ, and ε denotes the empty string (zero occurrences of a symbol). We use square brackets [, ] to group terms in a regular expression that are not capturing groups, because the usual parentheses (, ) are reserved for marking capturing groups. If X and Y are sets of strings we use XY to denote {xy : x ∈ X, y ∈ Y}, and X|Y to denote X ∪ Y. We use E∗ to denote the closure of E under concatenation.

We use tags to distinguish the capturing groups within a regular expression. Given a regular expression E containing c capturing groups, we assign the tags t1, t2, ..., tc to the capturing groups in the order of their left parentheses as E is read from left to right. We denote the set of tags by T = {t1, t2, ..., tc}. We use tag(E) to refer to the resulting tagged regular expression. For example, if E = ((a∗)|b)(ab|b) then tag(E) = ((a∗)_{t2}|b)_{t1}(ab|b)_{t3}.
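The numbering rule (tags follow the order of left parentheses) can be sketched as follows. The Python code is an illustration only: the function names are hypothetical, the ASCII `*` stands in for the star operator, and escaped parentheses and other pattern syntax are deliberately not handled.

```python
def tag_groups(expr):
    """Map the index of each capturing '(' to its tag, left to right."""
    tags = {}
    for index, char in enumerate(expr):
        if char == "(":
            tags[index] = f"t{len(tags) + 1}"
    return tags


def group_spans(expr):
    """Return {tag: subexpression text}, pairing parentheses with a stack."""
    tags, stack, spans = tag_groups(expr), [], {}
    for index, char in enumerate(expr):
        if char == "(":
            stack.append(index)
        elif char == ")":
            start = stack.pop()
            spans[tags[start]] = expr[start + 1:index]
    return spans


SPANS = group_spans("((a*)|b)(ab|b)")
```

For the expression above, the outer group `(a*)|b` receives t1 because its left parenthesis comes first, the nested group `a*` receives t2, and `ab|b` receives t3.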

The language L(F) for a tagged regular expression F = tag(E) is a set of tagged strings, defined by $L(\varepsilon) = \{\varepsilon\}$, $L(a) = \{a\}$, $L(F_1F_2) = L(F_1) \cdot L(F_2)$, $L(F_1|F_2) = L(F_1) \cup L(F_2)$, $L(F^*) = L(F)^*$, $L([F]) = L(F)$, and $L((F)_t) = \{\alpha_t : \alpha \in L(F)\}$, where $(\,)_t$ denotes a capturing group with tag t and $\alpha_t$ denotes the string α tagged with t. A string α is tagged by t if and only if each character in α is tagged by t. Substrings of α may be tagged by other tags. Since capturing groups can be nested, a character can be tagged by multiple tags. An example of a tagged string for a tagged regular expression is $a\,b_{t_1}b_{t_1}b_{t_1} \in L(a(b^*)_{t_1})$.

Definition A valid assignment of submatches for a string α that matches regular ex-

pression E is a map sub : {t1, t2, . . . tc} → Σ∗ such that there exists β ∈ L(tag(E))

satisfying the following:

(i) β|Σ = α, where β|Σ represents the projection of characters in β onto their

corresponding values of Σ;

(ii) if ti occurs in β then sub(ti) is the last consecutive sequence of characters that

are assigned with tag ti;

(iii) if ti does not occur in β, then sub(ti) = null.

For example, consider the regular expression [(a|c)(b|d)]∗ with input string abcd. A

valid submatch assignment satisfying the above conditions is sub(t1) = c, sub(t2) = d.
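Conditions (ii) and (iii) can be made concrete with a small Python sketch. It assumes a tagged string β is given as a list of (character, tag set) pairs; the function name `submatches` and this representation are illustrative choices, not part of the formal definition.

```python
def submatches(tagged, tags):
    """Compute sub(t) for every tag t from a tagged string.

    `tagged` is a list of (char, set_of_tags) pairs. For each tag the result
    is the last consecutive run of characters carrying that tag, or None
    (null) if the tag does not occur at all.
    """
    result = {}
    for t in tags:
        end = None
        for k in range(len(tagged) - 1, -1, -1):   # last position tagged by t
            if t in tagged[k][1]:
                end = k
                break
        if end is None:
            result[t] = None                       # condition (iii)
            continue
        begin = end
        while begin > 0 and t in tagged[begin - 1][1]:
            begin -= 1                             # extend the run leftward
        result[t] = "".join(ch for ch, _ in tagged[begin:end + 1])
    return result


# Tagged string for [(a|c)(b|d)]* on input "abcd".
beta = [("a", {"t1"}), ("b", {"t2"}), ("c", {"t1"}), ("d", {"t2"})]
sub = submatches(beta, ["t1", "t2"])
print(sub)
```

The first iteration of the closure also tags a and b, but only the last consecutive run of each tag is kept, which is why sub(t1) = c and sub(t2) = d.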

It is well known that a regular expression can be converted to an ε-NFA that defines

the same language using Thompson’s approach [38]. An ε-NFA can be reduced to an ε-

free NFA through an ε-closure mechanism [38]. In this chapter, we extend Thompson’s

algorithm in a way such that it can convert a regular expression containing capturing

groups to a tagged ε-NFA defining the same language. A tagged ε-NFA can be described

by a 7-tuple A = (Q,Σ, T, δ, γ, S, F ), where Q is a finite set of states, Σ is a finite set

of input symbols, T is a finite set of tags that each represents a capturing group, S is

a set of start states, F is a set of accept states, δ is the transition function, and γ is a

tag output function γ : Q × Σ × Q → 2^T, which associates each transition with a tag

set (which may be empty).

A tagged NFA can be constructed starting from the three base cases shown in Figure 4.1. Figure 4.1(a) is the NFA of expression ε, Figure 4.1(b) handles the empty regular expression, and Figure 4.1(c) gives the NFA of a single symbol a with a set of tags τ ∈ 2^T corresponding to the capturing groups associated with the illustrated transition. More complex tagged NFAs can be constructed using the union, concatenation, and closure constructs, by combining smaller tagged NFAs as shown in Figure 4.2. A tagged NFA constructed using the above approach contains ε transitions. Such a tagged NFA can be converted to an ε-free NFA in a manner akin to the standard ε-closure algorithm for standard NFAs. We denote the corresponding ε-free tagged NFA as A1 = (Q1, Σ, T, δ1, γ1, S1, F1), where the components of A1 are defined in a manner akin to A (the tagged ε-NFA).

Figure 4.1: Constructing tagged NFAs for (a) NFA of ε; (b) NFA of an empty regular expression; (c) NFA of a symbol a wrapped by capturing groups denoted by τ.

Figure 4.2: The (a) union R|S, (b) concatenation RS, and (c) closure R∗ constructs of tagged NFA construction from a regular expression.

Example Consider an example regular expression “(a*)aa”. Figure 4.3 shows

the tagged ε-NFA, where the capturing group is tagged by t1. Figure 4.4 shows the

corresponding ε-free tagged-NFA.

Figure 4.3: The tagged ε-NFA of “(a*)aa”, where the transition associated with the leftmost character a is tagged by t1 because “a*” is within a capturing group.

Figure 4.4: The tagged ε-free NFA of “(a*)aa” after ε-elimination, where state numbers 1, 2, and 3 are obtained by renaming and merging states 2, 5, 7, and 8 in the ε-NFA during ε-closure calculation.
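The ε-elimination step can be sketched as follows. This is a minimal illustration of the textbook ε-closure construction applied to tagged edges, not the thesis implementation: the ε-NFA below is a hypothetical layout for “(a*)aa” (it is not the state numbering of Figure 4.3), and the sketch neither merges nor renames states, so its output has more states than Figure 4.4 while accepting the same language.

```python
EPS = None  # label used for ε-transitions in this sketch


def eps_closure(state, edges):
    """All states reachable from `state` through ε-transitions only."""
    seen, stack = {state}, [state]
    while stack:
        p = stack.pop()
        for (x, sym, y, _tau) in edges:
            if x == p and sym is EPS and y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def eliminate_eps(edges, accept):
    """Return (ε-free edges, accept states) of an equivalent tagged NFA."""
    states = {s for (x, _sym, y, _tau) in edges for s in (x, y)}
    new_edges, new_accept = set(), set()
    for p in states:
        closure = eps_closure(p, edges)
        if closure & accept:
            new_accept.add(p)
        for (x, sym, y, tau) in edges:
            if x in closure and sym is not EPS:
                new_edges.add((p, sym, y, tau))  # the tag set travels with the symbol
    return new_edges, new_accept


def accepts(edges, start, accept, text):
    """Simulate an ε-free tagged NFA, ignoring tags."""
    frontier = set(start)
    for ch in text:
        frontier = {y for (x, sym, y, _tau) in edges if x in frontier and sym == ch}
    return bool(frontier & accept)


T1, NONE = frozenset({"t1"}), frozenset()
EDGES = {
    (0, EPS, 1, NONE), (0, EPS, 3, NONE),  # enter or skip the starred group
    (1, "a", 2, T1),                       # 'a' inside the capturing group
    (2, EPS, 1, NONE), (2, EPS, 3, NONE),  # repeat or leave the group
    (3, "a", 4, NONE), (4, "a", 5, NONE),  # the trailing "aa"
}
FREE_EDGES, FREE_ACCEPT = eliminate_eps(EDGES, accept={5})
print(accepts(FREE_EDGES, {0}, FREE_ACCEPT, "aaaa"))
```

Only the symbol-consuming edges carry tags, so copying an edge to every state whose ε-closure reaches its source preserves the tag output function.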

4.2.3 Operations on Tagged NFAs

The transition function δ1 and tag output function γ1 of a tagged ε-free NFA can be

represented by a four-column table denoted by ∆(x, i, y, τ), which is a set of quadruples

(x, i, y, τ) such that there is a transition labeled by input symbol i from state x to state

y with a set of output tags τ . Table 4.1 shows the tagged transition table of the example

NFA in Figure 4.4, where each tagged transition is represented by a row in the table.

∆(x, i, y, τ) allows us to perform two key operations on tagged NFAs — match test and

submatch extraction, where the match test checks whether an input string is accepted

by a tagged NFA; if so, the submatch extraction procedure returns a valid assignment

of submatches of the input string.

x    i    y    τ
1    a    1    {t1}
1    a    2    ∅
2    a    3    ∅

Table 4.1: Transition table of the tagged NFA in Figure 4.4.
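As a minimal sketch, the table can be held in memory as a set of quadruples. The names `Transition`, `DELTA`, and `outgoing` are hypothetical and chosen only for this illustration.

```python
from typing import FrozenSet, NamedTuple


class Transition(NamedTuple):
    x: int                 # source state
    i: str                 # input symbol
    y: int                 # target state
    tau: FrozenSet[str]    # output tags (may be empty)


# The three rows of Table 4.1.
DELTA = {
    Transition(1, "a", 1, frozenset({"t1"})),
    Transition(1, "a", 2, frozenset()),
    Transition(2, "a", 3, frozenset()),
}
START = {1}
ACCEPT = {3}


def outgoing(delta, state, symbol):
    """All tagged transitions leaving `state` on `symbol`."""
    return {tr for tr in delta if tr.x == state and tr.i == symbol}


print(sorted(tr.y for tr in outgoing(DELTA, 1, "a")))
```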

Figure 4.5: Example of frontier derivation for the tagged NFA in Figure 4.4 with input string “aaaa”. The dark circles in each column stand for the frontier states after consuming an input symbol. A light gray circle means that a state is not in the frontier state set. An arrow between two circles represents a transition. An arrow is labeled by a submatch tag if the denoted transition is within a capturing group.

Match Test

Testing whether an input string matches a regular expression with capturing groups can

be done by operating its tagged NFA. The process is similar to operating a traditional

NFA, except that we need to do bookkeeping to be used for submatch extraction. The

match test of a tagged NFA for a given input string a1a2 . . . al ∈ Σ∗ is performed by

consuming one input symbol at a time, and modifying the frontier of active states

appropriately using the transition function δ1. As we modify the frontier, we also

record the transitions that the tagged-NFA makes by recording quadruples that store

the states traversed by each transition, as well as the tags corresponding to those

transitions. We denote these sets of transitions using ∆1, ∆2, . . . , ∆l, where each ∆i

is a set of quadruples of the form (x, i, y, τ) corresponding to a source state, an input

symbol, a target state, and the corresponding tag set.

After the last input symbol al is consumed, we check whether any state in the

frontier set belongs to accept states F1. If so, the input string a1a2 . . . al is accepted by

the tagged NFA A1, i.e., the input string matches the regular expression defined by A1.

Example Consider the example regular expression “(a*)aa” in Figure 4.4, where its

tagged transitions ∆(x, i, y, τ) are shown in Table 4.1. For convenience, we denote the

three quadruples in Table 4.1 by row1, row2, and row3. Let us use “aaaa” as an input

string. For the ith input symbol, we use Xi to denote the current frontier set and Yi

to denote the next frontier set after the symbol is consumed. Starting from the first input

symbol a and start states S1 = {1}, we have ∆1 = {row1, row2}, Y1 = {1, 2}. Renaming

Y1 to X2 and following the process described in the frontier derivation, we obtain

∆2 = {row1, row2, row3}, X3 = {1, 2, 3}, ∆3 = {row1, row2, row3}, X4 = {1, 2, 3}, and

∆4 = {row1, row2, row3}, X5 = {1, 2, 3}. Figure 4.5 visualizes how the frontier set

evolves after consuming each input symbol during the match test. An arrow between

two nodes denotes a transition. If an arrow is tagged, it means that a transition is

associated with one or more submatch tags, e.g., the {t1} above the arrow between

states 1 and 1 indicates that this transition is within a capturing group.
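The frontier derivation with bookkeeping can be sketched in Python as follows. The function name `match_test` and the tuple representation of quadruples are assumptions made for this illustration; the automaton is the one from Table 4.1.

```python
T1 = frozenset({"t1"})
# Quadruples (x, i, y, tau) from Table 4.1.
DELTA = {(1, "a", 1, T1), (1, "a", 2, frozenset()), (2, "a", 3, frozenset())}
START, ACCEPT = {1}, {3}


def match_test(delta, start, accept, text):
    """Run the tagged NFA and record the transitions fired for each symbol.

    Returns (accepted, frontiers, history), where frontiers[k] is the frontier
    after k symbols and history[k] plays the role of the set Δ_{k+1}.
    """
    frontier = set(start)
    frontiers, history = [set(frontier)], []
    for symbol in text:
        fired = {q for q in delta if q[0] in frontier and q[1] == symbol}
        history.append(fired)
        frontier = {q[2] for q in fired}
        frontiers.append(set(frontier))
    return bool(frontier & accept), frontiers, history


accepted, frontiers, history = match_test(DELTA, START, ACCEPT, "aaaa")
print(accepted, frontiers)
```

The recorded frontiers reproduce the sequence X1 through X5 of the example, and the recorded sets have two, three, three, and three quadruples respectively.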

Submatch Extraction

If an input string is accepted by a regular expression that has capturing groups, the

submatches of the input string need to be extracted. Recall that the NFA match

test process described above actually considers all possible branches (transitions) when

consuming each input symbol. If the input string is accepted by the tagged NFA, then

there exists at least one path from a start state to an accept state, where edges of

the path denote transitions between states and are sequentially associated with the

individual symbols of the input string. An edge may be associated with one or more

submatch tags, or no tag at all. For example, the bold arrows in Figure 4.5 show a path

from start state 1 to the accept state 3. Such a path allows us to perform submatch

extraction.

In fact, any path from a start state to an accept state during a match test on a tagged

NFA generates a valid assignment of submatches. A review of the match test process

can help us to understand why: Since a path from a start state to an accept state is

a number of sequential and valid operations of a tagged NFA on an input string, the

assignment of submatch tags on each input symbol is also valid. The collections of the

last consecutive sequences of symbols associated with the same tags that satisfy the

conditions of the definition in Section 4.2.2 generate a valid assignment of submatches.

Path Finding Assume an input string a1a2 . . . al is accepted by a tagged NFA A1,

and qf is an accept state after consuming the last symbol al. We present a backward

traversal approach to find a path that allows for submatch extraction. Starting from

one of the accept states qf , with the last input symbol al, perform a lookup on ∆l for

quadruples (x, i, y, τ) such that y = qf and i = al. Pick any quadruple (ql, al, qf , τ) that

satisfies this condition, then ql is a previous state that leads the automaton to qf with

the last input symbol al, and τ is the corresponding submatch tags associated with al.

We note that τ can be empty. Using ql, with input symbol al−1, perform a lookup on

∆l−1 for quadruples (x, i, y, τ) such that y = ql and i = al−1. Such quadruples will

allow us to find a previous state of ql with input symbol al−1, along with submatch

tags associated with al−1 if there are any. Continue this process for al−2, . . . , and a1.

Finally, we will reach a start state q1. Then q1, q2, . . . , ql, qf is a valid traversal path for

input string a1a2 . . . al. During the backward path finding, each symbol in a1a2 . . . al is

assigned zero or more submatch tags. Submatches of an accepted input string can

be extracted by scanning the input string and collecting the last consecutive sequence

of symbols associated with the same submatch tags. Given a regular expression and a

matching string, there might exist multiple paths from a start state to an accept state.

Thus, there might exist multiple ways to assign valid submatches.

Example Figure 4.5 shows a traversal path of input string “aaaa” on the tagged NFA

shown in Figure 4.4. The path is marked by bold arrows. Along this path, we can

see that the first two symbols of “aaaa” are associated with tag t1, and the last two

symbols have no submatch tag. Thus, the submatch of “aaaa” for regular expression

“(a*)aa” is the substring of the first two symbols, i.e., “aa”.
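The backward traversal can be sketched as below. The helper names (`forward`, `tags_along_path`, `last_run`) are hypothetical, and the use of `min` to choose among candidate quadruples is only a convenience to make the sketch repeatable; the text allows any candidate to be picked.

```python
T1 = frozenset({"t1"})
DELTA = {(1, "a", 1, T1), (1, "a", 2, frozenset()), (2, "a", 3, frozenset())}
START, ACCEPT = {1}, {3}


def forward(delta, start, text):
    """Match test that records the transitions fired for each input symbol."""
    frontier, history = set(start), []
    for symbol in text:
        fired = {q for q in delta if q[0] in frontier and q[1] == symbol}
        history.append(fired)
        frontier = {q[2] for q in fired}
    return frontier, history


def tags_along_path(delta, start, accept, text):
    """Walk backward from an accept state; return one tag set per input symbol."""
    frontier, history = forward(delta, start, text)
    finals = frontier & accept
    if not finals:
        return None                                  # input rejected
    state, tags = min(finals), [frozenset()] * len(text)
    for k in range(len(text) - 1, -1, -1):
        candidates = [q for q in history[k] if q[2] == state and q[1] == text[k]]
        # Any candidate lies on a valid path; min() only fixes the choice.
        x, _i, _y, tau = min(candidates, key=lambda q: (q[0], sorted(q[3])))
        tags[k], state = tau, x
    return tags


def last_run(text, tags, tag):
    """Last consecutive run of symbols carrying `tag`, or None if it never occurs."""
    end = max((k for k, tau in enumerate(tags) if tag in tau), default=None)
    if end is None:
        return None
    begin = end
    while begin > 0 and tag in tags[begin - 1]:
        begin -= 1
    return text[begin:end + 1]


path_tags = tags_along_path(DELTA, START, ACCEPT, "aaaa")
print(last_run("aaaa", path_tags, "t1"))
```

Every quadruple recorded for symbol k starts in a state of the frontier before that symbol, so each backward step is guaranteed to find a predecessor and the walk ends in a start state.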

The match test and submatch extraction algorithms described in Section 4.2.3 are

space efficient since the construction is based on NFA. However, they are not time

efficient. During the match test, the number of states in a frontier is O(|Q1|) (the size

of NFA). To derive the next frontier, states in the current frontier need to be processed

one by one. Thus, the number of lookups at the transition table during the match test

for an input string of length l is O(|Q1| × l). Similarly, the number of table lookups

performed during submatch extraction can be estimated as O(|Q1| × l). If we can find

an approach that allows us to derive frontiers (in match test) and previous states (in

submatch extraction) more efficiently, then the time efficiency of the algorithms can be

improved. Fortunately, we already have the data structures to do that. Our approach

is to represent tagged NFAs, perform match testing and submatch extraction using

Boolean functions (Section 4.2.4), and manipulate the Boolean functions using ordered

binary decision diagrams (OBDDs) (Section 4.2.5).

4.2.4 Boolean Function Representation

For convenience, we discuss tagged NFAs in which ε transitions have been eliminated.

The Boolean function of a tagged NFA A1 = (Q1,Σ, T, δ1, γ1, S1, F1) uses four vectors

of Boolean variables, x,y, i, and t. Vectors x and y are used to denote states in Q1,

and they contain ⌈lg |Q1|⌉ Boolean variables each. Vector i is used to denote symbols in

Σ and it contains ⌈lg |Σ|⌉ Boolean variables. Vector t is used to denote submatch tags

and it contains ⌈|T|⌉ Boolean variables. We construct the following Boolean functions

for the tagged NFA A1.

• ∆(x, i,y, t) denotes the tagged transition table of A1. It is a disjunction of all

tagged transition relations (x, i, y, t). As an example, the Boolean encoding of

transition relations in Table 4.1 is shown in Table 4.2, where states are encoded

by two bits, input symbol is encoded by one bit (since there is only one symbol

‘a’), and submatch tags are encoded by one bit. Specifically, states 1, 2, and 3

are encoded as 01, 10, and 11; symbol ‘a’ is encoded as 1; and submatch tag t1

is encoded as 1. The fifth column of Table 4.2 lists the function values for each

set of Boolean encodings. The function value of Boolean encodings for tagged

transitions is 1. The Boolean encoding in Table 4.2 can be symbolically translated

x     i    y     t    ∆(x, i, y, t)
01    1    01    1    1
01    1    10    0    1
10    1    11    0    1

Table 4.2: Boolean encoding of transitions in Table 4.1.

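to

\[
\Delta(\mathbf{x}, \mathbf{i}, \mathbf{y}, \mathbf{t}) = (\bar{x}_1 \wedge x_2 \wedge i \wedge \bar{y}_1 \wedge y_2 \wedge t_1) \vee (\bar{x}_1 \wedge x_2 \wedge i \wedge y_1 \wedge \bar{y}_2 \wedge \bar{t}_1) \vee (x_1 \wedge \bar{x}_2 \wedge i \wedge y_1 \wedge y_2 \wedge \bar{t}_1)
\]

If we rename variables i, y1, y2, and t1 to x3, x4, x5, and x6, respectively, ∆(x, i, y, t) can be represented by the OBDD shown in Figure 4.6.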

• Iσ(i) stands for the Boolean representation of symbols in Σ. As an example,

symbol ‘a’ in Table 4.1 can be symbolically represented by Ia(i) = i.

• F(x) is a Boolean function representing frontier states. In the tagged NFA shown

in Figure 4.4, consider the frontier {1} with input symbol ‘a’: the new frontier has two

states {1, 2}, which can be symbolically represented by $F(\mathbf{x}) = (\bar{x}_1 \wedge x_2) \vee (x_1 \wedge \bar{x}_2)$.

• ∆F (x,i,y,t) is used to represent the intermediate transitions for frontier F(x)

during a match test process.

• A(x) is used to define the Boolean representation of accept states of a tagged

NFA. For the tagged NFA shown in Figure 4.4, the set of accept states is {3}; thus,

A(x) = x1 ∧ x2.

The Boolean functions described above can be automatically computed for any

tagged NFA. We next describe how to perform the match test and submatch extraction

described in Section 4.2.3 using these Boolean functions.
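The encoding of Table 4.2 can be reproduced mechanically. The sketch below is illustrative only: the dictionaries and function names are hypothetical, and the Boolean function is represented naively by its set of satisfying rows rather than by an OBDD.

```python
from itertools import product

STATE_BITS = {1: (0, 1), 2: (1, 0), 3: (1, 1)}   # states 1, 2, 3 as 01, 10, 11
SYMBOL_BITS = {"a": (1,)}                        # symbol 'a' as 1
TAGS = ("t1",)                                   # one bit per tag


def encode(x, i, y, tau):
    """Encode a quadruple of Table 4.1 as the bit row (x1, x2, i, y1, y2, t1)."""
    t_bits = tuple(1 if t in tau else 0 for t in TAGS)
    return STATE_BITS[x] + SYMBOL_BITS[i] + STATE_BITS[y] + t_bits


ROWS = [(1, "a", 1, {"t1"}), (1, "a", 2, set()), (2, "a", 3, set())]
MINTERMS = {encode(*row) for row in ROWS}


def delta(x1, x2, i, y1, y2, t1):
    """Δ(x, i, y, t): 1 exactly on the encoded transitions."""
    return int((x1, x2, i, y1, y2, t1) in MINTERMS)


def delta_formula(x1, x2, i, y1, y2, t1):
    """The same function written as a disjunction of three conjunctions."""
    n = lambda b: 1 - b
    return ((n(x1) & x2 & i & n(y1) & y2 & t1)
            | (n(x1) & x2 & i & y1 & n(y2) & n(t1))
            | (x1 & n(x2) & i & y1 & y2 & n(t1)))


print(sorted(MINTERMS))
```

Checking the two definitions against each other over all 64 assignments is a quick way to confirm that the symbolic formula and the table agree.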

Match Test

The match test process is similar to that described in Chapter 3, except that we do

book-keeping here to be used for submatch extraction. Suppose the frontier of a tagged

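Figure 4.6: The ordered binary decision diagram of the Boolean function $f(x_1, x_2, x_3, x_4, x_5, x_6) = (\bar{x}_1 \wedge x_2 \wedge x_3 \wedge \bar{x}_4 \wedge x_5 \wedge x_6) \vee (\bar{x}_1 \wedge x_2 \wedge x_3 \wedge x_4 \wedge \bar{x}_5 \wedge \bar{x}_6) \vee (x_1 \wedge \bar{x}_2 \wedge x_3 \wedge x_4 \wedge x_5 \wedge \bar{x}_6)$ with ordering $x_1 \prec x_2 \prec x_3 \prec x_4 \prec x_5 \prec x_6$.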

NFA is F(x) at some instant of frontier derivation, and the next input symbol is σ,

then the next frontier states can be computed using the following Boolean operations:

G(y) = ∃ x· ∃ i· ∃ t· [∆F (x, i,y, t)] (4.1)

where

∆F (x, i,y, t) = F(x) ∧ Iσ(i) ∧∆(x, i,y, t) (4.2)

We now explain why Equation (4.1) produces the new frontier states. Recall that

∆(x, i,y, t) is the disjunction of the tagged transitions of an NFA. The conjunctions of

∆(x, i,y, t) with F(x) and Iσ(i) on the right side of Equation (4.2) select the rows

in the truth table of ∆(x, i,y, t) that correspond to outgoing transitions from the states

in the current frontier F(x) labeled with symbol σ. These transitions are denoted by

∆F (x, i,y, t), which is a function of x, i,y, and t. The new frontier states are the target

states of the selected transitions and are only associated with y. To extract the new

frontier states, we existentially quantify x, i, and t using the existential quantification

operator introduced in Chapter 2. We rename y to x to express the new frontier states

in terms of x.

Consider the tagged NFA in Figure 4.4. Suppose the current frontier is {1} and the

next input symbol is ‘a’. Then

\[
F(\mathbf{x}) \wedge I_a(\mathbf{i}) \wedge \Delta(\mathbf{x}, \mathbf{i}, \mathbf{y}, \mathbf{t}) = (\bar{x}_1 \wedge x_2 \wedge i \wedge \bar{y}_1 \wedge y_2 \wedge t_1) \vee (\bar{x}_1 \wedge x_2 \wedge i \wedge y_1 \wedge \bar{y}_2 \wedge \bar{t}_1)
\]

Applying existential quantification of x, i, and t to the above disjunction, we obtain $(\bar{y}_1 \wedge y_2) \vee (y_1 \wedge \bar{y}_2)$, which is the symbolic Boolean representation of the new frontier states {1, 2}.

To check whether the automaton is in an accept state, simply check the satisfiability of the conjunction of F(x) and A(x). Renaming the above example frontier $(\bar{y}_1 \wedge y_2) \vee (y_1 \wedge \bar{y}_2)$ to $(\bar{x}_1 \wedge x_2) \vee (x_1 \wedge \bar{x}_2)$ and conjoining it with $A(\mathbf{x}) = x_1 \wedge x_2$ yields a function that is not satisfiable; thus, the automaton is not in an accept state.
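The effect of Equations (4.1) and (4.2) can be illustrated without an OBDD package by representing each Boolean function as the set of its satisfying assignments. This is a simplifying assumption for the sketch (an OBDD would represent the same functions compactly), and the names `step` and `is_accepting` are hypothetical.

```python
# Δ as the rows of Table 4.2; each row is (x1, x2, i, y1, y2, t1).
DELTA = {(0, 1, 1, 0, 1, 1), (0, 1, 1, 1, 0, 0), (1, 0, 1, 1, 1, 0)}
I_A = {(1,)}          # I_a(i) = i
ACCEPT = {(1, 1)}     # A(x) = x1 ∧ x2, i.e. state 3


def step(frontier, i_sigma, delta):
    """One frontier derivation; returns (G, Δ_F)."""
    # Equation (4.2): the conjunction keeps the rows whose x-part satisfies
    # F(x) and whose i-part encodes the current symbol.
    delta_f = {row for row in delta
               if row[0:2] in frontier and row[2:3] in i_sigma}
    # Equation (4.1): quantifying out x, i, and t keeps only the y-part.
    g = {row[3:5] for row in delta_f}
    return g, delta_f  # renaming y to x is implicit: both are 2-bit tuples


def is_accepting(frontier, accept):
    """Satisfiability of F(x) ∧ A(x)."""
    return bool(frontier & accept)


frontier = {(0, 1)}                                  # frontier {1}
new_frontier, delta_f = step(frontier, I_A, DELTA)
print(sorted(new_frontier), is_accepting(new_frontier, ACCEPT))
```

In this representation conjunction is row selection and existential quantification is projection, which is exactly what the explanation of Equation (4.1) describes.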

Submatch Extraction

Now, we discuss how to extract submatches using Boolean function operations. The

process starts from the last symbol and one of the states where the input string is

accepted. For convenience, we call the current state of a backward path finding a

reverse frontier, which contains only one state because we are only interested in finding

one path. Suppose at an instant of the path finding the reverse frontier representation

is Fr(y), and the previous input symbol is σ. A previous state that leads the automaton

to Fr(y) can be derived from the following Boolean function:

∆r(x, i,y, t) = Fr(y) ∧ Iσ(i) ∧∆F (x, i,y, t) (4.3)

where ∆F (x, i,y, t) denotes the intermediate tagged transitions corresponding to sym-

bol σ during the match test process. The conjunctions on the right side of Equation (4.3)

select tagged transitions (labeled by σ) from ∆F (x, i,y, t), where the target state is

Fr(y). The previous states are associated with x in ∆r(x, i,y, t). Since we are only

interested in one path, we simply pick one row in the truth table of ∆r(x, i,y, t) to find

one previous state of Fr(y). If we denote the picked row as PickOne(∆r(x, i,y, t)), a

previous state G(x) of Fr(y) can be derived by

G(x) = ∃ y· ∃ i· ∃ t·H(x, i,y, t) (4.4)

H(x, i,y, t) = PickOne(∆r(x, i,y, t)) (4.5)

To obtain submatch tags τ(t) associated with σ, we existentially quantify x, i, and y

on H(x, i,y, t).

τ(t) = ∃ x· ∃ i· ∃ y·H(x, i,y, t) (4.6)

Consider the example in Figure 4.5. After consuming the fourth input symbol of

“aaaa”, the automaton accepts and

\[
\Delta_F(\mathbf{x}, \mathbf{i}, \mathbf{y}, \mathbf{t}) = (\bar{x}_1 \wedge x_2 \wedge i \wedge \bar{y}_1 \wedge y_2 \wedge t_1) \vee (\bar{x}_1 \wedge x_2 \wedge i \wedge y_1 \wedge \bar{y}_2 \wedge \bar{t}_1) \vee (x_1 \wedge \bar{x}_2 \wedge i \wedge y_1 \wedge y_2 \wedge \bar{t}_1)
\]

Starting from the accept state 3 ($F_r(\mathbf{y}) = y_1 \wedge y_2$) and the last symbol ‘a’ ($I_a(\mathbf{i}) = i$), the conjunction in Equation (4.3) gives $\Delta_r(\mathbf{x}, \mathbf{i}, \mathbf{y}, \mathbf{t}) = (x_1 \wedge \bar{x}_2 \wedge i \wedge y_1 \wedge y_2 \wedge \bar{t}_1)$, which has only one tagged transition. Performing the existential quantifications in Equations (4.4) and (4.5), we obtain the Boolean representation of a previous state as $x_1 \wedge \bar{x}_2$, which translates to state 2. The existential quantification in Equation (4.6) gives $\tau(\mathbf{t}) = \bar{t}_1$, which means that no tag is associated with the fourth symbol ‘a’. Applying the same approach to the 3rd, 2nd, and 1st symbols, we obtain a path from state 1 to state 3, where the 1st and 2nd symbols ‘a’ are assigned submatch tag t1. Thus, the submatch of “aaaa” for “(a*)aa” is sub(t1) = “aa”.
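The backward pass of Equations (4.3) to (4.6) can be sketched with the same set-of-rows stand-in for Boolean functions. The names below are hypothetical, `pick_one` uses `min` only to make the choice repeatable, and the final step of collecting the last consecutive run of tagged symbols is omitted for brevity.

```python
# Rows are (x1, x2, i, y1, y2, t1), as in Table 4.2.
DELTA = {(0, 1, 1, 0, 1, 1), (0, 1, 1, 1, 0, 0), (1, 0, 1, 1, 1, 0)}
SYMBOLS = {"a": {(1,)}}               # I_a(i) = i
START, ACCEPT = {(0, 1)}, {(1, 1)}    # state 1 and state 3


def forward(text):
    """Match test that keeps Δ_F for every input symbol."""
    frontier, history = set(START), []
    for ch in text:
        delta_f = {r for r in DELTA
                   if r[0:2] in frontier and r[2:3] in SYMBOLS[ch]}
        history.append(delta_f)
        frontier = {r[3:5] for r in delta_f}
    return frontier, history


def pick_one(rows):
    """PickOne: any single row is acceptable."""
    return min(rows)


def extract_tag_bits(text):
    """Return the tag bits assigned to each symbol along one accepting path."""
    frontier, history = forward(text)
    finals = frontier & ACCEPT
    if not finals:
        return None
    reverse_frontier = {pick_one(finals)}
    tag_bits = []
    for k in range(len(text) - 1, -1, -1):
        # Equation (4.3): keep rows of Δ_F that enter the reverse frontier.
        delta_r = {r for r in history[k]
                   if r[3:5] in reverse_frontier and r[2:3] in SYMBOLS[text[k]]}
        h = pick_one(delta_r)            # Equation (4.5)
        reverse_frontier = {h[0:2]}      # Equation (4.4): keep the x-part
        tag_bits.append(h[5:])           # Equation (4.6): keep the t-part
    tag_bits.reverse()
    return tag_bits


print(extract_tag_bits("aaaa"))
```

For “aaaa” the first two symbols receive the bit for t1 and the last two receive none, matching the worked example.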

A submatch assignment obtained by our approach is not necessarily the leftmost,

longest submatch, which is required by POSIX. However, POSIX does not have a notion

of “greedy” and “reluctant” closures, which give some control over the length of the

submatch. Thus, POSIX is incomplete. Standard libraries like Java and PCRE have

behaviors that are not POSIX compliant.

4.2.5 Submatch-OBDD

To improve the efficiency of the match test and submatch extraction, we represent and manipulate the Boolean functions defined in Section 4.2.4 using OBDDs. We call our model Submatch-OBDD. A Submatch-OBDD for a tagged NFA A1 = (Q1, Σ, T, δ1, γ1, S1, F1) is a 5-tuple [OBDD(∆(x, i, y, t)), {OBDD(Iσ) | σ ∈ Σ}, {OBDD(Tt) | t ∈ T}, OBDD(F_{S1}), OBDD(A)], where ∆(x, i, y, t) is the Boolean representation of the tagged transitions, Iσ is the Boolean representation of a symbol σ ∈ Σ, Tt is the Boolean representation of a tag t ∈ T, F_{S1} is the Boolean representation of the start states, and A is the Boolean representation of the accept states F1.

To understand why OBDDs can improve the time-efficiency of tagged NFA oper-

ations, consider frontier derivation on a tagged NFA. To derive a new set of frontier

states, the tagged transition table must be retrieved for each state in the current fron-

tier F , leading to O(|F|) operations for each input symbol. On the other hand, the

time-complexity of using OBDDs to derive the next frontier is determined by the two

conjunctions and one existential quantification in Equations (4.1) and (4.2). When

the frontier set F is large, the cost of doing the two conjunctions and one existential

quantification is often smaller than doing |F| lookups on the transition table. Using

the same method, we can calculate that the time complexity of submatch extraction is

the same as the match test process. For a tagged-NFA with n states, the size of frontier

set |F| is O(n). Thus, the cost to process an input string of l bytes by our approach is

between O(l) and O(nl). In other words, the time complexity of Submatch-OBDD is

between a pure DFA and a pure NFA approach.
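To give a feel for the data structure, the following toy reduced OBDD builds the transition function of the running example under the ordering x1 ≺ x2 ≺ ... ≺ x6. It is a sketch for illustration only: the class `BDD` is hypothetical, it builds the diagram by exhaustive Shannon expansion (exponential in the number of variables), and it does not reflect how CUDD or the thesis implementation works.

```python
from itertools import product


class BDD:
    """A tiny reduced OBDD with terminals 0 and 1 and a unique table."""

    def __init__(self, nvars):
        self.nvars = nvars
        self.nodes = {0: None, 1: None}   # node id -> (var, low, high)
        self.unique = {}                  # (var, low, high) -> node id

    def mk(self, var, low, high):
        if low == high:                   # redundant test: skip the node
            return low
        key = (var, low, high)
        if key not in self.unique:        # share isomorphic subgraphs
            node_id = len(self.nodes)
            self.nodes[node_id] = key
            self.unique[key] = node_id
        return self.unique[key]

    def build(self, f, var=0, partial=()):
        """Shannon expansion of f in the fixed variable order."""
        if var == self.nvars:
            return int(bool(f(*partial)))
        low = self.build(f, var + 1, partial + (0,))
        high = self.build(f, var + 1, partial + (1,))
        return self.mk(var, low, high)

    def evaluate(self, node, assignment):
        while node > 1:
            var, low, high = self.nodes[node]
            node = high if assignment[var] else low
        return node


def delta(x1, x2, x3, x4, x5, x6):
    """Transition function of Table 4.2 after renaming i, y1, y2, t1."""
    n = lambda b: 1 - b
    return ((n(x1) & x2 & x3 & n(x4) & x5 & x6)
            | (n(x1) & x2 & x3 & x4 & n(x5) & n(x6))
            | (x1 & n(x2) & x3 & x4 & x5 & n(x6)))


bdd = BDD(6)
root = bdd.build(delta)
internal_nodes = len(bdd.nodes) - 2
print(internal_nodes)
```

Sharing of common subfunctions is what lets a single conjunction or quantification act on many frontier states at once instead of looking them up one by one.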

The space efficiency of Submatch-OBDD is comparable to tagged NFAs. The space

cost of a Submatch-OBDD is dominated by OBDD(∆(x, i,y, t)), which needs a total

of 2 × ⌈lg |Q1|⌉ + ⌈lg |Σ|⌉ + ⌈|T|⌉ Boolean variables. In the worst case, the size of the

OBDD is O(|Q1|^2 × |Σ| × 2^|T|), which is comparable to the size of transitions of a tagged

NFA. We note that the OBDDs of intermediate transitions ∆F (x, i,y, t) for all input

symbols also take some space, mainly depending on the size of input string. We will

show that such a cost is not a concern in practice in Section 4.3.

4.2.6 Implementation

We implemented Submatch-OBDD as a toolchain in C++. The toolchain has two offline

components, Re2Tnfa and Tnfa2Obdd, and one online component, PatternMatch.

Re2Tnfa accepts patterns as input and outputs tagged-NFAs that define the

same languages as the input patterns. Tnfa2Obdd generates the tagged-NFAs’ OBDD

representations. PatternMatch performs match test and submatch extraction on an

input stream using the OBDD representations. Our implementation interfaces with the

popular CUDD library [79] for OBDD construction and manipulation.

In comparison, both PCRE and RE2 are implemented in C++. PCRE uses a recursive

backtracking approach. RE2 uses a combination of DFAs and NFAs for submatch

extraction: Given a pattern and an input string, RE2 constructs and uses backward

and forward DFAs to locate the pattern’s overall match in the input string. It then uses

NFA based approaches to find submatches in the overall match. For memory efficiency,

RE2 does not construct entire DFAs. It creates DFA states on demand (determinization

on the fly) and stores them in a limited-size cache; when the cache gets full, RE2

empties the cache and restarts the DFA construction process.

4.3 Evaluation

We evaluated the performance of our Submatch-OBDD implementation using patterns

used in real systems. We measured Submatch-OBDD’s time efficiency and space effi-

ciency by matching the patterns with network traces, synthetic traces, and enterprise

event logs, and then compared our performance with two popular regular expression

engines: RE2 and PCRE. Our findings suggest that Submatch-OBDD achieves its ideal

performance when patterns are combined. In the best case, Submatch-OBDD is faster

than RE2 and PCRE by one to two orders of magnitude. All the performance num-

bers of Submatch-OBDD reported in this section were obtained based on the variable

ordering of i ≺ x ≺ y ≺ t.

4.3.1 Data Sets

We used the following three sets of patterns and trace files to evaluate the performance

of our approach:

Snort-2009

We extracted 115 patterns from a Snort 2009 HTTP rule set of 3078 patterns. All

patterns were extracted from the pcre fields of the rules. Since our focus is submatch

extraction, we excluded patterns containing no capturing groups and patterns contain-

ing back references as patterns with back references cannot be represented by regular

languages. Each extracted pattern contains one to six capturing groups.

We used two network traces and one synthetic trace to evaluate the performance of

our approach on the Snort-2009 pattern set.

• The first web trace was 1.2 GB of network traffic collected using tcpdump from our

department’s web server. The average packet size of this trace is 126 bytes with

a standard deviation of 271. The second web trace was 1.3 GB of network traffic

collected by crawling URLs that appeared on Twitter using a Python script and

recording the full-length packets using tcpdump. The average packet size of the

second trace is 1202 bytes with a standard deviation of 472.

• We also created a synthetic trace to observe how different implementations per-

form under the backtracking algorithmic complexity attack [75]. By reviewing

the 115 patterns of the Snort-2009 pattern set, we found that several of them are

vulnerable to the backtracking algorithmic complexity attack if a regular expres-

sion engine is implemented by backtracking, e.g., PCRE. We then crafted a 1MB

trace that can exploit the backtracking behavior of a backtracking-based pattern

matching engine. The average line length of the trace is 311 bytes with a standard

deviation of 5.

Snort-2012

We also evaluated our approach with the latest rules from the Snort system. We ex-

tracted 403 patterns (regular expressions with capturing groups) from a snapshot of

the Snort-2012 HTTP rule set containing 3990 rules. All patterns were extracted from

the pcre fields of the rules. Like the patterns of Snort-2009, we excluded patterns

containing back references as they cannot be represented by regular languages. Patterns

containing no capturing groups were also excluded as our focus was on submatch

extraction. Each extracted pattern has one to ten capturing groups.

We used two web traces and one synthetic trace to evaluate the performance of

different approaches on this pattern set. The two web traces are the same as those

used in the Snort-2009 pattern set evaluation. The synthetic trace was created after

reviewing the 403 patterns: We found that several of the 403 patterns are vulnerable

to backtracking algorithmic attacks. We then crafted a 1MB trace that can exploit

the backtracking behavior of a backtracking-based pattern matching engine and evalu-

ated its effects on Submatch-OBDD, RE2, and PCRE. The average line length of this

synthetic trace is 689 bytes with a standard deviation of 41.

Firewall-504

We also obtained a set of 504 patterns used by a commercial SIEM system C to nor-

malize logs generated by a commercial firewall, F . For commercial reasons, we do not

disclose the names of the SIEM system and the firewall. Each pattern in the set has

1-22 capturing groups. We collected 87 MB of firewall logs generated by F in an

enterprise setting and measured our performance on the logs. The logs consist of 1.01

million lines of text and the average line size is 87 bytes with a standard deviation of 51.

We did not create a synthetic trace for this pattern set as firewall logs cannot easily be

controlled by an attacker.

4.3.2 Experimental Setup

We conducted our experiments on an Intel Core2 Duo E7500 Linux-2.6.3 machine run-

ning at 2.93 GHz with 2 GB of RAM. We measure the time efficiency of different

approaches in the average number of CPU cycles needed to process one byte of a trace

file. We only measure pattern matching and submatch extraction time, and exclude

pattern compilation time. Similarly, we measure memory efficiency in megabytes (MB)

of RAM used during pattern matching and submatch extraction.

We measure the performance of each approach on a pattern set in two configurations.

In one configuration, Conf.S, we match each pattern with the input stream sequentially.

For example, we match each pattern in the Snort-2009 set with each packet in the

network traces. Combining all patterns of a pattern set into one single pattern, however,

allows us to match each packet with all patterns in one pass. This configuration, Conf.C,

is also useful in the log normalization process of a SIEM system. The system can match

an event log with all rules in one pass and extract all fields of interest instead of matching

the logs with each rule sequentially.

Given a pattern set with n patterns and an input trace of M bytes, we measured

performance of an approach in the following two configurations.

• Conf.S (Sequential): We compile each pattern individually and then match the

compiled patterns with the trace sequentially. If the ith pattern’s execution time

for the M-byte trace is ti cycles, then the time efficiency of an approach on the

pattern set is (t1 + · · · + tn)/M cycles/byte.

• Conf.C (Combination): We combine the n patterns together into one pat-

tern using the Union operation. We compile the combined pattern and match

it with the input trace. If the combined pattern’s execution time for the M

bytes trace is t cycles, then an approach’s time efficiency on the pattern set is

t/M cycles/byte. When an input string matches a specific pattern in the combined

pattern, Submatch-OBDD emits the submatches, as well as the pattern

that matches the input string.
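The two metrics can be written down as a short sketch. The function names are hypothetical and the cycle counts in the usage lines are made-up illustrative values, not measurements from this evaluation.

```python
def conf_s_cycles_per_byte(per_pattern_cycles, trace_bytes):
    """Conf.S: (t1 + ... + tn) / M, with one cycle count per pattern."""
    return sum(per_pattern_cycles) / trace_bytes


def conf_c_cycles_per_byte(combined_cycles, trace_bytes):
    """Conf.C: t / M, with a single cycle count for the combined pattern."""
    return combined_cycles / trace_bytes


# Illustrative numbers only: three patterns on a 1000-byte trace.
sequential = conf_s_cycles_per_byte([4.0e6, 6.0e6, 1.0e7], 1000)
combined = conf_c_cycles_per_byte(5.0e6, 1000)
print(sequential, combined)
```

In Conf.S the cost grows with the number of patterns because every pattern scans the whole trace, whereas in Conf.C the trace is scanned once.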

4.3.3 Performance Results

Snort-2009

Table 4.3 shows the execution times (cycles/byte) and memory consumption of RE2,

PCRE, and Submatch-OBDD for the Snort-2009 pattern set on the web traces and

synthetic trace. We have the following observations:

• Submatch-OBDD achieves its ideal performance in Conf.C, i.e., when patterns

are combined together for pattern matching and submatch extraction.

• Submatch-OBDD is the fastest approach among the three. For the web traces,

Submatch-OBDD’s best performance (in Conf.C) is an order of magnitude faster

than the other approaches’ best performance (in Conf.S).

• PCRE suffers from backtracking algorithmic complexity attacks, while Submatch-

OBDD and RE2 do not. With the web traces, the best time efficiency of PCRE

was 3.67 × 10^4 cycles/byte. However, PCRE was slowed down by two orders of magnitude

when the synthetic trace was used, as is shown in Table 4.3(b). The reason is

that the synthetic trace caused PCRE to perform heavy backtracking for some

patterns.

• In Conf.C, the memory consumption of Submatch-OBDD and RE2 is comparable,

while PCRE consumes the least memory. We do not report the memory

requirements in Conf.S as the three approaches use very little memory for simple

patterns.

We note that in Conf.S, RE2 is faster than Submatch-OBDD. This is because many

patterns did not fill up the DFA state cache and hence did not trigger the DFA re-

construction process. In the case of simple patterns, the cost of OBDD operations,

e.g., frontier derivation and existential quantification, is higher than the cost of several

lookups on NFA transition table because the frontier size is often very small. Thus,

Submatch-OBDD performs slower than RE2 in such situations. The cost of OBDD

operations will be paid off when the frontier size of a tagged-NFA is large.

Method   Conf.S Exec-time   Conf.C Exec-time   Conf.C Memory (MB)
RE2      2.31 × 10^4        1.21 × 10^5        7.3
PCRE     3.67 × 10^4        1.13 × 10^6        1.2
OBDD     8.76 × 10^4        3.63 × 10^3        9.4

(a) Performance numbers with the web traces

Method   Conf.S Exec-time   Conf.C Exec-time   Conf.C Memory (MB)
RE2      8.20 × 10^4        2.22 × 10^5        7.6
PCRE     1.44 × 10^6        1.40 × 10^6        1.0
OBDD     2.12 × 10^5        2.20 × 10^4        7.0

(b) Performance numbers with the synthetic trace

Table 4.3: Execution time (cycles/byte) and memory consumption for the Snort-2009 data set with (a) the web traces and (b) the synthetic trace. In both traces, Submatch-OBDD’s best execution time (Conf.C) is much shorter than RE2’s and PCRE’s best execution times (Conf.S).

We recommend that Submatch-OBDD be used in cases where a group of patterns are combined together. The performance boost of Submatch-OBDD is due to redundancy elimination: the OBDD representation eliminates the redundancy in the Boolean representation of tagged NFAs.

Snort-2012

Table 4.4 shows the performance of RE2, PCRE, and Submatch-OBDD on the 403 patterns from the Snort-2012 rule set. We have the following observations:

• Submatch-OBDD achieves its ideal time efficiency in Conf.C, i.e., when patterns are combined together for match test and submatch extraction.

• For the web traces, Submatch-OBDD is faster than RE2 but slower than PCRE. For the synthetic trace, however, Submatch-OBDD is faster than both RE2 and PCRE.

• As with the Snort-2009 data set, PCRE suffers from the backtracking algorithmic complexity attack performed by the synthetic trace. PCRE’s time efficiency under the synthetic trace is two to three orders of magnitude worse than under the web traces.


Method    Exec-time (Conf.S)    Exec-time (Conf.C)    Memory (MB, Conf.C)
RE2       4.79 × 10⁴            2.09 × 10⁶            15.0
PCRE      7.70 × 10⁴            2.69 × 10³            1.0
OBDD      3.83 × 10⁵            1.08 × 10⁴            6.3

(a) Performance numbers with the web traces

Method    Exec-time (Conf.S)    Exec-time (Conf.C)    Memory (MB, Conf.C)
RE2       2.92 × 10⁵            8.21 × 10⁶            15.0
PCRE      1.47 × 10⁶            7.64 × 10⁵            1.0
OBDD      4.70 × 10⁵            1.10 × 10⁵            15.3

(b) Performance numbers with the synthetic trace

Table 4.4: Execution time (cycles/byte) and memory consumption for the Snort-2012 data set with (a) the web traces and (b) the synthetic trace.

• In Conf.C, the memory consumption of Submatch-OBDD and RE2 is comparable.

Although PCRE performed better for the web traces in Table 4.4, we still do not recommend PCRE as the pattern matching engine for a network intrusion detection system (NIDS). The main reason is that it is easy for attackers to craft network traffic that mounts backtracking algorithmic complexity attacks on PCRE, as was shown by Smith et al. in [75]. Our experimental results in Table 4.3 and Table 4.4 also demonstrate that PCRE can easily be slowed down by a factor of hundreds with carefully crafted synthetic traces.

Firewall-504

Table 4.5 shows the three approaches’ performance on the Firewall-504 data set. Submatch-OBDD is the fastest approach on this data set. In Conf.C, Submatch-OBDD is orders of magnitude faster than RE2 and PCRE. Also, Submatch-OBDD’s best performance (in Conf.C) is 62% faster than RE2’s best performance (in Conf.S). In terms of memory usage, PCRE is the most compact. Submatch-OBDD consumes slightly more memory than RE2.


Method    Exec-time (Conf.S)    Exec-time (Conf.C)    Memory (MB, Conf.C)
RE2       2.04 × 10⁵            2.20 × 10⁷            21.0
PCRE      6.88 × 10⁵            1.60 × 10⁶            1.1
OBDD      6.31 × 10⁵            1.25 × 10⁵            30.0

Table 4.5: Execution time (cycles/byte) and memory consumption for the Firewall-504 data set.

4.3.4 Discussion

During our evaluation, we found a small number of regular expressions from the Snort 2009 and 2012 rule sets that can cause either PCRE or RE2 to perform poorly. For example, suppose we use PCRE to match

.*\x2F[^\s]*\.(dat|xml)\?[^\s]*v=[^\s]*t=[^\s]*c=

against the input string “/;/;/;.dat?;.dat?;.dat?;v=;v=;v=;t=;t=;t=;c”. PCRE will then perform O(3 × 3 × 3 × 3) backtracking evaluations before eventually concluding that the string does not match the pattern. The evaluation time of PCRE grows rapidly as we increase the number of repetitions of “/;”, “.dat?”, “v=;”, and “t=;” in the input string: by the same counting argument, k repetitions of each substring lead to on the order of k⁴ backtracking evaluations. We observed that when these substrings were repeated 20 times, the execution time of PCRE for this regular expression was on the order of 10⁶ cycles/byte. Details on how to create pathological traces to exploit the backtracking behavior of PCRE can be found in [75].
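The crafted input above can be reproduced with any backtracking regular expression engine. The following is a minimal sketch that assumes Python's built-in `re` module as a stand-in for PCRE (the measurements in this chapter were taken with PCRE, not Python); the helper name `crafted_input` is hypothetical.

```python
import re

# The Snort pattern quoted above, compiled with Python's backtracking `re` engine
# (an assumption made for illustration; the dissertation measured PCRE).
PATTERN = re.compile(r".*\x2F[^\s]*\.(dat|xml)\?[^\s]*v=[^\s]*t=[^\s]*c=")


def crafted_input(k: int) -> str:
    """Repeat each of the four ambiguous fragments k times, then end without '='."""
    return "/;" * k + ".dat?;" * k + "v=;" * k + "t=;" * k + "c"


near_miss = crafted_input(3)
# The string lacks the final "c=", so the engine must try and reject the
# alternative split points before it can report a failure.
no_match = PATTERN.search(near_miss) is None
# Appending '=' supplies the missing suffix, so the same pattern now matches.
has_match = PATTERN.search(near_miss + "=") is not None
```

With `k = 3` the helper reproduces the input string quoted in the text. Larger values of `k` can be timed to observe the slowdown, although the absolute numbers will differ from the PCRE measurements reported here.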

RE2 can perform poorly when the DFA states of a regular expression blow up. The blow-up causes the limited state cache to fill quickly, and RE2 then has to empty the cache and restart the DFA construction. In our experiments, we observed an individual regular expression from Snort-2009 for which RE2 is an order of magnitude slower than Submatch-OBDD, which does not suffer from state blow-up because it is an NFA-based approach. We also found eight patterns from the SIEM system that cause RE2 to blow up in its DFA construction. For these patterns, RE2 is an order of magnitude slower than Submatch-OBDD. For commercial reasons, we do not disclose these patterns in the dissertation.
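Since the patterns that triggered the blow-up are not disclosed, the effect can be illustrated with a textbook pattern family instead. The sketch below is an assumption chosen for illustration, not one of the patterns from our data sets: it runs the subset construction on an NFA for (a|b)*a(a|b){n} and counts the reachable DFA states. The function name `count_dfa_states` is hypothetical.

```python
def count_dfa_states(n: int) -> int:
    """Count DFA states reachable by subset construction for (a|b)*a(a|b){n}."""
    # NFA states: 0 loops on 'a' and 'b'; 0 -a-> 1; i -a|b-> i+1 for 1 <= i <= n;
    # state n+1 is accepting and has no outgoing transitions.
    def step(states: frozenset, symbol: str) -> frozenset:
        nxt = set()
        for q in states:
            if q == 0:
                nxt.add(0)
                if symbol == "a":
                    nxt.add(1)
            elif q <= n:
                nxt.add(q + 1)
        return frozenset(nxt)

    start = frozenset({0})
    seen = {start}
    worklist = [start]
    while worklist:
        current = worklist.pop()
        for symbol in "ab":
            target = step(current, symbol)
            if target not in seen:
                seen.add(target)
                worklist.append(target)
    return len(seen)


state_counts = [count_dfa_states(n) for n in range(5)]
```

The NFA has n + 2 states, while the number of reachable DFA states doubles with each increment of n. A bounded state cache of the kind described above fills quickly under such growth.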


Please note that RE2 and PCRE are mature and popular engines and their code bases are heavily optimized. We have not devoted significant time to optimizing Submatch-OBDD. We believe that Submatch-OBDD’s performance can be further improved with better optimization.

4.4 Related Work

Regular expressions are extensively used to construct attack signatures in NIDS and

to process event logs in SIEM systems. Finite automata are natural representations

for regular expressions, and they demonstrate a time-space tradeoff in pattern match-

ing. Many techniques have been proposed to improve DFAs’ space efficiency: com-

pression [13], determinization on-the-fly [80], building multiple DFAs (MDFA) from

a group of signatures [102], extending DFAs with scratch memory (XFAs) [76, 77],

and constructing DFA variants with hardware implementations [18, 46, 52]. Similarly,

many techniques have been proposed to improve NFAs’ time efficiency: hardware based

parallelism [25, 52, 22, 39, 71] and software based speedup [97, 99]. Hybrid finite automata [12] combine the benefits of NFAs and DFAs.

Submatch extraction, however, has not received much attention from the research

community. Pike implemented a submatch extraction approach in the sam text ed-

itor [64] using a straightforward modification of Thompson’s NFA simulation [84].

Google’s RE2 tool also uses the modified NFA simulation approach. Laurikari proposed

TNFA, an NFA-based approach for submatch extraction, where an NFA is augmented

with tags to represent capturing groups [47]. Our approach also uses tags, but we asso-

ciate tags with non-ε transitions whereas TNFA associates tags with ε transitions. We

use OBDDs to represent and operate on tagged NFAs to achieve time efficiency.

PCRE and the regular expression libraries in Java, Perl, and Python implement

pattern matching and submatch extraction using recursive backtracking, where an in-

put string may be scanned multiple times before a match is found. The backtracking

approach’s worst case performance is exponential running time [28]. These tools use

backtracking to efficiently handle back reference, a non-regular construct that improves


the pattern language’s expressive power. In contrast, our Submatch-OBDD approach

is an NFA-based technique and does not suffer from exponential running time.

Google’s RE2 is an open source automata based pattern matching tool that sup-

ports submatch extraction [29]. RE2 employs a DFA approach to test whether an

input string matches a pattern. If a pattern contains capturing groups, RE2 uses a

DFA approach to find the pattern’s overall match in an input string and then runs an

NFA approach to extract the submatches in the overall match. Similar to RE2, our

Submatch-OBDD is NFA-based. We, however, use OBDDs to perform NFA operations

and hence improve time efficiency. Submatch-OBDD performs better than RE2 when patterns are combined. Neither RE2 nor Submatch-OBDD supports back references. We will present an efficient matching algorithm for patterns containing back references in Chapter 5.

The NFA-OBDD model [97, 99] is the most relevant work to Submatch-OBDD. A

commonality between NFA-OBDD and our Submatch-OBDD is the use of “implicit

state enumeration” by means of OBDDs [27, 85]. NFA-OBDD, however, does not

support submatch extraction.

4.5 Summary

In this chapter, we present Submatch-OBDD, which allows fast submatch extraction in

regular expression-like pattern matching. We propose a new approach to tag capturing

groups in a regular expression, and extend Thompson’s NFA construction approach to

support regular expressions with capturing groups. We present a novel technique to

perform submatch extraction. Our use of OBDDs improves the time efficiency of match

test and submatch extraction. We evaluated our Submatch-OBDD implementation us-

ing patterns used in the Snort NIDS and a commercial SIEM system. Our experiments

on real network traces, synthetic traces, and enterprise event logs show that Submatch-OBDD achieves its ideal performance when patterns are combined. In the best case, our approach is faster than RE2 and PCRE by one to two orders of magnitude.


Chapter 5

A New Algorithm for Patterns with Back References

5.1 Introduction

Regular language-based patterns have limited expressive power and cannot describe some features that appear in network packet payloads. Aside from submatch extraction, back reference is another important feature provided by many pattern matching tools, e.g., PCRE and the regular expression libraries of Java, Perl, and Python. Back references are used to identify repeated strings in an input string. Patterns containing back references describe non-regular languages [29].

To be more specific, patterns with back references are more expressive than regular

languages. For example, suppose we want to match a pair of XML tags and the text

in between. It will be hard to represent this pattern if we are only allowed to use

regular expressions (regular languages) because tags in an XML file may be unknown

beforehand. In this case, using a back reference can easily describe the pattern. For

example, “<([A-Z][A-Z0-9]*)\b[^>]*>.*?</\1>” can be used to match a pair of

XML tags and the text in between, where the first capturing group (subexpression

within the pair of parentheses) is used to capture an XML tag, and the “\1” denotes

that the captured tag will be reused at the end (before the ‘>’ symbol) of the pattern.

A pattern can have multiple back references, each of which refers to a different capturing group. Multiple back references are named sequentially by a ‘\’ followed by different numbers. For example, three back references can be named “\1”, “\2”, and “\3”. One back reference can also appear multiple times in a pattern, e.g., “([a-c])x\1x\1”. Back references are also employed by modern NIDS to represent attack signatures. For example, the HTTP rule set of Snort 2012 has 167 patterns containing back references [78].
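The two example patterns above can be tried directly in a regular expression library that supports back references. The following is a small sketch using Python's `re` module, which is an assumption made for illustration; the input strings are made up for this example.

```python
import re

# Pattern from the text: an opening tag, any attributes, lazily matched content,
# and a closing tag that must repeat whatever group 1 captured.
XML_PAIR = re.compile(r"<([A-Z][A-Z0-9]*)\b[^>]*>.*?</\1>")

matched = XML_PAIR.search('<DIV class="x">hello</DIV>')
mismatched = XML_PAIR.search("<B>bold</I>")

# One back reference used twice: all three positions must hold the same letter.
REPEATED = re.compile(r"([a-c])x\1x\1")
same_letter = REPEATED.fullmatch("bxbxb")
different_letter = REPEATED.fullmatch("axbxa")
```

Here `matched.group(1)` holds the captured tag name, and the closing tag is accepted only if it repeats that name. The second pattern rejects "axbxa" because the middle letter differs from the captured one.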


Since patterns containing back references describe non-regular languages, they cannot be represented by NFAs or DFAs. Thus, prior approaches based on NFAs or DFAs cannot be applied to back references. In fact, very little work has been done on patterns containing back references. As pointed out by Cox in [28], “No one knows how to implement pattern with back references efficiently, though no one can prove that it’s impossible either”. Specifically, the back reference matching problem is NP-complete [6]. The de facto algorithm for back references is recursive backtracking. However, recursive backtracking is vulnerable to algorithmic complexity attacks. For example, the throughput of PCRE quickly decreases to nearly zero megabytes per second for patterns of the form “(a?{n})a{n}\1” (n = 5, 10, 15, 20, 25, 30) with input strings of the form aⁿ (i.e., ‘a’ repeated n times). In fact, PCRE fails to return correct results for n ≥ 25.

Can we find an approach that handles back references yet resists known algorithmic complexity attacks? In this chapter, we explore the answer to this question.

We present a novel approach to implement pattern matching with back references.

The basic idea of our approach is to convert a back reference problem to a conditional

submatch problem, and represent a conditional submatch problem using an NFA-like

machine. We evaluate the performance of our approach using both synthetic patterns

and patterns from real-world NIDS. Our experimental results show that our approach

resists known algorithmic complexity attacks and is faster than PCRE by three orders

of magnitude for certain types of patterns. For general patterns, our approach is one

order of magnitude slower than PCRE.

The remainder of this chapter is organized as follows. Section 5.2 presents the design of our algorithm for patterns with back references, and Section 5.3 describes our implementation. Section 5.4 presents the experimental evaluation of our approach, and Section 5.5 discusses the related work. Section 5.6 summarizes our contribution.


5.2 Design of Algorithm

The basic idea of our approach is to convert a back reference problem to a conditional submatch problem, and represent the conditional submatch problem using an NFA-like machine. Our approach includes two phases: compilation and execution. During the compilation phase, patterns with back references are compiled to tagged NFAs subject to some constraints. During the execution phase, pattern matching is performed by operating the tagged NFAs generated at the compilation phase on input strings.

5.2.1 Pattern Compilation

We introduce a relax-plus-constrain approach to tackle back references. Relax refers to re-writing a regular expression with back references into a regular expression that contains only capturing groups. During the re-writing, each back reference is replaced by a copy of the capturing group that it refers to. As a result, a back reference and its referred capturing group become a pair of capturing groups in the re-written regular expression. Constrain refers to making the re-written pattern equivalent to the original pattern: we add a constraint to the accept condition requiring that the submatches returned by the capturing group pair be equal. For example, pattern “(a*)aa\1” can be re-written as “(a*)aa(a*)” with the constraint “$1=$2”, where “$1” and “$2” denote the first and second submatches captured by the two capturing groups.
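The relax step is a purely textual re-writing, so it can be sketched in a few lines. The function below is a simplified illustration and not the compilation component of our toolchain; the name `relax` and its restrictions are assumptions: single-digit back references, plain capturing groups without nested groups inside a referenced group, and no parentheses inside character classes.

```python
def relax(pattern):
    """Rewrite back references as copies of their groups; return (pattern, constraints).

    Each constraint (i, j) means that submatch $i must equal submatch $j.
    """
    out = []            # pieces of the relaxed pattern
    open_stack = []     # (original group number, offset of its '(' in pattern)
    group_text = {}     # original group number -> source text including parentheses
    out_index = {}      # original group number -> group number in the relaxed pattern
    constraints = []
    n_orig = 0          # capturing groups seen in the original pattern
    n_out = 0           # capturing groups emitted into the relaxed pattern
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt.isdigit() and nxt != "0":
                ref = int(nxt)
                n_out += 1                       # the copy is a new capturing group
                out.append(group_text[ref])
                constraints.append((out_index[ref], n_out))
            else:
                out.append(c + nxt)              # ordinary escape, copied verbatim
            i += 2
            continue
        if c == "(":
            n_orig += 1
            n_out += 1
            out_index[n_orig] = n_out
            open_stack.append((n_orig, i))
        elif c == ")":
            num, start = open_stack.pop()
            group_text[num] = pattern[start:i + 1]
        out.append(c)
        i += 1
    return "".join(out), constraints


relaxed, pairs = relax(r"(a*)aa\1")
```

For the running example, `relaxed` is "(a*)aa(a*)" and `pairs` is `[(1, 2)]`, i.e., the constraint “$1=$2”. A back reference that appears twice, as in “([a-c])x\1x\1”, yields two constraints against the same original group.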

Once a pattern with back references is converted to a conditional submatch extraction pattern, we can construct an NFA-like machine to represent the converted pattern using a Thompson-like algorithm. The construction process is similar to constructing a tagged NFA as described in Chapter 4. As we will see, the difference lies in how the tagged NFA is operated. Recall that we add an equal-substrings constraint to a re-written pattern in order to make it equivalent to its original pattern. Thus, we need a mechanism to maintain the substrings matched by the capturing groups in a re-written pattern. To do so, a tagged NFA needs a data structure that allows bookkeeping of captured substrings. We associate each state with a pair of substrings (multiple pairs of substrings are needed if there are multiple back references). For transitions within a capturing group, we add the corresponding input symbols to the captured substrings. For transitions that are not within any capturing group, we simply carry over the captured substrings from state to state. When a tagged NFA reaches a final state, we check whether there exists a pair of equal captured strings. If so, the input string is matched by the tagged NFA. More formally, we denote a tagged NFA as a 6-tuple (Q, Σ, T, δ, q0, Fin), where Q is the state set, Σ is the alphabet, T is a tag set, q0 is the start state, Fin is a set of final states, and δ is a transition function δ : Q × Σ∗ × Σ∗ × · · · → Q × Σ∗ × Σ∗ × · · · that maps a current state with its captured substrings to a next state with updated captured substrings.

Figure 5.1: The tagged NFA constructed from “(a*)aa(a*)”.

Let us demonstrate the approach using the example pattern “(a*)aa(a*)”. After adding tags, the pattern is denoted as “(a∗)t1aa(a∗)t2”, where t1 and t2 label the two capturing groups. Figure 5.1 shows a tagged NFA constructed from pattern “(a∗)t1aa(a∗)t2” with the constraint “$1=$2”. The transition from state 1 to itself on input symbol ‘a’ is within the first capturing group, and the transition from state 3 to itself on input symbol ‘a’ is within the second capturing group. All other transitions are not in any capturing group.

Similar to traditional NFAs, we can use a transition table to represent the transitions of a tagged NFA. Instead of three columns, the transition table of a tagged NFA has five columns: the first three columns are the same as those in a traditional transition table, the fourth column denotes the tag associated with each transition, and the fifth column specifies the action used for maintaining the substrings matched by capturing groups. In our design, we have three types of actions: new, update, and carry over. New and update actions are associated with transitions within capturing groups, and a carry over action is associated with transitions not in any capturing group. A new action creates a new captured substring from the current input symbol, and an update action appends the input symbol to the end of an existing captured substring. A carry over action copies the captured substrings from the current state to the next state. Table 5.1 shows the transition table of the tagged NFA in Figure 5.1.

Current state (x)    Input symbol (i)    Next state (y)    Tag (t)    Action
1                    a                   1                 t1         new(t1) or update(t1)
1                    a                   2                 φ          carry over(t1)
2                    a                   3                 φ          carry over(t1)
3                    a                   3                 t2         new(t2) or update(t2)

Table 5.1: Transition table of the tagged NFA in Figure 5.1.

5.2.2 Execution

Frontier Derivation

To allow for the maintenance of captured substrings, we denote an element in a frontier set by a tuple (x, substr1, substr2, . . . ), where x is a state number, and substr1 and substr2 are substrings matched by the capturing groups. In general, if there are k back references, we need a (2k+1)-tuple to represent a frontier element. During a match test of an input string, the frontier set is initially the singleton set {(q0, “”, “”, ...)} (where “” denotes that no substring has been captured yet), but it may include multiple elements during the operation of a tagged NFA. For each symbol in the input string, we must process all elements in the frontier set and find a new set of elements by applying the transition function represented by the transition table. Applying the transition function to a frontier element (s, substr1, substr2, . . . ) and an input symbol involves two steps. The first step is a table lookup, i.e., given a state s and symbol I(i), retrieve all states that are reachable from s on symbol I(i). The second step is to apply one or more actions to the captured substrings associated with state s. In particular, if a transition is the start of a capturing group, then a new action is applied; if a transition is within a capturing group, then an update action is applied; and if a transition is not within any capturing group, then a carry over action is applied by copying the captured substrings (if any) from the current state to the next state. For a pattern with one back reference, the above frontier derivation process can be expressed by the following Boolean formula:

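\[
\begin{aligned}
G(y, s) ={}& F_0\bigl(\exists x \cdot \exists i \cdot \exists t \cdot (t = \phi \wedge \Delta_F(x, s, i, y, t))\bigr) \\
{}\vee{}& F_1\bigl(\exists x \cdot \exists i \cdot \exists t \cdot (t = t_1 \wedge \Delta_F(x, s, i, y, t))\bigr) \\
{}\vee{}& F_2\bigl(\exists x \cdot \exists i \cdot \exists t \cdot (t = t_2 \wedge \Delta_F(x, s, i, y, t))\bigr)
\end{aligned}
\]

where

\[
\Delta_F(x, s, i, y, t) = F(x, s) \wedge I_\sigma(i) \wedge \Delta(x, i, y, t) \tag{5.1}
\]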

F(x, s) denotes the current frontier set (s denotes captured substrings), Iσ(i) denotes an input symbol, and ∆(x, i, y, t) denotes the transition relation of the tagged NFA. The conjunction in Equation 5.1 selects the rows in the transition table ∆(x, i, y, t) that correspond to outgoing transitions from the states in the current frontier set F(x, s) labeled with symbol σ. In G(y, s), the term t = φ ∧ ∆F(x, s, i, y, t) selects transitions that are not in any capturing group, t = t1 ∧ ∆F(x, s, i, y, t) selects transitions that are labeled by t1 (first capturing group), and t = t2 ∧ ∆F(x, s, i, y, t) selects transitions labeled by t2 (second capturing group). Function F0 denotes a carry over action; function F1 denotes applying a new or update action to substrings captured by the first capturing group; and function F2 denotes applying a new or update action to substrings captured by the second capturing group. Renaming y to x in G(y, s) gives us the new frontier set G(x, s). The frontier derivation formulae for patterns with multiple back references are similar, except that more tags ti (i = 1, 2, . . . ) are involved.

Example Consider the example tagged NFA in Figure 5.1 with input string “aaaa”. Initially, the frontier set is the singleton set {(1, “”, “”)}. For the first input symbol ‘a’, the next state can be state 1 or 2 according to the transition table in Table 5.1. The fifth column of the transition table indicates that the transition from state 1 to 1 is associated with a new(t1) action, which means we need to create a new substring for the first capturing group using the current input symbol ‘a’. The transition from state 1 to 2 is associated with a carry over(t1) action. Since no substring has been captured in (1, “”, “”), nothing needs to be copied from state 1 to state 2. As a result, the new frontier set has two elements, i.e., {(1, “a”, “”), (2, “”, “”)}.

Renaming {(1, “a”, “”), (2, “”, “”)} as the current frontier set, with the second input symbol ‘a’ we obtain the next frontier set {(1, “aa”, “”), (2, “a”, “”), (3, “”, “”)}. We process the third and fourth input symbols in the same way. After processing the fourth input symbol ‘a’, the frontier set is {(1, “aaaa”, “”), (2, “aaa”, “”), (3, “aa”, “”), (3, “a”, “a”), (3, “”, “aa”)}.

Acceptance Checking

The accept condition of a tagged NFA is: at the end of the input string, there exists a tuple (t, substr1, substr2, . . . ) in the frontier set such that t ∈ Fin is a final state and substr1 equals substr2 (for patterns with one back reference). If there are k different back references, we need k pairs of captured substrings, where the two substrings in each pair are equal. For the example tagged NFA with input string “aaaa”, there is one element in the frontier set, (3, “a”, “a”), that satisfies the acceptance condition after the fourth input symbol ‘a’ is processed. Therefore, the input string “aaaa” is accepted by the tagged NFA, which means the input string matches pattern (a*)aa\1.
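The frontier derivation and the acceptance check for the running example can be reproduced with a short simulation. This is a minimal sketch of the explicit (non-OBDD) procedure for the tagged NFA of Figure 5.1 only; the names `TRANSITIONS`, `step`, and `run` are hypothetical, and captured substrings are stored as Python strings for readability.

```python
# Transition table of Table 5.1: (current state, symbol, next state, tag).
# Tag None stands for φ (a transition outside every capturing group).
TRANSITIONS = [
    (1, "a", 1, "t1"),
    (1, "a", 2, None),
    (2, "a", 3, None),
    (3, "a", 3, "t2"),
]
START_STATE = 1
FINAL_STATES = {3}


def step(frontier, symbol):
    """Derive the next frontier from the current one for a single input symbol."""
    nxt = set()
    for state, sub1, sub2 in frontier:
        for src, sym, dst, tag in TRANSITIONS:
            if src != state or sym != symbol:
                continue
            if tag == "t1":      # new/update: extend the first captured substring
                nxt.add((dst, sub1 + symbol, sub2))
            elif tag == "t2":    # new/update: extend the second captured substring
                nxt.add((dst, sub1, sub2 + symbol))
            else:                # carry over: copy both substrings unchanged
                nxt.add((dst, sub1, sub2))
    return nxt


def run(text):
    """Return (accepted, final frontier) for the tagged NFA with constraint $1=$2."""
    frontier = {(START_STATE, "", "")}
    for symbol in text:
        frontier = step(frontier, symbol)
    accepted = any(q in FINAL_STATES and s1 == s2 for q, s1, s2 in frontier)
    return accepted, frontier


accepted, frontier = run("aaaa")
```

Because an empty string marks "nothing captured yet", the new and update actions both reduce to appending the symbol in this sketch. Running it on "aaaa" yields the five-element frontier listed above and accepts through (3, "a", "a"), while "aaa" is rejected because no final-state element has equal substrings.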

Remarks: We note that our approach for back references can be employed to do submatch extraction as well. In that case, no constraint needs to be added to the tagged NFA. One great benefit is that this approach performs pattern matching and submatch extraction by scanning the input string in a single pass.

5.3 Implementation

We designed and built a toolchain to evaluate the performance of our back reference approach. The implementation is written in C++. The toolchain has two components: a compilation component and an execution component, as shown in Figure 5.2. The compilation component reads patterns with back references and compiles them into the tagged NFAs described in Section 5.2.1. The execution component loads the compiled tagged NFAs and matches them against a stream of input strings. In our implementation, captured substrings are represented by their starting and ending offsets in the input strings. In this way, substrings do not have to be copied from state to state. Each substring is represented using only two numbers, which saves space and avoids the overhead of string copy operations.

Figure 5.2: The toolchain of our back reference implementation.
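The offset-based representation of captured substrings can be sketched as follows. This illustration is written in Python rather than the C++ of our toolchain, and the names `Span`, `new_capture`, `update_capture`, `captured_text`, and `spans_equal` are hypothetical; it only shows how the new and update actions and the equality constraint can work on pairs of offsets.

```python
from typing import Optional, Tuple

# (start, end) offsets into the input, end exclusive; None means nothing captured.
Span = Optional[Tuple[int, int]]


def new_capture(pos: int) -> Span:
    """'new' action: start a capture that covers only the symbol at offset pos."""
    return (pos, pos + 1)


def update_capture(span: Span, pos: int) -> Span:
    """'update' action: extend an existing capture to include the symbol at pos."""
    if span is None:
        return new_capture(pos)
    start, _ = span
    return (start, pos + 1)


def captured_text(text: str, span: Span) -> str:
    """Recover the captured substring only when it is actually needed."""
    return "" if span is None else text[span[0]:span[1]]


def spans_equal(text: str, a: Span, b: Span) -> bool:
    """Acceptance check $1=$2 performed on offsets into the shared input."""
    return captured_text(text, a) == captured_text(text, b)


text = "aaaa"
first = new_capture(0)             # first group captured the symbol at offset 0
second = new_capture(3)            # second group captured the symbol at offset 3
grown = update_capture(first, 1)   # first group extended by one more symbol
```

A carry over action needs no helper in this representation: the same immutable pair is simply passed along to the next frontier element.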

We created two implementations for evaluation. In the first one, named NFA-backref, pattern matching is performed by operating the compiled tagged NFAs. In the second implementation, dubbed OBDD-backref, tagged NFAs are represented by OBDDs and pattern matching is performed by manipulating the OBDD data structures of the tagged NFAs.

5.4 Evaluation

5.4.1 Data Sets

We evaluate the performance of different implementations using the following two data

sets:

Patho-01

Patterns in this data set are in the form of (a?{n})a{n}\1, where ? is the zero-or-one quantifier. This pattern matches a string starting with zero or one ‘a’ repeated n times, followed by n characters of ‘a’, followed by the substring captured by the first capturing group. We evaluated the pattern for n = 5, 10, 15, 20, 25, 30. For each pattern, we use an input string of the form aⁿ, i.e., ‘a’ repeated n times, which is matched by the pattern with the same value of n.

Snort-46

The second data set includes 46 patterns containing back references from the Snort 2012 HTTP signature set. We use two input traces to evaluate this pattern set. The first trace, which we call the benign trace, was generated using a string generator that we created. Given a set of patterns and a user-specified match percentage p, the string generator generates a trace file where p percent of the strings are matched by at least one pattern in the pattern set. The size of the benign trace generated for our evaluation is 5MB. The second trace was manually crafted after carefully reviewing the 46 patterns. We found that at least one of these patterns suffers from the algorithmic complexity attack if the pattern matching engine is implemented by recursive backtracking. We thus manually created a 100KB pathological trace using the approach described in Section 1.3 to evaluate how different implementations perform under an algorithmic complexity attack.

5.4.2 Performance

We measure the time efficiency of the different implementations using the number of CPU cycles needed to process each byte of the input trace (cycle/byte). We evaluate the performance of three implementations, NFA-backref, OBDD-backref, and PCRE, using the data sets described in Section 5.4.1. We did not measure the space efficiency of the different implementations since both NFA-based approaches and recursive backtracking are space efficient, as presented in Chapter 3 and Chapter 4.

Figure 5.3 shows the execution time of the different implementations for the Patho-01 data set. The x-axis denotes the value of n in pattern (a?{n})a{n}\1, and the y-axis denotes the execution time in units of cycle/byte. PCRE is the slowest implementation as n increases from 5 to 30. NFA-backref is the fastest implementation and is faster than PCRE by at least three orders of magnitude. The performance of OBDD-backref is between NFA-backref and PCRE. This indicates that the OBDD data structure does not help for patterns with back references. The main reason is that representing the new, update, and carry over actions in the frontier derivation using OBDDs is expensive, while such operations are not needed in the NFA-OBDD and Submatch-OBDD models.

[Figure 5.3 plot omitted. x-axis: n (5, 10, 15, 20, 25, 30); y-axis: Exec-time (cycle/byte), logarithmic scale; series: PCRE, OBDD-backref, NFA-backref.]

Figure 5.3: Performance of different implementations for the Patho-01 data set. It can be observed that NFA-backref resists the algorithmic complexity attack and it is at least three orders of magnitude faster than PCRE.

[Figure 5.4 plots omitted. Panels: (a) Performance of benign trace; (b) Performance of pathological trace. y-axis: Exec-time (cycle/byte), logarithmic scale.]

Figure 5.4: Performance of different implementations for the Snort-46 pattern set. NFA-backref outperforms PCRE by three orders of magnitude when the pathological trace is used as input.

As shown in Figure 5.3, PCRE performs extremely slowly for this pattern set. This is mainly because PCRE performs exhaustive recursive backtracking when matching an input string aⁿ (i.e., ‘a’ repeated n times) against pattern (a?{n})a{n}\1. During recursive backtracking, the first matching path tried by PCRE matches the n characters of ‘a’ with the (a?{n}) part of the pattern. This path fails because there are no characters left to match the remaining part a{n}\1. PCRE then backtracks one step, uses n − 1 characters to match the (a?{n}) part, and fails again. Continuing in this way, it needs to traverse O(2ⁿ) paths before it finally succeeds by using zero ‘a’ to match the (a?{n}) part, n characters of ‘a’ to match a{n}, and zero ‘a’ to match the back reference part \1. As the value of n gets large, the number of traversal paths increases exponentially, which causes PCRE to abort the backtracking process when the stack becomes too large. In our experiments, we observed that PCRE failed to give correct matching results when n ≥ 25, while our implementation always returned correct results for all input traces. PCRE fails for n ≥ 25 mainly because it aborts the recursive backtracking when the size of the recursion stack exceeds a threshold.
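The exponential path count for the Patho-01 patterns can be checked with a toy model of recursive backtracking. The sketch below is an assumption made for illustration and is not PCRE's implementation: it decides each of the n optional ‘a’ symbols greedily, requires the whole input to be consumed, and counts how many complete choices are tried before the match succeeds. The name `count_paths` is hypothetical.

```python
def count_paths(n: int):
    """Match (a?{n})a{n}\\1 against 'a' * n by naive backtracking.

    Returns (matched, attempts), where attempts counts the complete ways of
    deciding the n optional 'a' symbols that were tried.
    """
    text = "a" * n
    attempts = 0

    def rest_matches(pos: int) -> bool:
        # After the group captured text[:pos], the remainder of the input must
        # equal a{n} followed by a copy of the captured substring.
        captured = text[:pos]
        return text[pos:] == "a" * n + captured

    def go(decided: int, pos: int) -> bool:
        nonlocal attempts
        if decided == n:
            attempts += 1
            return rest_matches(pos)
        # Greedy order: first let this a? consume a symbol, then try skipping it.
        if pos < len(text) and text[pos] == "a" and go(decided + 1, pos + 1):
            return True
        return go(decided + 1, pos)

    matched = go(0, 0)
    return matched, attempts


matched, attempts = count_paths(10)
```

In this model the only successful choice is the one in which every optional ‘a’ is skipped, and it is the last one tried, so the number of attempts is exactly 2ⁿ. A real engine may prune some of these paths, so the sketch illustrates the growth rate rather than PCRE's exact step count.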

Figure 5.4 shows the execution time of the different implementations for the Snort-46 data set. Figure 5.4a shows the performance for the benign trace, and Figure 5.4b shows the performance for the pathological trace. PCRE is about 10 times faster than NFA-backref when the benign trace is used as input. However, NFA-backref is three orders of magnitude faster than PCRE for the pathological trace. The low performance of PCRE in Figure 5.4b arises because the pathological trace triggers PCRE to perform exhaustive recursive backtracking during pattern matching. OBDD-backref is the slowest implementation among the three in both cases. This is mainly due to the expensive OBDD operations associated with the new, update, and carry over actions in the frontier derivation.


Both Figure 5.3 and Figure 5.4 show that our implementation NFA-backref is immune to the algorithmic complexity attack. Under pathological traces, our NFA-backref implementation outperforms PCRE by orders of magnitude. Although NFA-backref is slower than PCRE for benign traces, we argue that NFA-backref is a better implementation because network security tools, e.g., NIDS, are often exposed to attack traffic, in which an attacker may deliberately craft pathological network contents to perform a DoS attack on a recursive backtracking-based pattern matching engine. Thus, we believe that our approach is better suited to processing hostile network traffic.

5.5 Related Work

Pattern matching in practice demonstrates a time/space tradeoff. DFA-based approaches are time efficient, but suffer from state blow-up. NFA-based approaches are space efficient, but are slow in operation. Recursive backtracking-based approaches are fast in general, but can be orders of magnitude slower under an algorithmic complexity attack. The time/space tradeoff has spurred a lot of recent research, primarily focused on patterns that can be described by regular languages (regular expressions). Many researchers aimed at reducing the memory footprint of DFA-based approaches [80, 102, 46, 76, 77], and some researchers worked on improving the time efficiency of NFA-based approaches with hardware [39, 54, 25, 71] or software solutions [97, 99].

Patterns used in real-world security tools are often regular expressions extended with additional features. One important feature, submatch extraction, is discussed in Chapter 4. Another, back reference, is discussed in this chapter. So far, not much work has been done on submatch extraction and back references. Existing approaches to submatch extraction include Google’s RE2 [29], Horne et al.’s DFA-based algorithm [35], Laurikari’s tagged NFA approach [47], and our Submatch-OBDD model [100] (presented in Chapter 4). However, RE2 does not support back references. Recursive backtracking is the de facto approach to implementing back references and has been adopted by tools such as PCRE and the regular expression libraries in some high-level languages such as Java, Python, and Perl [28]. As we have shown, a recursive backtracking-based implementation suffers from algorithmic complexity attacks. Becchi and Crowley proposed to model a back reference problem with an automaton-like machine [15]. Their approach constructs a special state for each back reference instance. Substrings are recorded in a back reference state and are matched in a consuming way. Becchi and Crowley’s approach works when there is only one back reference instance for a capturing group but fails when there are multiple back reference instances for the same capturing group. Also, it is not clear how their approach performs because no execution time was reported in their paper. Namjoshi and Narlikar presented an automaton-based back reference approach [56] similar to [15]. Our approach differs from [15] and [56] in that we do not construct special states or input symbols for back references. Instead, we treat all the states in an NFA-like machine in the same manner, and add constraints to the acceptance condition of the constructed tagged NFA. We also showed that our approach is immune to known algorithmic complexity attacks.

5.6 Summary

In this chapter, we present a new matching algorithm for patterns with back references. Our approach works by converting a back reference problem to a conditional submatch extraction problem. We then construct NFA-like machines to represent pattern matching with back references. We build a toolchain and evaluate the performance of our approach using both a synthetic data set and a data set from a real-world NIDS. Our experimental results show that our implementation, NFA-backref, is immune to known algorithmic complexity attacks. In particular, under pathological traces NFA-backref is at least three orders of magnitude faster than PCRE, a recursive backtracking-based pattern matching engine. Under benign traffic, NFA-backref is one order of magnitude slower than PCRE. We believe that our approach is better suited for network security tools because such tools are often exposed to hostile network traffic that has the potential to abuse a recursive backtracking-based pattern matching engine. We believe that the performance of NFA-backref can be further improved with better code optimization.
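The conversion summarized above can be illustrated on a single pattern. In this minimal sketch, the back reference pattern `(a+)b\1` is rewritten as `(a+)b(a+)` with the acceptance constraint that both captured substrings are equal. The transition table, the tag encoding, and the name `matches_backref` are hypothetical; the sketch carries captured substrings explicitly in each frontier configuration and is not the NFA-backref implementation described in this chapter.

```python
# Hypothetical tagged NFA for (a+)b(a+).
# state -> list of (symbol, next_state, tag); the tag names the capture
# group that consumes the symbol, or None if no group does.
TRANSITIONS = {
    0: [("a", 1, 1)],
    1: [("a", 1, 1), ("b", 2, None)],
    2: [("a", 3, 2)],
    3: [("a", 3, 2)],
}
ACCEPT_STATE = 3


def matches_backref(text: str) -> bool:
    """Match (a+)b\\1 as (a+)b(a+) plus the constraint group1 == group2."""
    # Each configuration is (state, captured group 1, captured group 2).
    frontier = {(0, "", "")}
    for ch in text:
        next_frontier = set()
        for state, g1, g2 in frontier:
            for symbol, target, tag in TRANSITIONS.get(state, []):
                if symbol != ch:
                    continue
                if tag == 1:
                    next_frontier.add((target, g1 + ch, g2))
                elif tag == 2:
                    next_frontier.add((target, g1, g2 + ch))
                else:
                    next_frontier.add((target, g1, g2))
        frontier = next_frontier
        if not frontier:
            return False
    # Acceptance condition: accepting state AND the submatch constraint.
    return any(s == ACCEPT_STATE and g1 == g2 for s, g1, g2 in frontier)


if __name__ == "__main__":
    for sample in ("aba", "aabaa", "aabaaa", "ab"):
        print(sample, matches_backref(sample))
```

Because the input is consumed once from left to right and the constraint is checked only at acceptance, the matcher never backtracks over the input.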


Chapter 6

Conclusion and Future Directions

6.1 Conclusions

Pattern matching algorithms in network security applications exhibit a time/space tradeoff. In this dissertation, we present several new pattern matching techniques for network security applications. We have shown that it is possible to design a pattern matching engine that is orders of magnitude faster than NFA-based pattern matching algorithms, is immune to known algorithmic complexity attacks, and retains the space efficiency of NFAs. To this end, we have developed three techniques: NFA-OBDD, Submatch-OBDD, and NFA-backref.

6.1.1 NFA-OBDD

Our first contribution is NFA-OBDD, which is designed to improve the time efficiency of NFA-based regular expression (regular language) matching. Our design employs symbolic Boolean functions to describe the NFA representation of regular expressions. We represent and manipulate Boolean functions using OBDDs, which can effectively remove the redundancy in transition relations and sets of states. The use of OBDDs allows us to apply an NFA transition relation to all states in a frontier set in a single operation in order to produce the new frontier set.
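The frontier update described above is an image computation over the transition relation. The sketch below spells it out with explicit Python sets standing in for the Boolean functions. This is a deliberate simplification: an OBDD package such as CUDD [79] would represent the relation and the frontier symbolically over bit-encoded variables. The example NFA for `(a|b)*ab` and the names `T`, `step`, and `accepts` are assumptions made for illustration.

```python
# Transition relation T(x, i, y) of a hypothetical NFA for (a|b)*ab,
# stored as the explicit set of satisfying (state, symbol, next_state)
# assignments. An OBDD would encode the same relation as a Boolean function.
T = {(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "b", 2)}
ACCEPTING = {2}


def step(frontier: set, symbol: str) -> set:
    """Image computation: { y | exists x in frontier with T(x, symbol, y) }."""
    return {y for (x, i, y) in T if x in frontier and i == symbol}


def accepts(text: str) -> bool:
    """Run the NFA by repeatedly applying T to the whole frontier at once."""
    frontier = {0}
    for ch in text:
        frontier = step(frontier, ch)
    return bool(frontier & ACCEPTING)


if __name__ == "__main__":
    print(step({0}, "a"))
    for sample in ("ab", "bbab", "aba"):
        print(sample, accepts(sample))
```

Each call to `step` corresponds to one conjunction of the relation with the frontier and the input symbol, followed by quantifying out the current state and the symbol, which is the single operation the paragraph refers to.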

We evaluate the performance of NFA-OBDD using real-world patterns and network traces. Our experimental results show that NFA-OBDD outperforms a traditional NFA implementation by three orders of magnitude, while still retaining the space efficiency of NFAs. NFA-OBDD is competitive with MDFA (a DFA variant) in terms of time efficiency, but consumes much less memory than MDFA. It outperforms or is competitive with PCRE.

6.1.2 Submatch-OBDD

Our second contribution is to extend NFA-OBDD to model submatch extraction, an important feature in real-world patterns used by network security applications. We evaluate our submatch extraction approach (Submatch-OBDD) using patterns from an NIDS and a commercial SIEM system. Our experiments using real network traces, synthetic traces, and security event logs show that Submatch-OBDD outperforms RE2 and PCRE by one to two orders of magnitude, while retaining the space efficiency of NFAs.

6.1.3 NFA-backref

The third contribution of this dissertation is a new algorithm for patterns with back references, which describe non-regular languages. We propose to convert a back reference problem to a conditional submatch extraction problem. We then construct tagged NFAs to represent patterns with submatch constraints. We evaluate our approach using two implementations: NFA-backref and OBDD-backref. Our experimental results show that NFA-backref is immune to known algorithmic complexity attacks and outperforms PCRE by at least three orders of magnitude under pathological traces. For benign traces, PCRE is an order of magnitude faster than NFA-backref. Nevertheless, we believe that NFA-backref is better than a recursive backtracking-based implementation, because a NIDS is often exposed to hostile network traffic that may contain pathological network contents crafted to abuse a recursive backtracking-based pattern matching engine.

6.2 Future Directions

There are several directions that can be explored in the future.


6.2.1 Hardware-based NFA-OBDD and Submatch-OBDD

NIDS vendors are increasingly deploying hardware-based deep packet inspection products. While this dissertation explored the potential of NFA-OBDD and Submatch-OBDD using a software-based implementation, a hardware-based solution would be required to provide raw matching speeds approaching multiple gigabits per second. Although OBDDs [104, 74] and NFAs [71, 25, 32] have each been individually implemented in hardware (such as FPGAs and CAMs), further research is needed to investigate the possibility of hardware-based NFA-OBDD and Submatch-OBDD implementations. The key challenge in implementing NFA-OBDD and Submatch-OBDD in hardware is to devise techniques that would allow OBDDs to be modified within hardware, e.g., to allow OBDD(F) to be modified efficiently as each input symbol is processed. It is also possible to implement our back reference approach, NFA-backref, in hardware. The key challenge there is how to represent the actions used for maintaining the captured substrings during frontier derivation.

6.2.2 Security in Software Defined Networking

Software Defined Networking (SDN) [95] has emerged as an important technique for the next generation of data centers and cloud computing. SDN allows the identity and flow control logic of communication entities to be decoupled from basic topology-based forwarding, bridging, and routing. The emergence of SDN brings many opportunities and challenges to the networking and software communities. There are several promising directions in SDN security.

Security Applications in SDN

Traditional network security applications, e.g., NIDS and firewalls, work by inspecting network traffic according to communication protocols, network interfaces, and communication ports. In SDN, communication entities are no longer bound to network interfaces as in a traditional network infrastructure. Existing NIDS and firewall solutions do not fit SDN. Some open questions that can be explored are: “How will network security applications, e.g., NIDS and firewalls, be affected under the SDN structure?”, and “How can we design efficient NIDS that are capable of scanning network entities according to their logical grouping and that are resilient to VM migration?”.

Digital Forensics in SDN

The emergence of SDN will pose new challenges to digital forensics. For example, an attacker can set up a malicious VM in a virtualized network in the cloud. Since SDN techniques decouple the identities of communication entities from the physical network interfaces they are attached to, it might be more difficult to collect evidence of malicious behavior and trace it back to the attackers. One problem that can be studied is how to link the physical resources of attackers to their logical identities.

Configuration Validation in SDN

Configuration is the “glue” for logically integrating and setting up network components to satisfy end-to-end requirements in terms of security, connectivity, performance, and reliability. A real infrastructure can have hundreds of components, each of which can have a couple of thousand configuration commands. It is known that manual configuration is error prone. Researchers have developed tools and techniques to validate whether a given configuration satisfies specified requirements. The emergence of SDN will add new challenges to network configuration since the control plane is separated from the data plane and is implemented as software. In the future, several problems can be explored: “How to construct a configuration acquisition system to extract configuration information from components in SDN?”, “How to create a requirement library that captures the best practices and design patterns of end-to-end requirements in SDN?”, and “How to develop an evaluation tool that is capable of efficiently evaluating requirements and suggesting alternative configurations for false or changed requirements?”.


References

[1] Tarari regex content processor. http://www.tarari.com, 2010.

[2] Tipping point. http://www.tippingpoint.com, 2010.

[3] Gawk. http://www.gnu.org/software/gawk/manual/gawk.html, Last retrieved in March 2013.

[4] Grep. http://www.gnu.org/software/grep/, Last retrieved in March 2013.

[5] Snort. http://www.snort.org/, 2013.

[6] A. V. Aho. Algorithms for finding patterns in strings. In Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity (A), pages 255–300. 1990.

[7] A. V. Aho and M. J. Corasick. Efficient string matching: An aid to bibliographic search. Comm. ACM, 18(6):333–340, 1975.

[8] A. V. Aho and S. C. Johnson. Optimal code generation for expression trees. In ACM Symp. on Theory of Computing, New York, NY, USA, 1975. ACM.

[9] B. S. Baker. Parameterized pattern matching by Boyer-Moore type algorithms. In Proc. Sixth Annual ACM-SIAM Symp. on Discrete Algorithms, pages 541–550, Jan 1995.

[10] B. S. Baker. Parameterized pattern matching: Algorithms and applications. J. Comput. Syst. Sci., 52(1):28–42, Feb 1996.

[11] M. Becchi and S. Cadambi. Memory-efficient regular expression search using state merging. In Proceedings of IEEE Infocom, 2007.

[12] M. Becchi and P. Crowley. A hybrid finite automaton for practical deep packet inspection. In Intl. Conf. on emerging Networking EXperiments and Technologies, 2007.

[13] M. Becchi and P. Crowley. An improved algorithm to accelerate regular expression evaluation. In Proceedings of the 3rd ACM/IEEE Symposium on Architecture for networking and communications systems, ANCS ’07, pages 145–154, New York, NY, USA, 2007. ACM.

[14] M. Becchi and P. Crowley. Efficient regular expression evaluation: Theory to practice. In Intl. Conf. on Architectures for Networking and Communication Systems, pages 50–59. ACM, 2008.


[15] M. Becchi and P. Crowley. Extending finite automata to efficiently match perl-compatible regular expressions. In Proceedings of the 2008 ACM CoNEXT Conference, CoNEXT ’08, pages 25:1–25:12, New York, NY, USA, 2008. ACM.

[16] R. S. Boyer and J. S. Moore. A fast string searching algorithm. Communications of the ACM, 20(10):62–72, 1977.

[17] Bro. The bro network security monitor. http://bro.org/, Last retrieved in March 2013.

[18] B. C. Brodie, D. E. Taylor, and R. K. Cytron. A scalable architecture for high-throughput regular-expression pattern matching. In Intl. Symp. Computer Architecture, pages 191–202. IEEE Computer Society, 2006.

[19] D. Brumley, J. Newsome, D. Song, H. Wang, and S. Jha. Towards automatic generation of vulnerability-based signatures. In IEEE Symposium on Security and Privacy, May 2006.

[20] R. E. Bryant. Graph-based algorithms for Boolean function manipulation. IEEE Transactions on Computers, 35(8):677–691, 1986.

[21] J. R. Burch, E. M. Clarke, K. L. McMillan, D. L. Dill, and J. Hwang. Symbolic model checking: 10^20 states and beyond. In Symposium on Logic in Computer Science, pages 401–424. IEEE Computer Society, 1990.

[22] D. Chasaki and T. Wolf. Fast regular expression matching in hardware using nfa-bdd combination. In Proceedings of the 6th ACM/IEEE Symposium on Architectures for Networking and Communications Systems, ANCS ’10, pages 12:1–12:2, New York, NY, USA, 2010. ACM.

[23] M. Christiansen and E. Fleury. An MTIDD based firewall. Telecommunication Systems, 27(2–4), October 2004.

[24] Cisco. IOS terminal services configuration guide. http://tinyurl.com/2eouvq.

[25] C. R. Clark and D. E. Schimmel. Scalable pattern matching for high-speed networks. In IEEE Symp. on Field-Programmable Custom Computing Machines, pages 249–257. IEEE Computer Society, 2004.

[26] B. Commentz-Walter. A string matching algorithm fast on the average. In Proc. Intl. Colloquium on Automata, Languages, and Programming, pages 118–132, 1979.

[27] O. Coudert, C. Berthet, and J. C. Madre. Verification of synchronous sequential machines based on symbolic execution. In Proceedings of the international workshop on Automatic verification methods for finite state systems, pages 365–373, New York, NY, USA, 1990. Springer-Verlag New York, Inc.

[28] R. Cox. Regular expression matching can be simple and fast (but is slow in Java, Perl, PHP, Python, Ruby, ...), 2007. http://swtch.com/~rsc/regexp/regexp1.html.


[29] R. Cox. Implementing regular expressions. http://swtch.com/~rsc/regexp/, Last retrieved in August 2011.

[30] S. Dharmapurikar, P. Krishnamurthy, T. S. Sproull, and J. W. Lockwood. Deep packet inspection using parallel bloom filters. IEEE Micro, 24(1):52–61, 2004.

[31] S. Dharmapurikar and J. W. Lockwood. Fast and scalable pattern matching for network intrusion detection systems. Jour. on Selected Areas in Comm., 24(10):1781–1792, 2006.

[32] R. W. Floyd and J. D. Ullman. The compilation of regular expressions into integrated circuits. Journal of the ACM, 29(3), July 1982.

[33] M. G. Gouda and X. Liu. Firewall design: Consistency, completeness and compactness. In Intl. Conf. on Distributed Computing Systems, Mar 2004.

[34] J. D. Guttman and A. L. Herzog. Rigorous automated network security management. Intl. Jour. of Information Security, 4(1–2), 2004.

[35] S. Haber, W. Horne, P. Manadhata, M. Mowbray, and P. Rao. Efficient submatch extraction for practical regular expression. In The 7th International Conference on Language and Automata Theory and Applications, Bilbao, Spain, April 2013.

[36] M. Handley, V. Paxson, and C. Kreibich. Network intrusion detection: Evasion, traffic normalization, and end-to-end protocol semantics. In Usenix Security, pages 9–9. USENIX, 2001.

[37] S. Hazelhurst, A. Fatti, and A. Henwood. Binary decision diagram representations of firewall and router access lists. Technical report, University of Witwatersrand, Johannesburg, South Africa, 1998.

[38] J. E. Hopcroft, R. Motwani, and J. D. Ullman. Introduction to Automata Theory, Languages, and Computation, Third Edition. Addison-Wesley, 2007.

[39] B. L. Hutchings, R. Franklin, and D. Carver. Assisting network intrusion detection with reconfigurable hardware. In Annual Symp. on Field-Programmable Custom Computing Machines, pages 111–120. IEEE Computer Society, 2002.

[40] S. Johnson. Yacc – yet another compiler compiler. Computing Science Tech. Rep. 32, AT&T Bell Labs, 1975.

[41] M. Jordan. Dealing with metamorphism. Virus Bulletin Weekly, 2002.

[42] H. Kim and B. Karp. Autograph: Toward automated, distributed worm signature detection. In USENIX Security Symposium, pages 271–286, 2004.

[43] S. Kim and Y. Kim. A fast multiple string-pattern matching algorithm. In AoM/IAoM Conf. on Computer Science, August 1999.

[44] D. E. Knuth, J. Morris, and V. R. Pratt. Fast pattern matching in strings. SIAM Journal of Computing, 6(2):323–350, 1977.


[45] S. Kong, R. Smith, and C. Estan. Efficient signature matching with multiple alphabet compression tables. In Intl. Conf. on Security and Privacy in Communication Networks, 2008.

[46] S. Kumar, S. Dharmapurikar, F. Yu, P. Crowley, and J. Turner. Algorithms to accelerate multiple regular expressions matching for deep packet inspection. In ACM SIGCOMM Conference, pages 339–350. ACM, 2006.

[47] V. Laurikari. NFAs with tagged transitions, their conversion to deterministic automata and application to regular expressions. In SPIRE’00, September 2000.

[48] R. Lippmann, J. W. Haines, D. J. Fried, J. Korba, and K. Das. The 1999 DARPA off-line intrusion detection evaluation. Computer Networks, 34(4):579–595, October 2000.

[49] R. Liu, N. Huang, C. Chen, and C. Kao. A fast string-matching algorithm for network processor-based intrusion detection system. Trans. on Embedded Computing Sys., 3(3):614–633, 2004.

[50] T. Liu, Y. Sun, A. X. Liu, L. Guo, and B. Fang. A prefiltering approach to regular expression matching for network security systems. In Proceedings of the 10th international conference on Applied Cryptography and Network Security, ACNS’12, pages 363–380, Berlin, Heidelberg, 2012. Springer-Verlag.

[51] J. McHugh. Testing intrusion detection systems: A critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln laboratories. ACM Transactions on Information and System Security, 3(4):262–294, November 2000.

[52] C. Meiners, J. Patel, E. Norige, E. Torng, and A. X. Liu. Fast regular expression matching using small TCAMs for network intrusion detection and prevention systems. In 19th USENIX Security Symposium, August 2010.

[53] C. R. Meiners, E. Norige, A. X. Liu, and E. Torng. Flowsifter: A counting automata approach to layer 7 field extraction for deep flow inspection. In A. G. Greenberg and K. Sohraby, editors, INFOCOM, pages 1746–1754. IEEE, 2012.

[54] A. Mitra, W. Najjar, and L. Bhuyan. Compiling PCRE to FPGA for accelerating Snort IDS. In Symp. on Arch. for Networking and Comm. Systems, pages 127–136. ACM, 2007.

[55] R. Muth and U. Manber. Approximate multiple string search. In D. S. Hirschberg and E. W. Myers, editors, Annual Symp. on Combinatorial Pattern Matching, number 1075, pages 75–86, Laguna Beach, CA, 1996. Springer-Verlag, Berlin.

[56] K. Namjoshi and G. Narlikar. Robust and fast pattern matching for intrusion detection. In Proceedings of the 29th conference on Information communications, INFOCOM’10, pages 740–748, Piscataway, NJ, USA, 2010. IEEE Press.

[57] G. Navarro and M. Raffinot. Fast and flexible string matching by combining bit-parallelism and suffix automata. J. Exp. Algorithmics, 5:4, 2000.


[58] J. Newsome, B. Karp, and D. Song. Polygraph: Automatically generating signatures for polymorphic worms. In IEEE Symposium on Security and Privacy, pages 226–241, Washington, DC, USA, 2005. IEEE Computer Society.

[59] J. Ousterhout. Tcl programming language. http://www.tcl.tk/, Last retrieved in March 2013.

[60] J. Patel, A. Liu, and E. Torng. Bypassing space explosion in regular expression matching for network intrusion detection and prevention systems. In Proceedings of the 19th Annual Network and Distributed System Security Symposium, San Diego, California, February 2012.

[61] V. Paxson. Bro: a system for detecting network intruders in real-time. Comput. Netw., 31(23-24):2435–2463, Dec. 1999.

[62] PCRE. The Perl compatible regular expression library. http://www.pcre.org.

[63] R. Perdisci, D. Ariu, P. Fogla, G. Giacinto, and W. Lee. Mcpad: A multiple classifier system for accurate payload-based anomaly detection. Comput. Netw., 53(6):864–881, Apr. 2009.

[64] R. Pike. The text editor sam. Softw. Pract. Exper., 17:813–845, November 1987.

[65] T. Ptacek and T. Newsham. Insertion, evasion and denial of service: Eluding network intrusion detection. http://insecure.org/stf/secnet_ids/secnet_ids.html.

[66] M. Roesch. Snort - lightweight intrusion detection for networks. In USENIX Conf. on System Administration, pages 229–238. USENIX, 1999.

[67] S. Rubin, S. Jha, and B. Miller. Language-based generation and evaluation of NIDS signatures. In Symposium on Security and Privacy, Oakland, California, May 2005.

[68] L. Salmela, J. Tarhio, and J. Kytojoki. Multipattern string matching with q-grams. Jour. of Experimental Algorithmics, 11:1.1, 2006.

[69] N. Schear, D. R. Albrecht, and N. Borisov. High-speed matching of vulnerability signatures. In Lippman, R., Kirda, E., Trachtenberg, A., (eds.) RAID 2008, volume 5230 of Lecture Notes in Computer Science, pages 155–174. Springer, 2008.

[70] U. Shankar and V. Paxson. Active mapping: Resisting NIDS evasion without altering traffic. In Symp. on Security and Privacy, pages 44–61. IEEE Computer Society, 2003.

[71] R. Sidhu and V. Prasanna. Fast regular expression matching using FPGAs. In Symp. on Field-Programmable Custom Computing Machines, pages 227–238. IEEE Computer Society, 2001.

[72] R. P. S. Sidhu, A. Mei, and V. K. Prasanna. String matching on multicontext fpgas using self-reconfiguration. In Proceedings of the 1999 ACM/SIGDA seventh international symposium on Field programmable gate arrays, FPGA ’99, pages 217–226, New York, NY, USA, 1999. ACM.


[73] S. Singh, C. Estan, G. Varghese, and S. Savage. Automated worm fingerprinting. In USENIX/ACM Symposium on Operating System Design and Implementation, pages 45–60, 2004.

[74] R. Sinnappan and S. Hazelhurst. A reconfigurable approach to packet filtering. In Brebner, G., and Woods, R., (eds.) FPL 2001, volume 2147 of Lecture Notes in Computer Science, pages 638–642. Springer, 2001.

[75] R. Smith, C. Estan, and S. Jha. Backtracking algorithmic complexity attacks against a NIDS. In Annual Computer Security Applications Conf., pages 89–98. IEEE Computer Society, 2006.

[76] R. Smith, C. Estan, and S. Jha. XFA: Faster signature matching with extended automata. In Symp. on Security and Privacy, pages 187–201. IEEE Computer Society, 2008.

[77] R. Smith, C. Estan, S. Jha, and S. Kong. Deflating the Big Bang: Fast and scalable deep packet inspection with extended finite automata. In SIGCOMM Conference, pages 207–218. ACM, 2008.

[78] Snort. Download snort rules. http://www.snort.org/snort-rules/, Last retrieved in March 2013.

[79] F. Somenzi. CUDD: CU decision diagram package, release 2.4.2. Department of Electrical, Computer, and Energy Engineering, University of Colorado at Boulder. http://vlsi.colorado.edu/~fabio/CUDD.

[80] R. Sommer and V. Paxson. Enhancing byte-level network intrusion detection signatures with context. In Conf. on Computer and Comm. Security, pages 262–271. ACM, 2003.

[81] R. Sommer and V. Paxson. Outside the closed world: On using machine learning for network intrusion detection. In Symp. on Security and Privacy. IEEE Computer Society, 2010.

[82] I. Sourdis and D. Pnevmatikatos. Fast, large-scale string match for a 10Gbps FPGA-based network intrusion detection system. In Cheung, P., Constantinides, G., Sousa, J., (eds.) FPL 2003, volume 2778 of Lecture Notes in Computer Science, pages 880–889. Springer, 2003.

[83] L. Tan and T. Sherwood. A high throughput string matching architecture for intrusion detection and prevention. In Intl. Symp. Computer Architecture, pages 112–122. IEEE Computer Society, 2005.

[84] K. Thompson. Programming techniques: Regular expression search algorithm. Commun. ACM, 11:419–422, June 1968.

[85] H. J. Touati, H. Savoj, B. Lin, R. K. Brayton, and A. Sangiovanni-Vincentelli. Implicit state enumeration of finite state machines using bdd’s. In IEEE International Conference on Computer-Aided Design, pages 130–133, Santa Clara, CA, 1990. IEEE.


[86] N. Tuck, T. Sherwood, B. Calder, and G. Varghese. Deterministic memory-efficient string matching algorithms for intrusion detection. In IEEE INFOCOM, pages 333–340. IEEE Computer Society, 2004.

[87] G. Vasiliadis, S. Antonatos, M. Polychronakis, E. P. Markatos, and S. Ioannidis. Gnort: High performance network intrusion detection using graphics processors. In Lippman, R., Kirda, E., Trachtenberg, A., (eds.) RAID 2008, volume 5230 of Lecture Notes in Computer Science, pages 116–134. Springer, 2008.

[88] H. J. Wang, C. Guo, D. R. Simon, and A. Zugenmaier. Shield: Vulnerability-driven network filters for preventing known vulnerability exploits. In ACM SIGCOMM, August 2004.

[89] K. Wang, G. Cretu, and S. J. Stolfo. Anomalous payload-based worm detection and signature generation. In Proceedings of the 8th international conference on Recent Advances in Intrusion Detection, RAID’05, pages 227–246, Berlin, Heidelberg, 2006. Springer-Verlag.

[90] B. W. Watson. The performance of single and multiple keyword pattern matching algorithms. In Third South American Workshop on String Processing, Recife, Brazil, August 1996.

[91] Wikipedia. Anomaly-based intrusion detection system. http://en.wikipedia.org/wiki/Anomaly-based_intrusion_detection_system, Last retrieved in March 2013.

[92] Wikipedia. Host-based intrusion detection system. http://en.wikipedia.org/wiki/Host-based_intrusion_detection_system, 2013.

[93] Wikipedia. Intrusion detection system. http://en.wikipedia.org/wiki/Intrusion_detection_system, 2013.

[94] Wikipedia. Security information and event management. http://en.wikipedia.org/wiki/Security_information_and_event_management, 2013.

[95] Wikipedia. Software-defined networking. http://en.wikipedia.org/wiki/Software-defined_networking, 2013.

[96] S. Wu and U. Manber. A fast algorithm for multi-pattern searching. TR 94-17, Department of Computer Science, University of Arizona, 1994.

[97] L. Yang, R. Karim, V. Ganapathy, and R. Smith. Improving nfa-based signature matching using ordered binary decision diagrams. In RAID’10, volume 6307 of Lecture Notes in Computer Science (LNCS), pages 58–78, Ottawa, Canada, September 2010. Springer.

[98] L. Yang, R. Karim, V. Ganapathy, and R. Smith. Signatures referenced in Section 3.4 and Section 3.5, 2010. Available at http://www.cs.rutgers.edu/~vinodg/papers/raid2010.

[99] L. Yang, R. Karim, V. Ganapathy, and R. Smith. Fast, memory-efficient regular expression matching with nfa-obdds. Computer Networks, 55(15):3376–3393, October 2011.


[100] L. Yang, P. Manadhata, W. Horne, P. Rao, and V. Ganapathy. Fast submatch extraction using obdds. In Proceedings of the eighth ACM/IEEE symposium on Architectures for networking and communications systems, ANCS ’12, pages 163–174, New York, NY, USA, 2012. ACM.

[101] V. Yegneswaran, J. T. Giffin, P. Barford, and S. Jha. An architecture for generating semantics-aware signatures. In USENIX Security Symposium, Baltimore, Maryland, Aug. 2005.

[102] F. Yu, Z. Chen, Y. Diao, T. V. Lakshman, and R. H. Katz. Fast and memory-efficient regular expression matching for deep packet inspection. In ACM/IEEE Symp. on Arch. for Networking and Comm. Systems, pages 93–102, 2006.

[103] L. Yuan, J. Mai, Z. Su, H. Chen, C. Chuah, and P. Mohapatra. FIREMAN: A toolkit for firewall modeling and analysis. In Symp. on Security and Privacy, May 2006.

[104] S. Yusuf and W. Luk. Bitwise optimized CAM for network intrusion detection systems. In Intl. Conf. on Field Prog. Logic and Applications, pages 444–449. IEEE Press, 2005.
