
SlowFuzz: Automated Domain-Independent Detection of Algorithmic Complexity Vulnerabilities

Theofilos Petsios
[email protected]
Columbia University

Jason Zhao
[email protected]
Columbia University

Angelos D. Keromytis
[email protected]
Columbia University

Suman Jana
[email protected]
Columbia University

Abstract

Algorithmic complexity vulnerabilities occur when the worst-case time/space complexity of an application is significantly higher than the respective average case for particular user-controlled inputs. When such conditions are met, an attacker can launch Denial-of-Service attacks against a vulnerable application by providing inputs that trigger the worst-case behavior. Such attacks have been known to have serious effects on production systems, take down entire websites, or lead to bypasses of Web Application Firewalls.

Unfortunately, existing detection mechanisms for algorithmic complexity vulnerabilities are domain-specific and often require significant manual effort. In this paper, we design, implement, and evaluate SlowFuzz, a domain-independent framework for automatically finding algorithmic complexity vulnerabilities. SlowFuzz automatically finds inputs that trigger worst-case algorithmic behavior in the tested binary. SlowFuzz uses resource-usage-guided evolutionary search techniques to automatically find inputs that maximize computational resource utilization for a given application.

We demonstrate that SlowFuzz successfully generates inputs that match the theoretical worst-case performance for several well-known algorithms. SlowFuzz was also able to generate a large number of inputs that trigger different algorithmic complexity vulnerabilities in real-world applications, including various zip parsers used in antivirus software, regular expression libraries used in Web Application Firewalls, as well as hash table implementations used in Web applications. In particular, SlowFuzz generated inputs that achieve 300-times slowdown in the decompression routine of the bzip2 utility, discovered regular expressions that exhibit matching times exponential in the input size, and also managed to automatically produce inputs that trigger a high number of collisions in PHP's default hashtable implementation.

1 INTRODUCTION

Algorithmic complexity vulnerabilities result from large differences between the worst-case and average-case time/space complexities of algorithms or data structures used by affected software [31]. An attacker can exploit such vulnerabilities by providing specially crafted inputs that trigger the worst-case behavior in the victim software to launch Denial-of-Service (DoS) attacks. For example, regular expression matching is known to exhibit widely varying levels of time complexity (from linear to exponential) on input string size depending on the type of the regular expression and underlying implementation details. Similarly, the run times of hash table insertion and lookup operations can differ significantly if the hashtable implementation suffers from a large number of hash collisions. Sorting algorithms like quicksort can have an O(n log n) average-case complexity but an O(n^2) worst-case complexity. Such worst-case behaviors have been known to take down entire websites [22], disable/bypass Web Application Firewalls (WAF) [6], or to keep thousands of CPUs busy by merely performing hash-table insertions [19, 24].

Despite their potential severity, in practice, detecting algorithmic complexity vulnerabilities in a domain-independent way is a hard, multi-faceted problem. It is often infeasible to completely abandon algorithms or data structures with high worst-case complexities without severely restricting the functionality or backwards compatibility of an application. Manual time complexity analysis of real-world applications is hard to scale. Moreover, asymptotic complexity analysis ignores the constant factors that can significantly affect the application execution time despite not impacting the overall complexity class. All these factors significantly harden the detection of algorithmic complexity vulnerabilities.

Even when real-world applications use well-understood algorithms, time complexity analysis is still non-trivial for the following reasons. First, the time/space complexity analysis changes significantly even with minor implementation variations (for instance, the choice of the pivot in the quicksort algorithm drastically affects its worst-case runtime behavior [30]). Reasoning about the effects of such changes requires significant manual effort. Second, most real-world applications often have multiple inter-connected components that interact in complex ways. This interconnection further complicates the estimation of the overall complexity, even when the time complexity of the individual components is well understood.

Most existing detection mechanisms for algorithmic complexity vulnerabilities use domain- and implementation-specific heuristics or rules, e.g., detect excessive backtracking during regular expression matching [5, 25]. However, such rules tend to be brittle and are hard to scale to a large number of diverse domains, since their creation and maintenance require significant manual effort and expertise. Moreover, keeping such rules up-to-date with newer software versions is onerous, as even minor changes to the implementation might require significant changes in the rules.

In this work, we design, implement, and evaluate a novel dynamic domain-independent approach for automatically finding inputs that trigger worst-case algorithmic complexity vulnerabilities in tested applications. In particular, we introduce SlowFuzz, an evolutionary-search-based framework that can automatically find inputs to maximize resource utilization (instruction count, memory usage, etc.) for a given test binary. SlowFuzz is fully automated and does not require any manual guidance or domain-specific rules. The key idea behind SlowFuzz is that the problem of finding algorithmic complexity vulnerabilities can be posed as an optimization problem whose goal is to find an input that maximizes resource utilization of a target application. We develop an evolutionary search technique specifically designed to find solutions for this optimization problem.

We evaluate SlowFuzz on a variety of real-world applications, including the PCRE library for regular expression matching [18], the bzip2 compression/decompression utility, as well as the hash table implementation of PHP. We demonstrate that SlowFuzz can successfully generate inputs that trigger complexity vulnerabilities in all the above contexts. Particularly, we show that SlowFuzz generates inputs that achieve a 300-times slowdown when decompressed by the bzip2 utility, can produce regular expressions that exhibit matching times exponential in the input's size, and also manages to automatically generate inputs that trigger a high number of collisions in real-world PHP applications. We also demonstrate that our evolutionary guidance scheme achieves more than 100% improvement over code coverage at steering input generation towards triggering complexity vulnerabilities.

In summary, this work makes the following contributions:

• We present SlowFuzz, the first, to the best of our knowledge, domain-independent dynamic testing tool for automatically finding algorithmic complexity vulnerabilities without any manual guidance.

• We design an evolutionary guidance engine with novel mutation schemes particularly fitted towards generating inputs that trigger worst-case resource usage behaviors in a given application. Our scheme achieves more than 100% improvement over code-coverage-guided input generation at finding such inputs.

• We evaluate SlowFuzz on a variety of complex real-world applications and demonstrate its efficacy at detecting complexity vulnerabilities in diverse domains including large real-world software like the bzip2 utility and the PCRE regular expression library.

The rest of the paper is organized as follows. We provide a high-level overview of SlowFuzz's inner workings with a motivating example in Section 2. We describe the details of our methodology in Section 3. The implementation of SlowFuzz is described in Section 4 and the evaluation results are presented in Section 5. Section 6 outlines the limitations of our current prototype and discusses possible future extensions. Finally, we discuss related work in Section 7 and conclude in Section 8.

2 OVERVIEW

2.1 Problem Description

In this paper, we detect algorithmic complexity vulnerabilities in a given application by detecting inputs that cause large variations in resource utilization, measured through the number of executed instructions or CPU usage, across all inputs of a given size. We assume that our tool has gray-box access to the application binary, i.e., it can instrument the binary in order to harvest different fine-grained resource usage information from multiple runs of the binary with different inputs. Note that our goal is not to estimate the asymptotic complexities of the underlying algorithms or data structures of the application. Instead, we measure the resource usage variation in some predefined metric like the total edges accessed during a run, and try to maximize that metric. Even though, in most cases, the inputs causing worst-case behaviors under such metrics will be the ones demonstrating the actual worst-case asymptotic behaviors, this may not always be true due to the constant factors ignored in the asymptotic time complexity, the small input sizes, etc.

Threat model. Our threat model assumes that an attacker can provide arbitrary specially-crafted inputs to the vulnerable software to trigger worst-case behaviors. This is a very realistic threat model, as most non-trivial real-world software like Web applications and regular expression matchers needs to deal with inputs from untrusted sources. For a subset of our experiments involving regular expression matching, we assume that attackers can control regular expressions provided to the matchers. This is a valid assumption for a large set of applications that provide search functionality through custom regular expressions from untrusted users.

2.2 A Motivating Example

In order to understand how our technique works, let us consider quicksort, one of the simplest yet most widely used sorting algorithms. It is well-known [30] that quicksort has an average time complexity of O(n log n) but a worst-case complexity of O(n^2), where n is the size of the input. However, finding an actual input that demonstrates the worst-case behavior in a particular quicksort implementation depends on low-level details like the pivot selection mechanism. If an adversary knows the actual pivot selection scheme used by the implementation, she can use domain-specific rules to find an input that will trigger the worst-case behavior (e.g., the quadratic time complexity) [40].

However, in our setting, SlowFuzz does not know any domain-specific rules. It also does not understand the semantics of pivot selection or which part of the code implements the pivot selection logic, even though it has access to the quicksort implementation. We would still like SlowFuzz to generate inputs that trigger the corresponding worst-case behavior and identify the algorithmic complexity vulnerability.

This brings us to the following research question: how can SlowFuzz automatically generate inputs that would trigger worst-case performance in a tested binary in a domain-independent manner? The search space of all inputs is too large to search exhaustively. Our key intuition in this paper is that evolutionary search techniques can be used to iteratively find inputs that are closer to triggering the worst-case behavior. Adopting an evolutionary testing approach, SlowFuzz begins with a corpus of seed inputs, applies mutations to each of the inputs in the corpus, and ranks each of the inputs based on their resource usage patterns. SlowFuzz keeps the highest ranked inputs for further mutations in upcoming generations.

To further illustrate this point, let us consider the pseudocode of Figure 1, depicting a quicksort example with a simple pivot selection scheme: the first element of the array is selected as the pivot. In this case, the worst-case behavior can be elicited by an already sorted array. Let us also assume that SlowFuzz's initial corpus consists of some arrays of numbers and that none of them are completely sorted. Executing this quicksort implementation with the seed arrays will result in a different number of statements/instructions executed based on how close each of these arrays is to being sorted. SlowFuzz will assign a score to each of these inputs based on the number of statements executed by the quicksort implementation for each of the inputs. The inputs resulting in the highest number of executed statements will be selected for further mutation to create the next generation of inputs. Therefore, each upcoming generation will have inputs that are closer to being completely sorted than the inputs of the previous generations.

For example, let us assume the initial corpus for SlowFuzz consists of a single array I = [8, 5, 3, 7, 9]. At each step, SlowFuzz picks at random an input from the corpus, mutates it, and passes the mutated input to the above quicksort implementation while recording the number of executed statements. As shown in Figure 1, the input [8, 5, 3, 7, 9] results in the execution of 37 lines of code (LOC). Let us assume that this input is mutated into [1, 5, 3, 7, 9], which causes the execution of 52 LOC. This is higher than the original input, and therefore [1, 5, 3, 7, 9] is selected for further mutation. Eventually, SlowFuzz will find a completely sorted array (e.g., [1, 5, 6, 7, 9] as shown in Figure 1) that will demonstrate the worst-case quadratic behavior. We provide a more thorough analysis of SlowFuzz's performance on various sorting implementations in Section 5.2.

3 METHODOLOGY

The key observation for our methodology is that evolutionary search techniques together with dynamic analysis present a promising approach for finding inputs that demonstrate worst-case complexity of a test application in a domain-independent way. However, to enable SlowFuzz to efficiently find such inputs, we need to carefully design effective guidance mechanisms and mutation schemes to drive SlowFuzz's input generation process. We design a new evolutionary algorithm with customized guidance mechanisms and mutation schemes that are tailored for finding inputs causing worst-case behavior.

Algorithm 1 shows the core evolutionary engine of SlowFuzz. Initially, SlowFuzz randomly selects an input to execute from a given seed corpus (line 4), which is mutated (line 5) and passed as input to the test application (line 6). During an execution, profiling info such as the different types of resource usage of the application are recorded (lines 6-8). An input is scored based on its resource usage and is added to the mutation corpus if the input is deemed a slow unit (lines 9-12).

    function quicksort(array):
        /* initialize three arrays to hold elements
           smaller, equal and greater than the pivot */
        smaller, equal, greater = [], [], []
        if len(array) <= 1:
            return
        pivot = array[0]
        for x in array:
            if x > pivot:
                greater.append(x)
            else if x == pivot:
                equal.append(x)
            else if x < pivot:
                smaller.append(x)
        quicksort(greater)
        quicksort(smaller)
        array = concat(smaller, equal, greater)

    Quicksort Inputs    Number of executed lines
    8 5 3 7 9           37
    1 5 3 7 9           52
    1 5 6 7 9           67

Figure 1: Pseudocode for quicksort with a simple pivot selection mechanism and overview of SlowFuzz's evolutionary search process for finding inputs that demonstrate worst-case quadratic time complexity. The shaded boxes indicate mutated inputs.

In the following Sections, we describe the core components of SlowFuzz's engine, particularly the fitness function used to determine whether an input is a slow unit or not, and the offset and type of mutations performed on each of the individual inputs in the corpus.

3.1 Fitness Functions

As shown in Algorithm 1, SlowFuzz determines, after each execution, whether the executed unit should be considered for further mutations (lines 9-12). SlowFuzz ranks the current inputs based on the scores assigned to them by a fitness function and keeps the fittest ones for further mutation. Popular coverage-based fitness functions, which are often used by evolutionary fuzzers to detect crashes, are not well suited for our purpose as they do not consider loop iterations, which are crucial for detecting worst-case time complexity.

Algorithm 1 SlowFuzz: Report all slow units for application A after n generations, starting from a corpus I

     1: procedure DiffTest(I, A, n, GlobalState)
     2:     units = ∅                              ; reported slow units
     3:     while generation ≤ n and I ≠ ∅ do
     4:         input = RandomChoice(I)
     5:         mut_input = Mutate(input)
     6:         app_insn, app_outputs = Run(A, mut_input)
     7:         gen_insn ∪= app_insn
     8:         gen_usage ∪= app_usage
     9:         if SlowUnit(gen_insn, gen_usage, GlobalState) then
    10:             I ← I ∪ mut_input
    11:             units ∪= mut_input
    12:         end if
    13:         generation = generation + 1
    14:     end while
    15:     return units
    16: end procedure

SlowFuzz's input generation is guided by a fitness function based on resource usage. Such a fitness function is generic and can take into consideration different kinds of resource usage like CPU usage, energy, memory, etc. In order to measure the CPU usage in a fine-grained way, SlowFuzz's fitness function keeps track of the total count of all instructions executed during a run of a test program. The intuition is that the test program becomes slower as the number of executed instructions increases. Therefore, the fitness function selects the inputs that result in the highest number of executed instructions as the slowest units. For efficiency, we monitor execution at the basic-block level instead of instructions while counting the total number of executed instructions for a program. We found that this method is more effective at guiding input generation than directly using the time taken by the test program to run. The runtime of a program shows large variations, depending on the application's concurrency characteristics or other programs that are executing in the same CPU, and therefore is not a reliable indicator for small increases in CPU usage.
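
To make the selection criterion concrete, the following minimal sketch (illustrative only; the helper names EdgeCounters, FitnessScore and IsSlowUnit do not appear in SlowFuzz's code) shows how such a resource-usage fitness check could be structured: it sums the per-edge hit counters collected during one run and keeps an input only if it sets a new maximum.

    #include <cstdint>
    #include <numeric>
    #include <vector>

    // Hypothetical per-run coverage data: one hit counter per CFG edge,
    // filled in by the instrumentation while the test program runs.
    using EdgeCounters = std::vector<uint64_t>;

    // Fitness score: total number of edge executions observed in one run,
    // used as a proxy for the total number of executed instructions.
    uint64_t FitnessScore(const EdgeCounters &counters) {
      return std::accumulate(counters.begin(), counters.end(), uint64_t{0});
    }

    // Keep an input as a "slow unit" only if it exceeds the highest
    // resource usage seen so far, so kept scores increase monotonically.
    bool IsSlowUnit(const EdgeCounters &counters, uint64_t &max_score_seen) {
      const uint64_t score = FitnessScore(counters);
      if (score <= max_score_seen)
        return false;
      max_score_seen = score;
      return true;
    }

Weighting each edge by the instruction count of its basic block would bring this score closer to an actual instruction count, at the cost of some extra bookkeeping.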

3.2 Mutation Strategy

SlowFuzz introduces several new mutation strategies tailored to identify inputs that demonstrate the worst-case complexity of a program. A mutation strategy decides which mutation operations to apply and which byte offsets in an input to modify, to generate a new mutated input (Algorithm 1, line 5).

SlowFuzz supports the following mutation operations: (i) add/remove a new/existing byte from the input; (ii) randomly modify a bit/byte in the input; (iii) randomly change the order of a subset of the input bytes; (iv) randomly change bytes whose values are within the range of ASCII codes for digits (i.e., 0x30-0x39); (v) perform a crossover operation in a given buffer mixing different parts of the input; and (vi) mutate bytes solely using characters or strings from a user-provided dictionary.

We describe the different mutation strategies supported by SlowFuzz below. Section 5.6 presents a detailed performance comparison of these strategies.

Random mutations. Random mutations are the simplest mutation strategy supported by SlowFuzz. Under this mutation strategy, one of the aforementioned mutations is selected at random and is applied on an input, as long as it does not violate other constraints for the given testing session, such as exceeding the maximum input length specified by the auditor. This strategy is similar to the ones used by popular evolutionary fuzzers like AFL [58] and libFuzzer [14] for finding crashes or memory safety issues.

Mutation priority. Under this strategy, the mutation operation is selected with ϵ probability based on its success at producing slow units during previous executions. The mutation operation is picked at random with (1 − ϵ) probability. In contrast, the mutation offset is still selected at random just like the strategy described above.

In particular, during testing, we count all the cases in which a mutation operation resulted in an increase in the observed instruction count and the number of times that operation has been selected. Based on these values, we assign a score to each mutation operation denoting the probability of the mutation to be successful at increasing the instruction count. For example, a score of 0 denotes that the mutation operation has never resulted in an increase in the number of executed instructions, whereas a score of 1 denotes that the mutation always resulted in an increase.

We pick the highest-scoring mutation among all mutation operations with a probability ϵ. The tunable parameter ϵ determines how often a mutation operation will be selected at random versus based on its score. Essentially, different values of ϵ provide different trade-offs between exploration and exploitation. In SlowFuzz, we set the default value of ϵ to 0.5.

Offset priority. This strategy selects the mutation operation to be applied randomly at each step, but the offset to be mutated is selected based on prior history of success at increasing the number of executed instructions. The mutation offset is selected based on the results of previous executions with a probability ϵ and at random with a probability (1 − ϵ). In the first case, we select the offset that showed the most promise based on previous executions (each offset is given a score ranging from 0 to 1, denoting the percentage of times in which the mutation of that offset led to an increase in the number of instructions).

Hybrid. In this last mode of operation, we apply a combination of both mutation and offset priority as described above. For each offset, we maintain an array of probabilities of success for each of the mutation operations that are being performed. Instead of maintaining a coarse-grained success probability for each mutation as in the mutation priority strategy, we maintain fine-grained success probabilities for each offset/mutation operation pair. We compute the score of each offset by computing the average of success probabilities of all mutation operations at that offset. During each mutation, with a probability of ϵ, we pick the offset and operation with the highest scores. The mutation offset and operation are also picked randomly with a probability of (1 − ϵ).
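
As a concrete illustration of the mutation-priority scheme, the sketch below (illustrative; the names MutationStats and PickMutation are not from SlowFuzz) picks the highest-scoring mutation operation with probability ϵ and a uniformly random one otherwise, where an operation's score is the fraction of its past applications that increased the instruction count. The offset-priority and hybrid modes extend the same idea to byte offsets and to per-offset/per-operation pairs, respectively.

    #include <cstddef>
    #include <cstdint>
    #include <random>
    #include <vector>

    struct MutationStats {
      uint64_t times_selected = 0;
      uint64_t times_improved = 0;  // applications that raised the instruction count
      double Score() const {
        return times_selected ? static_cast<double>(times_improved) / times_selected
                              : 0.0;
      }
    };

    // Epsilon-greedy choice over mutation operations (mutation-priority mode):
    // with probability epsilon pick the best-scoring operation seen so far,
    // otherwise pick one uniformly at random (SlowFuzz defaults epsilon to 0.5).
    size_t PickMutation(const std::vector<MutationStats> &ops, std::mt19937 &rng,
                        double epsilon = 0.5) {
      std::uniform_real_distribution<double> coin(0.0, 1.0);
      if (coin(rng) < epsilon) {
        size_t best = 0;
        for (size_t i = 1; i < ops.size(); ++i)
          if (ops[i].Score() > ops[best].Score()) best = i;
        return best;
      }
      std::uniform_int_distribution<size_t> uniform(0, ops.size() - 1);
      return uniform(rng);
    }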

4 IMPLEMENTATION

The SlowFuzz prototype is built on top of libFuzzer [14], a popular evolutionary fuzzer for finding crash and memory safety bugs. We outline the implementation details of different components of SlowFuzz below. Overall, our modifications to libFuzzer consist of 550 lines of C++ code. We used Clang v4.0 for compiling our modifications along with the rest of the libFuzzer code.

Figure 2: SlowFuzz architecture.

Figure 2 shows SlowFuzz's high-level architecture. Similar to popular evolutionary fuzzers like AFL [58] and libFuzzer [14], SlowFuzz executes in the same address space as the application being tested. We instrument the test application so that SlowFuzz can have access to different resource usage metrics (e.g., number of instructions executed) needed for its analysis. The instrumented test application is subsequently executed under the control of SlowFuzz's analysis engine. SlowFuzz maintains an active corpus of inputs to be passed into the tested applications and refines the corpus during execution based on SlowFuzz's fitness function. For each generation, an input is selected, mutated, then passed into the main routine of the application for its execution.

Instrumentation. Similar to libFuzzer, SlowFuzz's instrumentation is based on Clang's SanitizerCoverage [21] passes. Particularly, SanitizerCoverage allows tracking of each executed function, basic block, and edge in the Control Flow Graph (CFG). It also allows us to register callbacks for each of these events. SlowFuzz makes use of SanitizerCoverage's eight-bit counter capability that maps each CFG edge into an eight-bit counter representing the number of times that edge was accessed during an execution. We use the counter to keep track of the following ranges: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+. This provides a balance between accuracy of the counts and the overhead incurred for maintaining them. This information is then passed into SlowFuzz's fitness function, which determines whether an input is slow enough to keep for the next generation of mutations.

Mutations. LibFuzzer provides API support for custom input mutations. However, in order to implement the mutation strategies proposed in Section 3.2, we had to modify libFuzzer internals. Particularly, we augment the functions used in libFuzzer's Mutator class to return information on the mutation operation, offset, and the range of affected bytes for each new input generated by libFuzzer. This information is used to compute the scores necessary for supporting mutation priority, offset priority, and hybrid modes as described in Section 3.2 without any additional runtime overhead.
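
The mapping from a raw eight-bit edge counter to the coarse ranges listed in the Instrumentation paragraph could be implemented roughly as follows (a sketch for illustration, not the actual SlowFuzz or libFuzzer source):

    #include <cstdint>

    // Map an 8-bit edge hit counter to one of the eight coarse buckets
    // described above: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+.
    // Counters of zero (edge never executed) are assumed not to be reported.
    unsigned CounterToBucket(uint8_t counter) {
      if (counter >= 128) return 7;
      if (counter >= 32)  return 6;
      if (counter >= 16)  return 5;
      if (counter >= 8)   return 4;
      if (counter >= 4)   return 3;
      if (counter == 3)   return 2;
      if (counter == 2)   return 1;
      return 0;  // counter == 1: edge executed exactly once
    }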

5 EVALUATION

In this Section, we evaluate SlowFuzz on the following objectives: a) Is SlowFuzz capable of generating inputs that match the theoretical worst-case complexity for a given algorithm's implementation? b) Is SlowFuzz capable of efficiently finding inputs that cause performance slowdowns in real-world applications? c) How do the different mutation and guidance engines of SlowFuzz affect its performance? d) How does SlowFuzz compare with code-coverage-guided search at finding inputs demonstrating worst-case application behavior?

We describe the detailed results of our evaluation in the following Sections. All our experiments were performed on a machine with 23GB of RAM, equipped with an Intel(R) Xeon(R) CPU X5550 @ 2.67GHz and running 64-bit Debian 8 (jessie), compiled with GCC version 4.9.2, with a kernel version 4.5.0. All binaries were compiled using the Clang-4.0 compiler toolchain. All instruction counts and execution times are measured using the Linux perf profiler v3.16, averaging over 10 repetitions for each perf execution.

5.1 Overview

In order to adequately address the questions outlined in the previous Section, we execute SlowFuzz on applications of different algorithmic profiles and evaluate its ability to generate inputs that demonstrate worst-case behavior.

First, we examine if SlowFuzz generates inputs that demonstrate the theoretical worst-case behavior of well-known algorithms. We apply SlowFuzz on sorting algorithms with well-known complexities. The results are presented in Section 5.2. Subsequently, we apply SlowFuzz on different applications and algorithms that have been known to be vulnerable to complexity attacks: the PCRE regular expression library, the default hash table implementation of PHP, and the bzip2 binary. In all cases, we demonstrate that SlowFuzz is able to trigger complexity vulnerabilities. Table 1 shows a summary of our findings.

Tested Application      Fuzzing Outcome
Insertion sort [30]     41.59x slowdown
Quicksort (Fig 1)       5.12x slowdown
Apple quicksort         3.34x slowdown
OpenBSD quicksort       3.30x slowdown
NetBSD quicksort        8.7% slowdown
GNU quicksort           26.36% slowdown
PCRE (fixed input)      78 exponential & 765 superlinear regexes
PCRE (fixed regex)      8% - 25% slowdown
PHP hashtable           20 collisions in 64 keys
bzip2 decompression     ~300x slowdown

Table 1: Result Summary

As shown in Table 1, SlowFuzz is successful at inducing significant slowdown on all tested applications. Moreover, when applied to the PCRE library, it managed to generate regular expressions that exhibit exponential and super-linear (worse than quadratic) matching automatically, without any knowledge of the structure of a regular expression. Likewise, it successfully generated inputs that induce a high number of collisions when inserted into a PHP hash table, without any notion of hash functions. In the following Sections, we provide details on each of the above test settings.


5.2 Sorting

Simple quicksort and insertion sort. Our first evaluation of SlowFuzz's consistency with theoretical results is performed on common sorting algorithms with well-known worst-performing inputs. To this end, we initially apply SlowFuzz on an implementation of the insertion sort algorithm [30], as well as on an implementation of quicksort [30] in which the first sub-array element is always selected as the pivot. Both of the above implementations demonstrate quadratic complexity when the input passed to them is sorted. We run SlowFuzz for 1 million generations on the above implementations, sorting a file with a size of 64 bytes, and examine the slowdown SlowFuzz introduced over the fastest unit seen during testing. To do so, we count the total instructions executed by each program for each of the inputs, subtracting all instructions not relevant to the quicksort functionality (e.g., loader code). Our results are presented in Figure 3.
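
Since SlowFuzz reuses libFuzzer's in-process driver model (Section 4), a test driver for this kind of sorting experiment could look roughly like the following sketch; it is illustrative rather than the authors' actual driver, and quicksort_first_pivot merely stands in for the implementation under test.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Quicksort that always picks the first element as the pivot, matching
    // the simple pivot-selection scheme of Figure 1 (illustrative).
    static void quicksort_first_pivot(std::vector<uint8_t> &a) {
      if (a.size() <= 1) return;
      const uint8_t pivot = a[0];
      std::vector<uint8_t> smaller, equal, greater;
      for (uint8_t x : a) {
        if (x < pivot) smaller.push_back(x);
        else if (x == pivot) equal.push_back(x);
        else greater.push_back(x);
      }
      quicksort_first_pivot(smaller);
      quicksort_first_pivot(greater);
      a.clear();
      a.insert(a.end(), smaller.begin(), smaller.end());
      a.insert(a.end(), equal.begin(), equal.end());
      a.insert(a.end(), greater.begin(), greater.end());
    }

    // libFuzzer-style entry point: each generated input is handed to the
    // code under test, and the instrumentation measures the work it triggers.
    extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
      std::vector<uint8_t> input(Data, Data + Size);
      quicksort_first_pivot(input);
      return 0;
    }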

Figure 3: Best slowdown achieved by SlowFuzz at each generation (normalized over the slowdown of the best-performing input) versus best random testing outcome, on our insertion sort and quicksort drivers, for an input size of 64 bytes (average of 100 runs). SlowFuzz achieves slowdowns of 84.97% and 83.74% compared to the theoretical worst cases for insertion sort and quicksort respectively.

Figure 3 represents an average of 100 runs. In each run, SlowFuzz started execution with a single random 64-byte seed, and executed for 1 million generations. We notice that SlowFuzz achieves 41.59x and 5.12x slowdowns for insertion sort and quicksort respectively. In order to examine how this behavior compares to random testing, we randomly generated 1 million inputs of 64 bytes each and measured the instructions required for insertion sort and quicksort, respectively. Figure 3 depicts the maximum slowdown achieved through random testing across all runs. We notice that in both cases SlowFuzz outperforms the brute-force worst-input estimation. Finally, we observe that the gap between brute-force search and SlowFuzz is much higher for quicksort than for insertion sort, which is consistent with the fact that the average-case complexity of insertion sort is O(n^2), compared to quicksort's O(n log n).

Therefore, a random input is more likely to demonstrate worst-case behavior for insertion sort but not for quicksort.

Real-world quicksort implementations. We also examined how SlowFuzz performs when applied to real-world quicksort implementations. Particularly, we applied it to the Apple [12], GNU [9], NetBSD [15], and OpenBSD [13] quicksort implementations. We notice that SlowFuzz's performance on real-world implementations is consistent with the quicksort performance that we observed in the experiments described above. In particular, the slowdowns generated by SlowFuzz were (in increasing order) 8.7% for the NetBSD implementation, 26.36% for the GNU quicksort implementation, 3.30x for the OpenBSD implementation, and 3.34x for the Apple implementation. We notice that, despite the fact that these implementations use efficient pivot selection strategies, SlowFuzz still manages to trigger significant slowdowns. On the contrary, repeating the same experiment using naive coverage-based fuzzing yields slowdowns that never surpass 5% for any of the libraries. This is an expected result, as coverage-based fuzzers are geared towards maximizing coverage, and thus do not favor inputs exercising the same edges repeatedly over inputs that discover new edges.

Finally, we note that, similar to the experiment of Figure 3, the slowdowns for Figure 4 are also measured in terms of executed instructions, normalized over the instructions of the best-performing input seen during testing.

Figure 4: Best slowdown (with respect to the best-performing input) achieved by SlowFuzz at each generation normalized over the best random testing outcome, on real-world quicksort implementations, for an input size of 64 bytes (average of 100 runs).

Result 1: SlowFuzz was able to generate inputs for quicksort and insertion sort that achieve 83.74% and 84.97% of the theoretical worst-case, respectively, without any information on the algorithm internals.

5.3 Regular Expressions

Regular expression implementations are known to be susceptible to complexity attacks [17, 20, 24]. In particular, there are over 150 Regular expression Denial of Service (ReDoS) vulnerabilities registered in the National Vulnerability Database (NVD), which are the result of exponential (e.g., [8]) or super-linear (worse than quadratic, e.g., [7]) complexity of regular expression matching by several existing matchers [57].

Even performing domain-specific analyses of whether an application is susceptible to ReDoS attacks is non-trivial. Several works are solely dedicated to the detection of exploitation of such vulnerabilities. Recently, Rexploiter [57] presented algorithms to detect whether a given regular expression may result in non-deterministic finite automata (NFA) that require super-linear or exponential matching times for specially crafted inputs. They have also presented domain-specific algorithms to generate inputs capable of triggering such worst-case performance. The above denote the hardness of SlowFuzz's task, namely finding regular expressions that may result in super-linear or exponential matching times without any domain knowledge.

Figure 5: Probability of SlowFuzz finding at least n unique instances of regexes that cause a slowdown, or exhibit super-linear and exponential matching times, after 1 million generations (inverse CDF over 100 runs).

For the regular expression setting, we perform two separate experiments to check whether SlowFuzz can produce (i) regular expressions which exhibit super-linear and exponential matching times, (ii) inputs that cause slowdown during matching, given a fixed regular expression. To this end, we apply SlowFuzz on the PCRE regular expression library [18] and provide it with a character set of the symbols used in PCRE-compliant regular expressions (in the form of a dictionary). Notice that we do not further guide SlowFuzz with respect to what mutations should be done, and SlowFuzz's engine is completely agnostic of the structure of a valid regular expression. In all cases, we start testing from an empty corpus without providing any seeds of regular expressions to SlowFuzz.

Fixed string and mutated regular expressions. For the first part of our evaluation, we apply SlowFuzz on a binary that utilizes the PCRE library to perform regular expression matching, and we let SlowFuzz mutate the regular expression part of the pcre2_match call used for the matching, using a dictionary of regular expression characters. The input to be matched against the regular expression is selected from a random pool of characters and SlowFuzz executes for a total of 1 million generations, or until a time-out is hit. The regular expressions generated by SlowFuzz are kept limited to 10 characters or less. Once a SlowFuzz session ends, we evaluate the time complexity of the generated regular expressions utilizing Rexploiter [57], which detects if the regular expression is super-linear, exponential, or none of the two. We repeat the above process for a total of 100 fuzzing sessions.
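
A driver for this fixed-string/mutated-regex setup could, for instance, follow the sketch below, which uses the standard PCRE2 API; the fixed subject string and the overall structure are illustrative assumptions, not the exact harness used in the paper.

    #define PCRE2_CODE_UNIT_WIDTH 8
    #include <pcre2.h>

    #include <cstddef>
    #include <cstdint>
    #include <string>

    // Fixed subject string; in the experiment it is drawn from a random
    // pool of characters (the concrete value here is only a placeholder).
    static const char kSubject[] = "aaaabbbbccccdddd";

    extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
      if (Size == 0 || Size > 10) return 0;  // regexes kept to 10 chars or less

      std::string pattern(reinterpret_cast<const char *>(Data), Size);
      int errorcode;
      PCRE2_SIZE erroroffset;
      pcre2_code *re =
          pcre2_compile(reinterpret_cast<PCRE2_SPTR>(pattern.c_str()),
                        pattern.size(), 0, &errorcode, &erroroffset, nullptr);
      if (re == nullptr) return 0;  // invalid regexes are simply rejected

      pcre2_match_data *md = pcre2_match_data_create_from_pattern(re, nullptr);
      pcre2_match(re, reinterpret_cast<PCRE2_SPTR>(kSubject),
                  sizeof(kSubject) - 1, 0, 0, md, nullptr);

      pcre2_match_data_free(md);
      pcre2_code_free(re);
      return 0;
    }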

Overall, SlowFuzz generates a total of 33343 regular expressions during the above 100 sessions, out of which 27142 are rejected as invalid whereas 6201 are valid regular expressions that caused a slowdown. Out of the valid regular expressions, 765 are superlinear and 78 are exponential. This experiment demonstrates that despite being agnostic of the semantics of regex matching, SlowFuzz successfully generates regexes requiring super-linear and exponential matching times. Six such examples are presented in Table 2.

Super-linear (greater than quadratic)    Exponential
c*ca*b*a*b                               (b+)+c
a+b+b+b+a+                               c*(b+b)+c
c*c+ccbc+                                a(a|a*)+a

Table 2: Sample regexes generated by SlowFuzz resulting in super-linear (greater than quadratic) and exponential matching complexity.

A detailed case study. The regexes presented in Table 2 are typical examples of regular expressions that require non-linear matching running times. This happens due to the existence of different paths in the respective NFAs, which reach the same state through an identical sequence of labels. Such paths have a devastating effect during backtracking [57]. To further elaborate on this property, let us consider the NFA depicted in Figure 6, which corresponds to the regular expression (b+)+c of Table 2.

Figure 6: NFA for the regular expression (b+)+c suffering from exponential matching complexity as found by SlowFuzz. q0 is the entry state, q2 the accept state, and q1 the pivot state for the exponential backtracking.

We notice that, for the NFA shown in Figure 6, starting from state q1, it is possible to reach q1 again through two different paths, namely the paths (q1 −b→ q0, q0 −b→ q1) and (q1 −b→ q1, q1 −b→ q1). Moreover, we notice that the labels in the transitions for both of the above paths are the same: 'bb' is consumed in both cases. Thus, as it is possible to reach q2 from q1 (via label c) as well as reach q1 from the initial state q0, there will be an exponentially large number of paths to consider in the case of backtracking. Similar issues arise with loops appearing in NFAs with super-linear matching [57].

As mentioned above, on average, among the valid regular expressions generated by SlowFuzz, approximately 12.33% of the regexes have super-linear matching complexity, whereas 2.29% on average have exponential matching complexity. The aforementioned results are aggregates across all the 100 executions of the experiment. In order to estimate the probability of SlowFuzz generating a regex that exhibits a slowdown (note that, due to SlowFuzz's guidance engine, any regex produced must exhibit an increased instruction count compared to all previous regexes), or super-linear and exponential matching times in a single session, we calculate the respective inverse CDF, which is shown in Figure 5. We notice that, for all the regular expressions observed, SlowFuzz successfully generates inputs that incur a slowdown during matching. In particular, with 90% probability, SlowFuzz generates at least 2 regular expressions requiring super-linear matching time and at least 31 regular expressions that cause a slowdown. SlowFuzz generates at least one regex requiring exponential matching time with a probability of 45.45%.

Fixed regular expression and mutated string. In the second part of our evaluation of SlowFuzz on regular expressions, we seek to examine if, for a given fixed regular expression, SlowFuzz is able to generate inputs that can introduce a slowdown during matching. We collect PCRE-compliant regular expressions from popular Web Application Firewalls (WAF) [2], and utilize the PCRE library to match input strings generated by SlowFuzz against each regular expression. For this experiment, we apply SlowFuzz on a total of 25 regular expressions, and we record the total instructions executed by the PCRE library when matching the regular expression against SlowFuzz's generated units, at each generation. For our set of regular expressions, SlowFuzz achieved monotonically increasing slowdowns, ranging from 8% to 25%. Figure 7 presents how the slowdown varies as fuzzing progresses, for three representative regex samples with different slowdown patterns.

Figure 7: Best slowdown achieved by SlowFuzz-generated input strings (normalized over the slowdown of the best-performing input), when matching against fixed regular expressions used in WAFs (average of 100 runs). The corresponding regexes are listed in Appendix A.

5.4 Hash Tables

Hash tables are a core data structure in a wide variety of software. The performance of hash table lookup and insertion operations significantly affects the overall application performance. Complexity attacks against hash table implementations may induce unwanted effects ranging from performance slowdowns to full-fledged DoS [8, 17, 19, 20, 24]. In order to evaluate if SlowFuzz can generate inputs that trigger collisions without knowing any details about the underlying hash functions, we apply it on the hash table implementation of PHP (v5.6.26), which is known to be vulnerable to collision attacks.

PHP Hashtables. Hashtables are prevalent in PHP and they also serve as the backbone for PHP's array interface. PHP v5.x utilizes the DJBX33A hash function for hashing using string keys, which can be seen in Listing 1.

We notice that for two strings of the form 'ab' and 'cd' to collide, the following property must hold [10]:

    c = a + n  and  d = b − 33·n,  for some n ∈ Z


It is also easy to show that if two equal-length strings A and B collide, then strings xAy, xBy, where x and y are any prefix and suffix respectively, also collide. Using the above property, one can construct a worst-case performing sequence of inputs [3], forcing a worst-case insertion time of O(n²).

/*
 * @arKey is the array key to be hashed
 * @nKeyLength is the length of arKey
 */
static inline ulong
zend_inline_hash_func(const char *arKey, uint nKeyLength)
{
    register ulong hash = 5381;

    for (uint i = 0; i < nKeyLength; ++i) {
        hash = ((hash << 5) + hash) + arKey[i];
    }

    return hash;
}

Listing 1: DJBX33A hash without loop unrolling.

Abusing the complexity characteristics of the DJBX33A hash, attackers performed DoS attacks against PHP, Python, and Ruby applications in 2011. As a response, PHP added an option in its ini configuration to set a limit on the number of collisions that are allowed to happen. However, in 2015, similar DoS attacks [1] were reported, abusing PHP's parsing of JSON into hash tables. In this experiment we examine how SlowFuzz performs when applied to this particular hash function implementation.

Our experimental setup is as follows: we ported the PHP hash table implementation so that it can be used from any C/C++ program, removing all the interpreter-specific variables and macros while leaving all the non-interpreter-related components intact.



[Figure 8 plot: x-axis Fuzzing time (hours), y-axis Number of collisions.]

Figure 8: Number of collisions found by SlowFuzz per generation, when applying it on the PHP 5.6 hashtable implementation, for at most 64 insertions with string keys.

Subsequently, we created a hash table with a size of 64 entries, and utilized SlowFuzz to perform a maximum of 64 insertions into the hash table, using strings as keys, starting from a corpus consisting of a single input that causes 8 collisions. In particular, the keys for the hash table insertions were provided by SlowFuzz at each generation, and SlowFuzz evolved its corpus of strings using a hybrid mutation strategy. Given a hash table of 64 entries and 64 insertions into the hash table, the maximum number of collisions that can occur is also 64. In order to measure the number of collisions occurring in the hashtable at each generation, we created a PHP module (running in the context of PHP), and measured the number of collisions induced by each input that SlowFuzz generates. We perform our measurements after the respective elements are inserted into a real PHP array. Our results are presented in Figure 8.
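For illustration, collision counting for fuzzer-provided keys can be sketched as follows. This is not the ported PHP hashtable or the PHP module used in our measurements; in particular, the newline-based key-splitting convention and the use of the modulo operator instead of PHP's mask-based bucketing are simplifying assumptions.

    /* Illustrative sketch: split the fuzzer input into newline-separated
     * keys, hash each with DJBX33A, and count how many insertions land in
     * an already-used bucket of a 64-slot table. */
    #include <stddef.h>
    #include <stdint.h>

    #define TABLE_SIZE 64
    #define MAX_KEYS   64

    static unsigned long djbx33a(const uint8_t *key, size_t len) {
        unsigned long hash = 5381;
        for (size_t i = 0; i < len; ++i)
            hash = ((hash << 5) + hash) + key[i];
        return hash;
    }

    /* returns the number of colliding insertions for this input */
    size_t count_collisions(const uint8_t *data, size_t size) {
        int used[TABLE_SIZE] = {0};
        size_t collisions = 0, keys = 0, start = 0;
        for (size_t i = 0; i <= size && keys < MAX_KEYS; ++i) {
            if (i == size || data[i] == '\n') {        /* end of one key */
                if (i > start) {
                    size_t bucket = djbx33a(data + start, i - start) % TABLE_SIZE;
                    if (used[bucket]) collisions++;    /* slot already taken */
                    used[bucket] = 1;
                    keys++;
                }
                start = i + 1;
            }
        }
        return collisions;
    }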

We notice that despite the complex conditions required to trigger a hash collision, and without knowing any details about the hash function, SlowFuzz's evolutionary engine reaches 31.25% of the theoretical worst case after approximately 40 hours of fuzzing, using a single CPU. SlowFuzz's stateful, evolutionary guidance achieves monotonically increasing slowdowns, despite the complex constraints imposed by the hash function. In contrast, repeating the same experiment using coverage-based fuzzing yielded non-monotonically increasing collisions, and at no point was an input generated that caused more than 8 collisions. In particular, fuzzing using coverage generated 58 inputs with a median of 5 collisions.

5.5 ZIP Utilities

Zip utilities that support various compression/decompression schemes are another instance of applications that have been shown to suffer from Denial-of-Service attacks. For instance, an algorithmic complexity vulnerability in the sorting algorithms of the bzip2 application² allowed remote attackers

²The vulnerability is found in BZip2CompressorOutputStream for Apache Commons Compress before 1.4.1.

to cause DoS via increased CPU consumption, when they provided a file with many repeating inputs [16].

In order to evaluate how SlowFuzz performs when applied to compression/decompression libraries, we apply it on bzip2 v1.0.6. In particular, we utilize SlowFuzz to create compressed files of a maximum of 250 bytes, and we subsequently use the libbzip2 library to decompress them. Based on the slowdowns observed during decompression, SlowFuzz evolves its input corpus, mutating each input using its hybrid mode of operation. Our experimental results are presented in Figure 9.
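A decompression harness for such an experiment can be as simple as the sketch below. It is illustrative only, not the exact driver used in the evaluation: it assumes a libFuzzer-style entry point and a fixed cap on the decompressed output size, and it uses libbzip2's one-shot buffer-to-buffer API; the return code is deliberately ignored, since most fuzzer-generated archives are malformed.

    #include <bzlib.h>
    #include <stddef.h>
    #include <stdint.h>

    #define MAX_OUT (1024 * 1024)   /* assumed cap on decompressed size */

    int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
        static char out[MAX_OUT];
        unsigned int out_len = MAX_OUT;
        /* small = 0: normal algorithm; verbosity = 0: silent.
         * The work done here is what the fitness function rewards. */
        BZ2_bzBuffToBuffDecompress(out, &out_len, (char *)data,
                                   (unsigned int)size, 0, 0);
        return 0;
    }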

[Figure 9 plot: x-axis Fuzzing time (hours), y-axis Decompression time (seconds); title: Decompression times for bzip files generated by SlowFuzz.]

Figure 9: Slowdowns observed while decompressing inputs generated by SlowFuzz using the bzip2 binary. The maximum file size is set to 250 bytes.

A detailed case study. Figure 9 depicts the time required by the bzip2 binary to decompress each of the inputs generated by SlowFuzz. We notice that for the first hour of fuzzing, the inputs generated by SlowFuzz do not exhibit significant slowdown during their decompression by bzip2. In particular, each of the 250-byte inputs of SlowFuzz's corpus for the first hour of fuzzing is decompressed in approximately 0.0006 seconds. However, in subsequent generations, we observe that SlowFuzz successfully achieves decompression times reaching 0.18s to 0.21s, an overall slowdown in the range of 300x. In particular, in the first 6 minutes after the first hour, SlowFuzz achieves a decompression time of 0.10 seconds. This first peak in the decompression time is achieved because SlowFuzz triggers the randomization mechanism of bzip2 by setting the respective header byte to a non-zero value. This mechanism, although deprecated, was put in place to protect against repetitive blocks, and is still supported for legacy reasons. However, even greater slowdowns are achieved when SlowFuzz mutates two bytes used in bzip2's Move-to-Front Transform (MTF) [4], and particularly in the run-length encoding of the MTF result. Specifically, the mutation of these bytes affects the total number of invocations of the BZ2_bzDecompress routine, which results in a total slowdown of 38.31x in decompression time.

The respective code snippet in which the affected bytes are read is shown in Listing 2: the GET_MTF_VAL macro reads the modified bytes in memory³.

³Via the macros GET_BITS(BZ_X_MTF_3, zvec, zn) and GET_BIT(BZ_X_MTF_4, zj).



These bytes subsequently cause the routine BZ2_bzDecompress to be called 4845 times, contrary to a single call before the mutation. We should note at this point that the total size of the input before and after the mutation remained unchanged.

Finally, in order to compare with a strategy that does not target complexity, we repeated the previous experiment using traditional coverage-based fuzzing. The fuzzer, when guided only by coverage, did not generate any input causing executions longer than 0.0008 seconds, with the maximum slowdown achieved being 23.7%.

do {
    /* Check that N doesn't get too big, so that es doesn't
       go negative. The maximum value that can be RUNA/RUNB
       encoded is equal to the block size (post the initial RLE),
       viz, 900k, so bounding N at 2 million should guard against
       overflow without rejecting any legitimate inputs. */
    if (N >= 2*1024*1024) RETURN(BZ_DATA_ERROR);
    if (nextSym == BZ_RUNA) es = es + (0+1) * N; else
    if (nextSym == BZ_RUNB) es = es + (1+1) * N;
    N = N * 2;
    GET_MTF_VAL(BZ_X_MTF_3, BZ_X_MTF_4, nextSym);
} while (nextSym == BZ_RUNA || nextSym == BZ_RUNB);

Listing 2: Excerpt from bzip2's BZ2_decompress routine (decompress.c). A two-byte modification by SlowFuzz results in a 38.31x slowdown compared to the previous input.

From the above experiment we observe that SlowFuzz's guidance and mutation engines are successful in pinpointing locations that trigger large slowdowns even in very complex applications, such as a state-of-the-art compression utility like bzip2.

Result 2: SlowFuzz is capable of exposing complexity vulnerabilities (e.g., 300x slowdown in bzip2, PCRE-compliant regular expressions with exponential matching time, and PHP hash table collisions) in real-world, non-trivial applications without knowing any domain-specific details.

5.6 Engine Evaluation

Effect of SlowFuzz's fitness function. In this section, we examine the effect of using code-coverage-guided search versus SlowFuzz's resource-usage-based fitness function, particularly in the context of scanning an application for complexity vulnerabilities. To do so, we repeat one of the experiments of Section 5.2, applying SlowFuzz on the OpenBSD quicksort implementation with an input size of 64 bytes, for a total of 1 million generations, using hybrid mutations. Our results are presented in Figure 10. We observe that SlowFuzz's guidance mechanism yields a significant improvement over code-coverage-guided search. In particular, SlowFuzz achieves a 3.3x slowdown for OpenBSD, whereas the respective slowdown achieved using only coverage-guided search is 23.41%. This is an expected result since, as mentioned in previous

sections, code coverage cannot encapsulate behaviors resulting in multiple invocations of the same line of code (e.g., an infinite loop). Moreover, we notice that the total instruction count of each unit that is created by SlowFuzz at different generations is not monotonically increasing. This is an artifact of our implementation, which uses SanitizerCoverage's 8-bit counters; these provide coarse-grained, imprecise tracking of the real number of times each edge was invoked (Section 4). Thus, although a unit might result in the execution of fewer instructions, it will only be observed by SlowFuzz's guidance engine if the respective number of total CFG edge accesses falls into a separate bucket (8 possible ranges representing the total number of CFG edge accesses). Future work can consider applying more precise instruction tracking (e.g., using hardware counters or utilities similar to perf) together with static analysis passes, to achieve more effective guidance.
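For concreteness, the kind of bucketing applied to 8-bit edge counters can be sketched as follows. The exact range boundaries below follow the ranges commonly used by libFuzzer-style 8-bit counters and should be treated as illustrative; the essential point is only that any count of 128 or more collapses into a single bucket.

    #include <stdint.h>

    /* Collapse an exact 8-bit edge hit count into one of eight coarse
     * ranges (plus "not taken"). An edge taken 200 times is therefore
     * indistinguishable from one taken 130 times. */
    static unsigned counter_bucket(uint8_t hits) {
        if (hits == 0)   return 0;   /* edge not taken */
        if (hits == 1)   return 1;
        if (hits == 2)   return 2;
        if (hits == 3)   return 3;
        if (hits <= 7)   return 4;
        if (hits <= 15)  return 5;
        if (hits <= 31)  return 6;
        if (hits <= 127) return 7;
        return 8;                    /* 128 or more hits all land here */
    }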

Finally, when choosing the SlowFuzz fitness function, we also considered the option of utilizing time-based tracking instead of performance counters. However, performing time-based measurements in real-world systems is not trivial, especially at instruction-level granularity and when multiple samples are required in order to minimize measurement errors. In the context of fuzzing, multiple runs of the same input will slow the fuzzer down significantly. To demonstrate this point, in Figure 10 we also include an experiment in which the execution time of an input is used to guide input generation. In particular, we utilized CPU clock time to measure the execution time of a unit and discarded the unit if it was not slower than all previously seen units. We notice that the corpus degrades due to system noise and does not achieve any slowdown larger than 23%.⁴

[Figure 10 plot: x-axis Generation, y-axis Slowdown; series: Time, Coverage, Edge Counters.]

Figure 10: Comparison of the slowdown achieved by SlowFuzz under different guidance mechanisms, when applied on the OpenBSD quicksort implementation of Section 5.2, for an input size of 64 bytes, after 1 million generations (average of 100 runs).

⁴Contrary to the slowdowns measured during fuzzing using a single run, the slowdowns presented in Figure 10 are generated using the perf utility running ten iterations per input. Non-monotonic increases denote corpus degradation due to bad input selection.



Result 3: SlowFuzz's fitness function and mutation schemes outperform code-coverage-guided evolutionary search by more than 100%.

Effect of Mutation Schemes. To highlight the different characteristics of each of SlowFuzz's mutation schemes described in Section 3, we repeat one of the experiments of Section 5.2, applying SlowFuzz on the OpenBSD quicksort, each time using a different mutation strategy. Our experimental setup is identical to that of Section 5.2: we sort inputs with a size of 64 bytes and fuzz for a total of 1 million generations. For each mode of operation, we average over a total of 100 SlowFuzz sessions. Our results are presented in Figure 11.

[Figure 11 plot: x-axis Generation, y-axis Slowdown; series: Random, Mutation Priority, Offset Priority, Hybrid.]

Figure 11: Comparison of the best slowdown achieved by SlowFuzz's different mutation schemes, at each generation, when applied on the OpenBSD quicksort implementation of Section 5.2, for an input size of 64 bytes, after 1 million generations (average of 100 runs).

We notice that, for the above experiment, choosing a mutation at random is the worst-performing option among all mutation options supported by SlowFuzz (Section 3.2), although it still achieves a slowdown of 2.33x over the best-performing input. Indeed, all of SlowFuzz's scoring-based mutation engines (offset-priority, mutation-priority, and hybrid) are expected to perform at least as well as selecting mutations at random, given enough generations, as they avoid getting stuck with unproductive mutations. We also observe that offset priority is the fastest mode to converge among the mutation schemes for this particular experiment, and results in an overall slowdown of 3.27x.

For sorting, offsets that correspond to areas of the array that should not be mutated are quickly penalized under the offset-priority scheme; thus, mutations are mainly performed on the non-sorted portions of the array. Additionally, we observe that mutation priority also outperforms the random scheme due to the fact that certain mutations (e.g., crossover operations) may have devastating effects on the sorting of the array. The mutation-priority scheme picks up such patterns and avoids such mutations. By contrast, these mutations continue to be used under the random scheme. Finally, we observe that the hybrid mode eventually outperforms all other strategies, achieving a

3.30x slowdown, although it is the last mutation mode to start reaching a plateau. We suspect that this results from the fact that the hybrid mode does not quickly penalize particular inputs or mutations, as it needs more samples of each mutation operation and offset pair before avoiding any particular offset or mutation operation.

Instrumentation overhead. SlowFuzz's runtime overhead, measured in executions per second, matches the overhead of native libFuzzer. The executions per second achieved on different payloads are mostly dominated by the runtimes of the native binary, as well as the respective I/O operations. Despite our choice to prototype SlowFuzz using libFuzzer, the design and methodology presented in Section 3 can be applied to any evolutionary fuzzer and can also be implemented using Dynamic Binary Instrumentation frameworks, such as Intel's PIN [39], to allow for more detailed runtime tracking of the application state. However, such frameworks are known to incur slowdowns of more than 200%, even with minimal instrumentation [43]. For instance, for our PHP hashtable experiments described in Section 5.4, an insertion of 16 strings, resulting in 8 collisions, takes 0.02 seconds. Running the same insertion under a PIN tool that only counts instructions requires a total of ~2 seconds. By contrast, hashtable fuzzing with SlowFuzz achieves up to 4000 execs/sec, unless a significant slowdown is incurred due to a particular input.⁵

6 DISCUSSION

In this paper, we demonstrated that evolutionary search techniques commonly used in fuzzing to find memory safety bugs can be adapted to find algorithmic complexity vulnerabilities. Similar strategies should be applicable for finding other types of DoS attacks, such as battery draining or filling up memory or hard disk. Designing the fitness functions and mutation schemes for detecting such bugs will be an interesting future research problem. Besides evolutionary techniques, other mechanisms such as reinforcement learning or Monte Carlo search techniques can also be adapted for finding inputs with worst-case resource usage.

Our current prototype of SlowFuzz is completely dynamic. However, integrating static analysis techniques into SlowFuzz can further improve its performance. Using static analysis to find potentially promising offsets in an input for mutation will further reduce the search space and will therefore make the search process more efficient. For example, using taint tracking and loop analysis together with runtime flow profiles can identify potentially promising code locations that can cause significant slowdowns [41, 52].

The current prototype implementation of SlowFuzz uses the SanitizerCoverage passes to keep track of the number of times a CFG edge is accessed. Such tracking is limited by the total number of buckets allowed by SanitizerCoverage. This reduces the accuracy of the resource usage information as tracked by SlowFuzz, because any edge that is accessed more than 128 times is assigned to the same bucket regardless of the actual number of accesses.

⁵Execution under SlowFuzz does not require repeated loading of the required libraries, but is only dominated by the function being tested, which is only a fraction of the total execution of the native binary (thus smaller than 0.02 seconds).



Although, under its current implementation, the actual edge count information is imprecise, this is not a fundamental design limitation of SlowFuzz but an artifact of our prototype implementation. Alternative implementations can offer more precise tracking via custom callbacks for SanitizerCoverage, by adopting hardware counters, or by utilizing per-unit perf tracking. On the other hand, the benefit of the current implementation is that it can be incorporated into libFuzzer's main engine orthogonally, without requiring major changes to libFuzzer's dependencies.

7 RELATED WORK

Complexity attacks. Detecting and mitigating algorithmic complexity attacks is an active field of research. Crosby et al. [31] were the first to present complexity attacks abusing collisions in hash table implementations. Contrary to SlowFuzz's approach, however, their attack required expert knowledge. Since then, several lines of work have explored attacks and defenses targeting different types of complexity attacks: Cai et al. [28] leverage complexity vulnerabilities in the Linux kernel name-lookup hash tables to exploit race conditions in the kernel access(2)/open(2) system calls, whereas Sun et al. [54] explore complexity vulnerabilities in the name-lookup algorithm of the Linux kernel to achieve an exploitable covert timing channel. Smith et al. [51] exploit the syntax of the Snort IDS to perform a complexity attack resulting in slowdowns during packet inspection. Shenoy et al. [49, 50] present an algorithmic complexity attack against the popular Aho-Corasick string searching algorithm and propose hardware- and software-based defenses to mitigate the worst-case performance of their attacks. Moreover, several lines of work focus particularly on statically detecting complexity vulnerabilities related to regular expression matching, especially focusing on backtracking during the matching process [25, 38, 42, 57]. Contrary to SlowFuzz, all of the above lines of work require deep domain-dependent knowledge and do not extend to different categories of complexity vulnerabilities.

Finally, recent work by Holland et al. [34] combines static and dynamic analysis to perform analyst-driven exploration of Java programs to detect complexity vulnerabilities. However, contrary to SlowFuzz, this work requires a human analyst to closely guide the exploration process, specifying which portions of the binary should be analyzed statically and which dynamically, as well as defining the inputs to the binary.

Performance bugs. Several prior works target generic performance bugs not necessarily related to complexity vulnerabilities. For instance, Lu et al. study a large set of real-world performance bugs to construct a set of rules that they use to discover new performance bugs via hand-built checkers integrated into the LLVM compiler infrastructure [36]. Along the same lines, LDoctor [52] detects loop inefficiencies by implementing a hybrid static-dynamic program analysis that leverages different loop-specific rules. Both of the above lines of work, contrary to SlowFuzz, require expert-level knowledge for creating the detection rules, and are orthogonal to the current work. Another line of work focuses on application profiling to detect performance bottlenecks. For example, Ramanathan et al. utilize flow profiling for the efficient detection of memory-related performance bugs in Java programs [41].

Grechanik et al. utilize a genetic-algorithm-driven profiler for detecting performance bottlenecks [48] in Web applications, and cluster execution traces to explore different combinations of the input parameter values. However, contrary to SlowFuzz, their goal is to explore a large space of input combinations in the context of automatic application profiling and not to detect complexity vulnerabilities.

WCET. Another related line of work addresses accurate Worst-Case Execution Time (WCET) estimation for a given application. Apart from static analysis and evolutionary testing approaches [26], WCET estimation has traditionally been achieved using search-based methods measuring end-to-end execution times [55]. Moreover, Hybrid Measurement-Based Analyses (HMBA) have been used to measure the execution times of program segments via instrumentation points [27, 45, 46] and execution profiles [26]. Wegener et al. [56] utilize evolutionary techniques for testing timing constraints in real-time systems; however, contrary to SlowFuzz, they apply processor-level timing measurements for their fitness function and only perform random mutations. Finally, recent techniques combine hardware effects and loop bounds with genetic algorithms [37]. However, all of the above methods attempt to detect worst-case execution times for simple and mostly straight-line program segments often used in real-time systems. By contrast, SlowFuzz detects algorithmic complexity attacks in large complex programs deployed on general-purpose hardware.

Evolutionary Fuzzing. Several lines of work deploy evolutionary mutation-based fuzzing to target crash-inducing bugs. Notable examples are the AFL [58], libFuzzer [14], honggfuzz [11], and syzkaller [23] fuzzers, as well as the CERT Basic Fuzzing Framework (BFF) [35], which utilize coverage as their main guidance mechanism. Moreover, several frameworks combine coverage-based evolutionary fuzzing with symbolic execution [29, 32, 33, 53], or with static analysis and dynamic tainting [47], to achieve higher code coverage and increase their effectiveness in detecting bugs. Finally, NEZHA [44] utilizes evolutionary-based, mutation-assisted testing to target semantic bugs. Although many of the aforementioned lines of research share common building blocks with SlowFuzz, they do not target complexity vulnerabilities and mainly utilize random mutations, contrary to SlowFuzz's targeted mutation strategies.

8 CONCLUSION

In this work we designed SlowFuzz, the first, to the best of our knowledge, evolutionary-search-based framework targeting algorithmic complexity vulnerabilities. We evaluated SlowFuzz on a variety of real-world applications including zip utilities, regular expression libraries, and hash table implementations. We demonstrated that SlowFuzz can successfully generate inputs that match the theoretical worst-case complexity in known algorithms. We also showed that SlowFuzz was successful in triggering complexity vulnerabilities in all the applications we examined. SlowFuzz's evolutionary engine and mutation strategies generated inputs causing more than 300-times slowdown in the bzip2 decompression routine, produced inputs triggering high numbers of collisions in production-level hash table implementations, and also generated regular expressions with exponential matching



complexities without any knowledge about the semantics of regular expressions. We believe our results demonstrate that customized evolutionary search techniques present a promising direction for the automated detection of not only algorithmic complexity vulnerabilities, but also other types of resource exhaustion vulnerabilities, and we hope to inspire tighter integration of existing techniques and static analyses with modern mutation-based evolutionary testing.

9 ACKNOWLEDGMENTS

We would like to thank the anonymous reviewers for their valuable feedback. This work is sponsored in part by the Office of Naval Research (ONR) grant N00014-17-1-2010, the National Science Foundation (NSF) grants CNS-13-18415 and CNS-16-17670, and a Google Faculty Fellowship. Any opinions, findings, conclusions, or recommendations expressed herein are those of the authors, and do not necessarily reflect those of the US Government, ONR, NSF, or Google.

REFERENCES

[1] #800564 - PHP5: trivial hash complexity DoS attack - Debian Bug report logs. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=800564.
[2] attackercan/regexp-security-cheatsheet. https://github.com/attackercan/regexp-security-cheatsheet/tree/master/RegexpSecurityParser/WAF-regexps.
[3] bk2204/php-hash-dos: A PoC hash complexity DoS against PHP. https://github.com/bk2204/php-hash-dos.
[4] bzip2. http://www.bzip.org/1.0.3/html/index.html.
[5] Controlling backtracking. https://msdn.microsoft.com/en-us/library/dsy130b4(v=vs.110).aspx#controlling_backtracking.
[6] CVE-2011-5021. http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-5021.
[7] CVE-2013-2099. http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2013-2099.
[8] CVE-2015-2526. http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-2526.
[9] gnulib/qsort.c at master · coreutils/gnulib. https://github.com/coreutils/gnulib/blob/master/lib/qsort.c.
[10] Hash algorithm and collisions - PHP Internals Book. http://www.phpinternalsbook.com/hashtables/hash_algorithm.html.
[11] honggfuzz. https://github.com/google/honggfuzz.
[12] xnu-1456.1.26/bsd/kern/qsort.c. https://opensource.apple.com/source/xnu/xnu-1456.1.26/bsd/kern/qsort.c.
[13] libc/stdlib/qsort.c. https://sourceforge.net/u/lluct/me722-cm/ci/f3ae3e66860629a7ebe223fdda3fdc8ffbdd9c6d/tree/bionic/libc/stdlib/qsort.c.
[14] libFuzzer - a library for coverage-guided fuzz testing - LLVM 3.9 documentation. http://llvm.org/docs/LibFuzzer.html.
[15] NetBSD: qsort.c, v 1.13 2003/08/07. http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libc/stdlib/qsort.c.
[16] NVD - CVE-2012-2098. https://nvd.nist.gov/vuln/detail/CVE-2012-2098.
[17] NVD - CVE-2013-4287. https://nvd.nist.gov/vuln/detail/CVE-2013-4287.
[18] PCRE - Perl Compatible Regular Expressions. http://www.pcre.org/.
[19] PHP Vulnerability May Halt Millions of Servers - PHP Classes. https://www.phpclasses.org/blog/post/171-PHP-Vulnerability-May-Halt-Millions-of-Servers.html.
[20] Regular expression Denial of Service - ReDoS - OWASP. https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS.
[21] SanitizerCoverage - Clang 4.0 documentation. http://clang.llvm.org/docs/SanitizerCoverage.html.
[22] Stack Exchange Network Status - Outage Postmortem - July 20, 2016. http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016.
[23] syzkaller. https://github.com/google/syzkaller.
[24] Why does Stack Overflow use a backtracking regex implementation? - Meta Stack Overflow. https://meta.stackoverflow.com/questions/328376/why-does-stack-overflow-use-a-backtracking-regex-implementation.
[25] Berglund, M., Drewes, F., and van der Merwe, B. Analyzing catastrophic backtracking behavior in practical regular expression matching. arXiv preprint arXiv:1405.5599 (2014).
[26] Bernat, G., Colin, A., and Petters, S. M. WCET analysis of probabilistic hard real-time systems. In Real-Time Systems Symposium, 2002. RTSS 2002. 23rd IEEE (2002), IEEE, pp. 279–288.
[27] Betts, A., Merriam, N., and Bernat, G. Hybrid measurement-based WCET analysis at the source level using object-level traces. In OASIcs-OpenAccess Series in Informatics (2010), vol. 15, Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik.
[28] Cai, X., Gui, Y., and Johnson, R. Exploiting Unix file-system races via algorithmic complexity attacks. In Security and Privacy, 2009 30th IEEE Symposium on (2009), IEEE, pp. 27–41.
[29] Cha, S. K., Woo, M., and Brumley, D. Program-Adaptive Mutational Fuzzing. In 2015 IEEE Symposium on Security and Privacy (S&P) (May 2015), pp. 725–741.
[30] Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. Introduction to Algorithms, vol. 6. MIT Press, Cambridge, 2001.
[31] Crosby, S. A., and Wallach, D. S. Denial of service via algorithmic complexity attacks. In Proceedings of the 12th Conference on USENIX Security Symposium - Volume 12 (Berkeley, CA, USA, 2003), SSYM'03, USENIX Association, pp. 3–3.
[32] Godefroid, P., Levin, M. Y., Molnar, D. A., et al. Automated Whitebox Fuzz Testing. In Proceedings of the 2008 Network and Distributed Systems Symposium (NDSS) (2008), vol. 8, pp. 151–166.
[33] Haller, I., Slowinska, A., Neugschwandtner, M., and Bos, H. Dowsing for Overflows: A Guided Fuzzer to Find Buffer Boundary Violations. In 22nd USENIX Security Symposium (USENIX Security '13) (Washington, D.C., 2013), USENIX, pp. 49–64.
[34] Holland, B., Santhanam, G. R., Awadhutkar, P., and Kothari, S. Statically-Informed Dynamic Analysis Tools to Detect Algorithmic Complexity Vulnerabilities. In Source Code Analysis and Manipulation (SCAM), 2016 IEEE 16th International Working Conference on (2016), IEEE, pp. 79–84.
[35] Householder, A. D., and Foote, J. M. Probability-based parameter selection for black-box fuzz testing. In CMU/SEI Technical Report - CMU/SEI-2012-TN-019 (2012).
[36] Jin, G., Song, L., Shi, X., Scherpelz, J., and Lu, S. Understanding and detecting real-world performance bugs. ACM SIGPLAN Notices 47, 6 (2012), 77–88.
[37] Khan, U., and Bate, I. WCET Analysis of Modern Processors Using Multi-Criteria Optimisation. In 2009 1st International Symposium on Search Based Software Engineering (2009).
[38] Kirrage, J., Rathnayake, A., and Thielecke, H. Static analysis for regular expression denial-of-service attacks. In International Conference on Network and System Security (2013), Springer, pp. 135–148.
[39] Luk, C.-K., Cohn, R., Muth, R., Patil, H., Klauser, A., Lowney, G., Wallace, S., Reddi, V. J., and Hazelwood, K. Pin: building customized program analysis tools with dynamic instrumentation. In ACM SIGPLAN Notices (2005), vol. 40, ACM, pp. 190–200.
[40] McIlroy, M. D. A killer adversary for quicksort. Softw., Pract. Exper. 29, 4 (1999), 341–344.
[41] Mudduluru, R., and Ramanathan, M. K. Efficient flow profiling for detecting performance bugs. In Proceedings of the 25th International Symposium on Software Testing and Analysis (2016), ACM, pp. 413–424.
[42] Namjoshi, K., and Narlikar, G. Robust and fast pattern matching for intrusion detection. In INFOCOM, 2010 Proceedings IEEE (2010), IEEE, pp. 1–9.
[43] Petsios, T., Kemerlis, V. P., Polychronakis, M., and Keromytis, A. D. DynaGuard: Armoring canary-based protections against brute-force attacks. In Proceedings of the 31st Annual Computer Security Applications Conference (2015), ACM, pp. 351–360.
[44] Petsios, T., Tang, A., Stolfo, S., Keromytis, A. D., and Jana, S. NEZHA: Efficient Domain-Independent Differential Testing. In Proceedings of the 38th IEEE Symposium on Security & Privacy (San Jose, CA) (2017).
[45] Petters, S. M. Bounding the execution time of real-time tasks on modern processors. In Real-Time Computing Systems and Applications, 2000. Proceedings. Seventh International Conference on (2000), IEEE, pp. 498–502.
[46] Petters, S. M., and Farber, G. Making worst case execution time analysis for hard real-time tasks on state of the art processors feasible. In Real-Time Computing Systems and Applications, 1999. RTCSA'99. Sixth International Conference on (1999), IEEE, pp. 442–449.
[47] Rawat, S., Jain, V., Kumar, A., Cojocar, L., Giuffrida, C., and Bos, H. VUzzer: Application-aware Evolutionary Fuzzing. In Proceedings of the Network and Distributed System Security Symposium (NDSS) (2017).
[48] Shen, D., Luo, Q., Poshyvanyk, D., and Grechanik, M. Automating Performance Bottleneck Detection Using Search-based Application Profiling. In Proceedings of the 2015 International Symposium on Software Testing and Analysis (2015), ISSTA 2015, ACM, pp. 270–281.
[49] Shenoy, G. S., Tubella, J., and González, A. Improving the resilience of an IDS against performance throttling attacks. In International Conference on Security and Privacy in Communication Systems (2012), Springer, pp. 167–184.
[50] Shenoy, G. S., Tubella, J., and González, A. Hardware/Software Mechanisms for Protecting an IDS Against Algorithmic Complexity Attacks. In Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE 26th International (2012), IEEE, pp. 1190–1196.


[51] Smith, R., Estan, C., and Jha, S. Backtracking algorithmic complexity attacks against a NIDS. In Computer Security Applications Conference, 2006. ACSAC'06. 22nd Annual (2006), IEEE, pp. 89–98.
[52] Song, L., and Lu, S. Performance Diagnosis for Inefficient Loops. Under Submission.
[53] Stephens, N., Grosen, J., Salls, C., Dutcher, A., Wang, R., Corbetta, J., Shoshitaishvili, Y., Kruegel, C., and Vigna, G. Driller: Augmenting Fuzzing Through Selective Symbolic Execution. In Proceedings of the Network and Distributed System Security Symposium (NDSS) (2016).
[54] Sun, X., Cheng, L., and Zhang, Y. A Covert Timing Channel via Algorithmic Complexity Attacks: Design and Analysis. In Communications (ICC), 2011 IEEE International Conference on (2011), IEEE, pp. 1–5.
[55] Tracey, N., Clark, J., McDermid, J., and Mander, K. A search-based automated test-data generation framework for safety-critical systems. In Systems Engineering for Business Process Change: New Directions. Springer, 2002, pp. 174–213.
[56] Wegener, J., and Grochtmann, M. Verifying timing constraints of real-time systems by means of evolutionary testing. Real-Time Systems 15, 3 (1998), 275–298.
[57] Wüstholz, V., Olivo, O., Heule, M. J., and Dillig, I. Static detection of DoS vulnerabilities in programs that use regular expressions. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems (2017), Springer, pp. 3–20.
[58] Zalewski, M. American Fuzzy Lop. http://lcamtuf.coredump.cx/afl/.

A WAF REGEXES

The slowdowns presented in Figure 7 correspond to inputs matched against the following regular expressions:

Regex 1:
(?i:(j|(&#x?0*((74) |(4A)|(106) |(6A));?))
([\t]|(&((#x?0*(9|(13) |(10)|A|D);?)|
(tab;)|( newline ;))))*(a|(&#x?0*((65)|
(41) |(97) |(61));?))([\t]|(&((#x?0*(9|
(13) |(10)|A|D);?)|(tab;)|( newline ;))
))*(v|(&#x?0*((86) |(56) |(118) |(76));?)
)([\t]|(&((#x?0*(9|(13) |(10)|A|D);?)|
(tab;)|( newline ;))))*(a|(&#x?0*((65)|
(41) |(97) |(61));?))([\t]|(&((#x?0*(9|
(13) |(10)|A|D);?)|(tab;)|( newline ;))))*
(s|(&#x?0*((83) |(53) |(115) |(73));?))(
[\t]|(&((#x?0*(9|(13) |(10)|A|D);?)|
(tab;)|( newline ;))))*(c|(&#x?0*((67)|
(43) |(99) |(63));?))([\t]|(&((#x?0*(9|
(13) |(10)|A|D);?)|(tab;)|( newline ;))))*
(r|(&#x?0*((82) |(52) |(114) |(72));?))
([\t]|(&((#x?0*(9|(13) |(10)|A|D);?)|
(tab;)|( newline ;))))*(i|(&#x?0*((73)|
(49) |(105) |(69));?))([\t]|(&((#x?0*(9|
(13) |(10)|A|D);?)|(tab;)|( newline ;))))*
(p|(&#x?0*((80) |(50) |(112) |(70));?))
([\t]|(&((#x?0*(9|(13) |(10)|A|D);?)|
(tab;)|( newline ;))))*(t|(&#x?0*((84)|
(54) |(116) |(74));?))([\t]|(&((#x?0*(9|
(13) |(10)|A|D);?)|(tab;)|( newline ;))))
*(:|(&((#x?0*((58) |(3A));?)|( colon;)
))).)

Regex 2:
<(a|abbr|acronym|address|applet|area|
audioscope|b|base|basefront|bdo|
bgsound|big|blackface|blink|
blockquote|body|bq|br|button|caption|
center|cite|code|col|colgroup|
comment|dd|del|dfn|dir|div|dl|
dt|em|embed|fieldset|fn|font|
form|frame|frameset|h1|head|hr|
html|i|iframe|ilayer|img|input|ins|
isindex|kdb|keygen|label|layer|
legend|li|limittext|link|listing|
map|marquee|menu|meta|multicol|
nobr|noembed|noframes|noscript|
nosmartquotes|object|ol|optgroup|
option|p|param|plaintext|pre|q|
rt|ruby|s|samp|script|select|
server|shadow|sidebar|small|
spacer|span|strike|strong|style|
sub|sup|table|tbody|td|textarea|
tfoot|th|thead|title|tr|tt|u|ul|
var|wbr|xml|xmp)\\W

Regex 3:
(?i: <.*[:] vmlframe .*?[ /+\t]*?src[
/+\t]*=)
