(part 2) symbolic and concolic testing program …...dart: directed automated random testing,...

36
1 Program Testing and Analysis: Symbolic and Concolic Testing (Part 2) Prof. Dr. Michael Pradel Software Lab, TU Darmstadt

Upload: others

Post on 16-May-2020

23 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

1

Program Testing and Analysis:

Symbolic and Concolic Testing(Part 2)

Prof. Dr. Michael Pradel

Software Lab, TU Darmstadt

Page 2: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

2

Warm-up Quiz

var sum = 0;var array = [11, 22, 33];for (x in array) {sum += x;

}console.log(sum);

What does the following code print?

Something else660012112233

Page 3: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

2

Warm-up Quiz

var sum = 0;var array = [11, 22, 33];for (x in array) {sum += x;

}console.log(sum);

What does the following code print?

Something else660012112233Some JS engines

Page 4: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

2

Warm-up Quiz

var sum = 0;var array = [11, 22, 33];for (x in array) {sum += x;

}console.log(sum);

What does the following code print?

Something else660012112233

Arrays are objects

Some JS engines

For-in iterates overobject property names(not property values)

Page 5: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

2

Warm-up Quiz

var sum = 0;var array = [11, 22, 33];for (x in array) {sum += x;

}console.log(sum);

What does the following code print?

Something else660012112233Some JS engines

For arrays, usetraditional for loop:for (var i=0;

i < array.length;i++) ...

Page 6: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

3

Outline

1. Classical Symbolic Execution

2. Challenges of Symbolic Execution

3. Concolic Testing

4. Large-Scale Application in Practice

Mostly based on these papers:

� DART: directed automated random testing, Godefroid et al.,PLDI’05

� KLEE: Unassisted and Automatic Generation ofHigh-Coverage Tests for Complex Systems Programs, Cadaret al., OSDI’08

� Automated Whitebox Fuzz Testing, Godefroid et al., NDSS’08

Page 7: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

4

Execution Tree

All possible execution paths

� Binary tree

� Nodes: Conditional statements

� Edges: Execution of sequenceon non-conditional statements

� Each path in the tree representsan equivalence class of inputs

t f

t f t f

t f

t f

Page 8: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

5

Path Conditions

Quantifier-free formula over the symbolicinputs that encodes all branch decisionstaken so far

function f(x, y) {var z = x + y;if (z > 0) {...

}}

Page 9: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

5

Path Conditions

Quantifier-free formula over the symbolicinputs that encodes all branch decisionstaken so far

function f(x, y) {var z = x + y;if (z > 0) {...

}}

Path condition:x0 + y0 > 0

Page 10: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

6

Applications of Symbolic Execution

� General goal: Reason about behavior of program

� Basic applications� Detect infeasible paths

� Generate test inputs

� Find bugs and vulnerabilies

� Advanced applications� Generating program invariants

� Prove that two pieces of code are equivalent

� Debugging

� Automated program repair

Page 11: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

7

Outline

1. Classical Symbolic Execution

2. Challenges of Symbolic Execution

3. Concolic Testing

4. Large-Scale Application in Practice

Mostly based on these papers:

� DART: directed automated random testing, Godefroid et al.,PLDI’05

� KLEE: Unassisted and Automatic Generation ofHigh-Coverage Tests for Complex Systems Programs, Cadaret al., OSDI’08

� Automated Whitebox Fuzz Testing, Godefroid et al., NDSS’08

Page 12: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

8

Problems of Symbolic Execution

� Loops and recursion: Infinite execution trees

� Path explosion: Number of paths is exponential inthe number of conditionals

� Environment modeling: Dealing withnative/system/library calls

� Solver limitations: Dealing with complex pathconditions

� Heap modeling: Symbolic representation of datastructures and pointers

Page 13: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

8

Problems of Symbolic Execution

� Loops and recursion: Infinite execution trees

� Path explosion: Number of paths is exponential inthe number of conditionals

� Environment modeling: Dealing withnative/system/library calls

� Solver limitations: Dealing with complex pathconditions

� Heap modeling: Symbolic representation of datastructures and pointers

Page 14: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

13

Page 15: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

10

Dealing with Large Execution Trees

Heuristically select whichbranch to explore next

� Select at random

� Select based on coverage

� Prioritize based on distanceto ”interesting” programlocations

� Interleave symbolic executionwith random testing

t f

t f t f

t f

t f

... ... ...

...

... ...

Page 16: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

11

Problems of Symbolic Execution

� Loops and recursion: Infinite execution trees

� Path explosion: Number of paths is exponential inthe number of conditionals

� Environment modeling: Dealing withnative/system/library calls

� Solver limitations: Dealing with complex pathconditions

� Heap modeling: Symbolic representation of datastructures and pointers

Page 17: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

11

Problems of Symbolic Execution

� Loops and recursion: Infinite execution trees

� Path explosion: Number of paths is exponential inthe number of conditionals

� Environment modeling: Dealing withnative/system/library calls

� Solver limitations: Dealing with complex pathconditions

� Heap modeling: Symbolic representation of datastructures and pointers

Page 18: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

12

Modeling the Environment

� Program behavior may depend on parts ofsystem not analyzed by symbolic execution

� E.g., native APIs, interaction with network, filesystem accesses

var fs = require("fs");var content = fs.readFileSync("/tmp/foo.txt");if (content === "bar") {...

}

Page 19: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

13

Modeling the Environment (2)

Solution implemented by KLEE� If all arguments are concrete, forward to OS

� Otherwise, provide models that can handlesymbolic files� Goal: Explore all possible legal interactions

with the environmentvar fs = {readFileSync: function(file) {// doesn’t read actual file system, but// models its effects for symbolic file names

}}

Page 20: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

14

Problems of Symbolic Execution

� Loops and recursion: Infinite execution trees

� Path explosion: Number of paths is exponential inthe number of conditionals

� Environment modeling: Dealing withnative/system/library calls

� Solver limitations: Dealing with complex pathconditions

� Heap modeling: Symbolic representation of datastructures and pointers

Page 21: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

14

Problems of Symbolic Execution

� Loops and recursion: Infinite execution trees

� Path explosion: Number of paths is exponential inthe number of conditionals

� Environment modeling: Dealing withnative/system/library calls

� Solver limitations: Dealing with complex pathconditions

� Heap modeling: Symbolic representation of datastructures and pointers

Approach to address these problems:Mix symbolic with concrete execution

Page 22: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

15

Outline

1. Classical Symbolic Execution

2. Challenges of Symbolic Execution

3. Concolic Testing

4. Large-Scale Application in Practice

Mostly based on these papers:

� DART: directed automated random testing, Godefroid et al.,PLDI’05

� KLEE: Unassisted and Automatic Generation ofHigh-Coverage Tests for Complex Systems Programs, Cadaret al., OSDI’08

� Automated Whitebox Fuzz Testing, Godefroid et al., NDSS’08

Page 23: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

16

Concolic Testing

Mix concrete and symbolic execution =”concolic”

� Perform concrete and symbolic executionside-by-side

� Gather path constraints while program executes

� After one execution, negate one decision, andre-execute with new input that triggers anotherpath

Page 24: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

14

Page 25: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

15

Page 26: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

16

Page 27: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

17

Page 28: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

19

Algorithm

Repeat until all paths are covered

� Execute program with concrete input i and collectsymbolic constraints at branch points: C

� Negate one constraint to force taking analternative branch b′: Constraints C ′

� Call constraint solver to find solution for C ′: Newconcrete input i′

� Execute with i′ to take branch b′

� Check at runtime that b′ is indeed takenOtherwise: ”divergent execution”

Page 29: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

18

Page 30: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

21

Quiz

After how many executions and how many queries tothe solver does concolic testing find the error?

Initial input: a=0, b=0

function concolicQuiz(a, b) {if (a === 5) {var x = b - 1;if (x > 0) {console.log("Error");

}}

}

Page 31: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

19

Page 32: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

22

Benefits of Concolic Approach

When symbolic reasoning is impossibleor impractical, fall back to concretevalues

� Native/system/API functions

� Operations not handled by solver (e.g., floatingpoint operations)

Page 33: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

23

Outline

1. Classical Symbolic Execution

2. Challenges of Symbolic Execution

3. Concolic Testing

4. Large-Scale Application in Practice

Mostly based on these papers:

� DART: directed automated random testing, Godefroid et al.,PLDI’05

� KLEE: Unassisted and Automatic Generation ofHigh-Coverage Tests for Complex Systems Programs, Cadaret al., OSDI’08

� Automated Whitebox Fuzz Testing, Godefroid et al., NDSS’08

Page 34: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

24

Large-Scale Concolic Testing

� SAGE: Concolic testing tool developed atMicrosoft Research

� Test robustness against unexpected inputs readfrom files, e.g.,� Audio files read by media player� Office documents read by MS Office

� Start with known input files and handle bytes readfrom files as symbolic input

� Use concolic execution to compute variants ofthese files

Page 35: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

25

Large-Scale Concolic Testing (2)

� Applied to hundreds of applications

� Over 400 machine years of computation from2007 to 2012

� Found hundreds of bugs, including many securityvulnerabilties� One third of all the bugs discovered by file fuzzing

during the development of Microsoft’s Windows 7

Details: Bounimova et al., ICSE 2013

Page 36: (Part 2) Symbolic and Concolic Testing Program …...DART: directed automated random testing, Godefroid et al., PLDI’05 KLEE: Unassisted and Automatic Generation of High-Coverage

26

Summary: Symbolic & Concolic Testing

Solver-supported, whitebox testing

� Reason symbolically about (parts of) inputs

� Create new inputs that cover not yet exploredpaths

� More systematic but also more expensive thanrandom and fuzz testing

� Open challenges� Effective exploration of huge search space� Other applications of constraint-based program

analysis, e.g., debugging and automated programrepair