symbolic execution - iowa state...
TRANSCRIPT
Agenda
I What is symbolic execution?
I Applications
I History
I Interal Design: The three challengesI Path explosionI Modeling statements and environmentsI Constraint solving
I Implementation and symbolic execution tools
Intuitive Understanding of Symbolic Execution
I ’Execute’ programs with symbols: we track symbolic state ratherthan concrete input
I ’Execute’ many program paths simultaneously: when execution pathdiverges, fork and add constraints on symbolic values
I When ’execute’ one path, we actually simulate many test runs, sincewe are considering all the inputs that can exercise the same path
I Other variantsI Symbolic Analysis: Model library and system calls rather than
’executing’ it [2003:ICSE:Le]I Concolic Testing: Mixture of symbolic and concrete inputs
[2005:FSE:Sen]
Applications of Symbolic Execution
General goal: identifying semantics of programs
Basic applications:
I Detecting infeasible paths
I Generating test inputs
I Finding bugs and vulnerabilities
I Proving two code segments are equivalent (Code Hunt)
Advanced applications:
I Generating program invariants
I Debugging
I Repair programs
Resurgence of Symbolic Execution
The block issues in the past:
I Not scalable: program state has many bits, there are many programpaths
I Not able to go through loops and library calls
I Constraint solver is slow and not capable to handle advancedconstraints
The two key projects that enable the advance:
I DART Godefroid and Sen, PLDI 2005 (introduce dynamicinformation to symbolic execution)
I EXE Cadar, Ganesh, Pawlowski, Dill, and Engler, CCS 2006 (STP:a powerful constraint solver that handles array)
Moving forward:
I More powerful computers and clusters
I Techniques of mixture concrete and symbolic executions
I Powerful constraint solvers
Today: Two Important Tools
KLEE [2008:OSDI:Cadar]
I Open source symbolic executor
I Runs on top of LLVM
I Has found lots of problems in open-source software
SAGE [PLDI:Godefroid:2008]
I Microsoft internal tool
I Symbolic execution to find bugs in file parsers - E.g., JPEG, DOCX,PPT, etc
I Cluster of n machines continually running SAGE
Other Symbolic Executors
I Cloud9: parallel symbolic execution, also supports threads
I Pex, Code Hunt: Microsoft tools, symbolic execution for .NET
I Cute: concolic testing
I jCUTE: symbolic execution for Java
I Java PathFinder: NASA tools, a model checker that also supportssymbolic execution
I SymDroid: symbolic execution on Dalvik Bytecode
I Kleenet: testing interaction protocols for sensor network
Three Challenges
I Path explosion
I Modeling program statements and environment
I Constraint solving
Search Strategies: Naive Approach
DFS (depth first search), BFS (breadth first search)
The two approaches purely are based on the structure of the code
I You cannot enumerate all the paths
I DFS: search can stuck at somewhere in a loop
I BFS: very slow to determine properties for a path if there are manybranches
Search Strategies: Naive Approach
DFS (depth first search), BFS (breadth first search)
The two approaches purely are based on the structure of the code
I You cannot enumerate all the paths
I DFS: search can stuck at somewhere in a loop
I BFS: very slow to determine properties for a path if there are manybranches
Search Strategies: Naive Approach
DFS (depth first search), BFS (breadth first search)
The two approaches purely are based on the structure of the code
I You cannot enumerate all the paths
I DFS: search can stuck at somewhere in a loop
I BFS: very slow to determine properties for a path if there are manybranches
Search Strategies: Naive Approach
DFS (depth first search), BFS (breadth first search)
The two approaches purely are based on the structure of the code
I You cannot enumerate all the paths
I DFS: search can stuck at somewhere in a loop
I BFS: very slow to determine properties for a path if there are manybranches
Search Strategies: Random Search
How to perform a random search?
I Idea 1: pick next path to explore uniformly at random
I Idea 2: randomly restart search if haven’t hit anything interesting ina while
I Idea 3: when have equal priority paths to explore, choose next oneat random
I ...
Drawback: reproducibility, probably good to use psuedo-randomnessbased on seed, and then record which seed is picked
Search Strategies: Coverage Guided Search
Goal: Try to visit statements we haven’t seen before
Approach:
I Select paths likely to hit the new statements
I Favor paths on recently covering new statements
I Score of statement = # times its been seen and how often; Picknext statement to explore that has lowest score
Pros and cons:
I Good: Errors are often in hard-to-reach parts of the program, thisstrategy tries to reach everywhere.
I Bad: Maybe never be able to get to a statement
Search Strategies: Generational Search
I Hybrid of BFS and coverage-guided search
I Generation 0: pick one path at random, run to completion
I Generation 1: take paths from gen 0, negate one branch conditionon a path to yield a new path prefix, find a solution for that pathprefix, and then take the resulting path
I ...
I Generation n: similar, but branching off gen n-1 (also uses acoverage heuristic to pick priority)
Search Strategies: Combined Search
I Run multiple searches at the same time and alternate between them
I Depends on conditions needed to exhibit bug; so will be as good asbest solution, with a constant factor for wasting time with otheralgorithms
I Could potentially use different algorithms to reach different parts ofthe program
Challenge 2: Complex Code and EnvironmentDependencies
I System calls: open(file)
I Library calls: sin(x), glibc
I Pointers and heap: linklist, tree
I Loops and recursive calls: how many times it should iterate andunfold?
I ...
Solutions
I Simulate system calls
I Build simple versions of library calls
I Assign random values after library calls
I Run library an system calls with a concrete value
I Summarize the loops
I ...
An Example
Program was initiated with a symbolic file system with up to N files.Open all N files + one open() failure.
Solutions: Concretization[2005:PLDI:Godefroid],[2005:FSE:Sen]
I Concolic (concrete/symbolic) testing: run on concrete randominputs. In parallel, execute symbolically and solve constraints.Generate inputs to other paths than the concrete one along the way.
I Replace symbolic variables with concrete values that satisfy the pathcondition
I So, could actually do system calls
I And can handle cases when conditions too complex for SMT solver
Challenge 3: Constraint Solving - SAT
SAT: find an assignment to a set of Boolean variables that makes theBoolean formula true
Complexity: NP-Complete
Constraint Solving - SMT [2011:ACM:DeMoura]
SMT (Satisfiability Modulo Theories) = SAT++
I An SMT formula is a Boolean combination of formulas overfirst-order theories
I Example of SMT theories include bit-vectors, arrays, integer and realarithmetic, strings, ...
I The satisfiability problem for these theories is typically hard ingeneral (NP-complete, PSPACE-complete, ...)
I Program semantics are easily expressed over these theories
I Many software engineering problems can be easily reduced to theSAT problem over first-order theories
Constraint Solving - SMT
The State of the Art: Handle linear integer constraints
Challenges:
I Constraints that contain non-linear operands, e.g., sin(), cos()
I Float-point constraints: no theory support yet, convert to bit-vectorcomputation
I String constraints: a = b.replace(’x’, ’y’)
I Quantifies: ∃, ∀I Disjunction
Tool Design KLEE - Path Explosion
I Random, coverage-optimize search
I Compute state weight using:I Minimum distance to an uncovered instructionI Call stack of the stateI Whether the state recently covered new code
I Timeout: one hour per utility when experimenting with coreutils
Tool Design KLEE - Tracking Symbolic States
Trees of symbolic expressions:
I Instruction pointer
I Path condition
I Registers, heap and stack objects
I Expressions are of C language: arithmetic, shift, dereference,assignment
I Checks inserted at dangerous operations: division, dereferencing
Modeling environment:
I 2500 lines of modeling code to customize system calls (e.g. open,read, write, stat, lseek, ftruncate, ioctl)
I How to generate tests after using symbolic env: supply andescription of symbolic env for each test path; a special drivercreates real OS objects from the description
Tool Design KLEE - Constraint Solving
I STP: a decision procedure for Bit-Vectors and Arrays
I Decision procedures are programs which determine the satisfiabilityof logical formulas that can express constraints relevant to softwareand hardware
I STP uses new efficient SAT solvers
I Treat everything as bit vectors: arithmetic, bitwise operations,relational operations.