Post on 19-Jan-2016
Static Analysis
◦ Towards Automatic Signature Generation of Vulnerability-based Signature
Dynamic Analysis
◦ Unleashing Mayhem on Binary Code
Automatic exploit detection: attack and defense
Static and Dynamic Analysis
Towards Automatic Signature Generation of Vulnerability-based Signature
Defense: Static Analysis
Background
Definition
◦ Vulnerability: a type of bug that an attacker can use to alter the intended operation of the software in a malicious way.
◦ Exploit: an actual input that triggers a software vulnerability, typically with malicious intent and devastating consequences.
Motivation
Zero-day attacks that exploit unknown vulnerabilities represent a serious threat
◦ No patch or signature available
◦ Symantec: 20 unknown vulnerabilities exploited, 07/2005 to 06/2007
Current practice: new vulnerability analysis and protection generation are mostly manual
Our goal: automate the process of protection generation for unknown vulnerabilities
How to protect a vulnerable application?
Software patch: patch the binary of the vulnerable application
Input filter: a network firewall or a module on the I/O path
Data patch: patch the data input instead of the binary
Signature: signature-based input filtering
[Figure: Data input → Input filter → Vulnerable application, or Dropped]
Our Goal
Automatic signature generation
Reason:
◦ Manual signature generation is slow and error-prone
◦ Fast generation is important: previously unknown or unpatched vulnerabilities can be exploited orders of magnitude faster than a human can respond
◦ More accurate
Challenges
There are usually several different polymorphic exploit variants that can trigger a software vulnerability
Exploit variants may differ syntactically but be semantically equivalent
To be effective, the signature should be constructed based on the property of the vulnerability, instead of an exploit
Limitations of previous approaches
Require manual steps
Employ heuristics which may fail in many settings
Techniques rely on specific properties of an exploit (e.g., return addresses)
Only work for specific vulnerabilities in specific circumstances
Our approach
At a high level, our main contribution is a new class of signature that is not specific to details such as whether an exploit successfully hijacks control of the program, but instead captures whether executing an input will (potentially) result in an unsafe execution state.
Overview
Vulnerability signature
◦ whether executing an input potentially results in an unsafe program state
T(P, x)
◦ the execution trace obtained by executing a program P on input x
Vulnerability condition
◦ representation (how to express a vulnerability as a signature)
◦ coverage (measured by false positive rate)
Vulnerability Signature
Vulnerability signature
◦ representation for the set of inputs that define a specified vulnerability condition
Trade-offs
◦ representation: matching accuracy vs. efficiency
◦ signature creation: creation time vs. coverage
Tuple {P,T,x,c}
◦ binary program (P), instruction trace (T), exploit string (x), vulnerability condition (c)
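Vulnerability Signature Notation
A vulnerability is a pair $(P, c) = (\langle i_1, \ldots, i_k \rangle, c)$
$T(P, x)$ is the execution trace of running $P$ with input $x$
$T \models c$ means $T$ satisfies vulnerability condition $c$
$L_{P,c}$ consists of the set of all inputs $x$ to a program $P$ such that $T(P, x) \models c$
Formally: $L_{P,c} = \{\, x \in \Sigma^* \mid T(P, x) \models c \,\}$
An exploit for a vulnerability $(P, c)$ is an input $x \in L_{P,c}$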
Example
P given in box
x = g/AAAA
T = {1,2,3,4,6,7,8,9,8,10,11,10,11,10,11,10,11,10,11}
c = heap overflow (on 5th iteration of line 11)
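Vulnerability Signature Definition
A vulnerability signature is a matching function MATCH which, for an input x, returns either EXPLOIT or BENIGN for a program P without running the program
A perfect vulnerability signature satisfies
Completeness: $\forall x : x \in L_{P,c} \Rightarrow \mathrm{MATCH}(x) = \mathrm{EXPLOIT}$ (no false negatives)
Soundness: $\forall x : x \notin L_{P,c} \Rightarrow \mathrm{MATCH}(x) = \mathrm{BENIGN}$ (no false positives)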
Vulnerability Condition
$c : \Gamma \times D \times M \times K \times I \to \{\mathrm{BENIGN}, \mathrm{EXPLOIT}\}$
$\Gamma$ is a memory
$D$ is the set of variables defined
$M$ is the program's map from memory to values
$K$ is the continuation stack
$I$ is the next instruction to execute
Signature Representation Classes
Turing machine signatures
◦ precise (no false positives or negatives)
◦ may not terminate (e.g., in the presence of loops)
Symbolic constraint signatures
◦ approximate looping, aliasing
◦ guaranteed to terminate
Regular expression signatures
◦ approximate elementary constructs (counting)
◦ very efficient
Turing Machine Sig.
Can provide a precise, even exact, characterization of the vulnerability condition in a particular program
A TM that exactly emulates the program has a zero error rate
Symbolic Constraint Sig.
Says that for a 10-char input, the first char is 'g' or 'G', up to four of the next chars may be spaces, and at least 5 chars are non-spaces
Regular Expression Sig.
Says 'g' or 'G' followed by 0 or more spaces and at least 5 non-spaces
E.g.: [gG][ ]*[^ ]{5,}
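A minimal sketch of how a regular expression signature could be applied as a MATCH function, assuming POSIX `regex.h` is available. The function name `match_regex_sig`, the `verdict` enum, and the leading `^` anchor are illustrative assumptions, not part of the original work. The first character class is written `[gG]` so that a literal `|` is not accepted as a first character.

```c
#include <regex.h>
#include <stddef.h>

/* Verdicts mirror the EXPLOIT/BENIGN outputs of MATCH in the slides. */
enum verdict { BENIGN = 0, EXPLOIT = 1 };

/* Hypothetical helper: apply the example signature to one input string.
 * Assumption: the signature is anchored at the start of the input. */
enum verdict match_regex_sig(const char *input)
{
    static const char *pattern = "^[gG][ ]*[^ ]{5,}";
    regex_t re;
    enum verdict v = BENIGN;

    /* REG_EXTENDED enables the {5,} interval; REG_NOSUB skips capture data. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return BENIGN;   /* compile failure: this sketch simply fails open */

    if (regexec(&re, input, 0, NULL, 0) == 0)
        v = EXPLOIT;

    regfree(&re);
    return v;
}
```

With this sketch the sample exploit `g/AAAA` is flagged, while `g/AAA` (only four non-spaces) is not. A production filter would compile the pattern once rather than on every call.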
Accuracy vs. Efficiency
TM: inlining the vulnerability condition takes polynomial time
Symbolic constraint: polynomial-time transformations on the TM
RegEx: solve constraints (exponential time; PSPACE-complete) or data-flow analysis on the TM (polynomial time)
Algorithm Overview
Input:
◦ Vulnerable program P
◦ Vulnerability condition c
◦ Sample exploit x
◦ Instruction trace T
Output:
◦ TM sig
◦ Symbolic constraint sig
◦ RegEx sig
MEP and PEP
MEP (monomorphic execution path) is a straight-line program, e.g., the path that the exploit took to reach the vulnerability
PEP (polymorphic execution path) includes different paths to the vulnerability
A complete PEP coverage signature accepts all inputs in $L_{P,c}$
Complete coverage through a chop of the program includes all paths from the input read ($v_{init}$) to the vulnerability point ($v_{final}$)
TM -> Symbolic Constraint
Statically estimate effects of memory updates and loops
◦ Memory updates: SSA analysis
◦ Loops: static unrolling
Evaluation
9000 lines of C++ code
◦ CBMC model checker to build/solve symbolic constraints, generate RegEx's
◦ disassembler based on Kruegel; IR new
ATPhttpd
◦ various vulnerabilities; sprintf-style string too long
◦ 10 distinct subpaths to RegEx in 0.1216 sec
BIND
◦ stack overflow vulnerability; TSIG vulnerability
◦ 10 distinct graphs in symbolic constraint
◦ 30 ms for chopping
◦ 88% of functions were reachable between entry and vulnerability
Conclusion
Propose a framework to automatically generate vulnerability signatures
◦ Turing Machine
◦ Symbolic Constraints
◦ Regular Expressions
Preliminary work on the feasibility of a decades-old grand challenge problem
Attack: Dynamic Analysis
Unleashing Mayhem on Binary Code
Automatic Exploit Generation Challenge
Automatically Find Bugs & Generate Exploits
Explore Program
Ghostscript v8.62 Bug
int outprintf( const char *fmt, … )
{
    int count;
    char buf[1024];
    va_list args;
    va_start( args, fmt );
    count = vsprintf( buf, fmt, args );
    outwrite( buf, count ); // print out
}
int main( int argc, char* argv[] )
{
    const char *arg;
    while( (arg = *argv++) != 0 ) {
        switch ( arg[0] ) {
        case '-': {
            switch ( arg[1] ) {
            case 0: …
            default: outprintf( "unknown switch %s\n", arg[1] );
            }
        }
        default: …
        }
    …
Reading user input from command line
Buffer overflow
CVE-2009-4270
Multiple Paths
Many Branches!
Automatic Exploit Generation Challenge
Automatically Find Bugs & Generate Exploits
Transfer Control to Attacker Code
(exec “/bin/sh”)
Generating Exploits
[Figure: stack layout of outprintf (fmt, ret addr, count, args, buf holding user input) next to the frame of main; esp marks the stack pointer]
Read Return Address from Stack Pointer (esp)
Control Hijack Possible
Source
int main( int argc, char* argv[] ){ const char *arg; while( (arg = *argv++) != 0 ) {…
Executables (Binary)
Unleashing Mayhem
Automatically Find Bugs & Generate Exploits for Executables
How Mayhem Works: Symbolic Execution
[Figure: execution tree with f/t edges at each branch: x = input(); if x > 42; if x*x = 0xffffffff; if x < 100; vuln()]
Path annotations:
x can be anything
x > 42
(x > 42) ∧ (x*x == 0xffffffff)
Path Predicate = Π
[Figure: same execution tree]
Π = (x > 42) ∧ (x*x == 0xffffffff)
How Mayhem Works: Symbolic Execution
[Figure: same execution tree; the path reaching vuln() is marked "Violates Safety Policy"]
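A minimal sketch of how a path predicate accumulates along one execution path, using the toy program from the symbolic execution slides. It runs the program on a concrete input and records the condition taken at each branch, one path at a time. The nesting of the three branches is an assumption read off the figure, `run_and_record` and `add_clause` are hypothetical names, and `&&` stands in for ∧. A real tool would hand the recorded predicate to a solver instead of printing it.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Append one branch condition to the predicate string, joined by " && ". */
static void add_clause(char *pred, size_t cap, const char *clause)
{
    size_t used = strlen(pred);
    snprintf(pred + used, cap - used, "%s(%s)", used ? " && " : "", clause);
}

/* Run the toy program on a concrete x and write the path predicate of the
 * path taken into pred (capacity cap >= 1). Returns 1 if vuln() is reached.
 * Assumed nesting: x > 42, then x*x == 0xffffffff, then x < 100. */
int run_and_record(uint32_t x, char *pred, size_t cap)
{
    pred[0] = '\0';
    if (x > 42) {
        add_clause(pred, cap, "x > 42");
        if (x * x == 0xffffffffu) {
            add_clause(pred, cap, "x*x == 0xffffffff");
            return 1;                      /* vuln() would run here */
        }
        add_clause(pred, cap, "x*x != 0xffffffff");
        if (x < 100)
            add_clause(pred, cap, "x < 100");
        else
            add_clause(pred, cap, "x >= 100");
    } else {
        add_clause(pred, cap, "x <= 42");
    }
    return 0;
}
```

For example, calling `run_and_record(200, buf, sizeof buf)` leaves `(x > 42) && (x*x != 0xffffffff) && (x >= 100)` in `buf`.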
Safety Policy in Mayhem
[Figure: outprintf code and stack layout, as on the Generating Exploits slide]
Instruction Pointer (EIP) level:
Return to user-controlled address
EIP not affected by user input
Exploit Generation
Exploit is an input that satisfies the predicate:
Π ∧ input[0-31] = attack code ∧ input[1038-1042] = attack code address
Exploit Predicate
◦ input[0-31] = attack code: Can position attack code?
◦ input[1038-1042] = attack code address: Can transfer control to attack code?
Challenges
Symbolic Execution
◦ Challenge: Efficient Resource Management
◦ Hybrid Execution
Exploit Generation
◦ Challenge: Symbolic Index
◦ Index-based Memory Model
Challenge 1: Resource Management in Symbolic Execution
Current Resource Management in Symbolic Execution
Online Symbolic Execution
Offline Symbolic Execution (a.k.a. Concolic)
Offline Execution
One path at a time
Method 1: Re-run from scratch ⟹ Inefficient
Re-executed every time
Online Execution
Fork at branches
Hit Resource Cap
Method 2: Stop forking ⟹ Miss paths
Method 3: Snapshot process ⟹ Huge disk image
Mayhem: Hybrid Execution
Our Method: Don't snapshot state; use path predicate to recreate state
Fork at branches
Hit Resource Cap
"Checkpoint"
Ghostscript 8.62
9.4M 500K
Hybrid Execution
✓ Manage #executors in memory within resource cap
✓ Minimize duplicated work
✓ Lightweight checkpoints
Challenge 2: Symbolic Indices
Symbolic Indices
x = user_input();
y = mem[x];
assert (y == 42);
x can be anything
Which memory cell contains 42?
$2^{32}$ cells to check
Memory: 0 to $2^{32} - 1$
One Cause: Table Lookups
Table lookups in standard APIs:
◦ Parsing: sscanf, vfprintf, etc.
◦ Character test: isspace, isalpha, etc.
◦ Conversion: toupper, tolower, mbtowc, etc.
◦ …
Method 1: Concretization
Π ∧ mem[x] = 42 ∧ Π'
Π ∧ x = 17 ∧ mem[x] = 42 ∧ Π'
✓ Solvable
✗ Exploits
Over-constrained: misses 40% of exploits in our experiments
Method 2: Fully Symbolic
Π ∧ mem[x] = 42 ∧ Π'
Π ∧ mem[x] = 42 ∧ mem[0] = $v_0$ ∧ … ∧ mem[$2^{32}-1$] = $v_{2^{32}-1}$ ∧ Π'
✗ Solvable
✓ Exploits
Our Observation
Path predicate (Π) constrains range of symbolic memory accesses
[Figure: execution tree with f/t edges: x can be anything; branch x <= 42; branch x >= 50; y = mem[x]]
Π: 42 < x < 50
Use symbolic execution state to:
Step 1: Bound memory addresses referenced
Step 2: Make search tree for memory address values
Step 1: Find Bounds
mem[ x & 0xff ]
Lowerbound = 0, Upperbound = 0xff
1. Value Set Analysis [1] provides initial bounds
   • Over-approximation
2. Query solver to refine bounds
[1] Balakrishnan et al., Analyzing memory accesses in x86 executables, ICCC 2004
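A minimal sketch of the "query solver to refine bounds" step, under stated assumptions: the refinement is done by binary search (the slides do not say which strategy Mayhem uses), and the solver is replaced by a brute-force check over a small input domain. The path predicate 42 < x < 50 and the access `mem[x & 0xff]` come from the slides. All function names are hypothetical.

```c
#include <stdint.h>

/* Stand-in for the path predicate: 42 < x < 50. */
static int path_predicate(uint32_t x) { return x > 42 && x < 50; }

/* The symbolic index expression of the access mem[x & 0xff]. */
static uint32_t index_expr(uint32_t x) { return x & 0xff; }

/* Stand-in for one solver query: does some x satisfy the path predicate
 * with lo <= index <= hi? Brute force over an assumed 16-bit domain. */
static int query_index_in_range(uint32_t lo, uint32_t hi)
{
    for (uint32_t x = 0; x <= 0xffff; x++) {
        uint32_t i = index_expr(x);
        if (path_predicate(x) && i >= lo && i <= hi)
            return 1;
    }
    return 0;
}

/* Tightest lower bound within the over-approximation [lo, hi].
 * Precondition: at least one feasible index lies in [lo, hi]. */
uint32_t refine_lower(uint32_t lo, uint32_t hi)
{
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        if (query_index_in_range(lo, mid))
            hi = mid;          /* a feasible index exists in the lower half */
        else
            lo = mid + 1;
    }
    return lo;
}

/* Tightest upper bound within [lo, hi]; same precondition. */
uint32_t refine_upper(uint32_t lo, uint32_t hi)
{
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo + 1) / 2;
        if (query_index_in_range(mid, hi))
            lo = mid;          /* a feasible index exists in the upper half */
        else
            hi = mid - 1;
    }
    return lo;
}
```

Starting from the initial bounds [0, 0xff], this narrows the index range to [43, 49] with a logarithmic number of queries per bound.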
Step 2: Index Search Tree Construction
y = mem[x]
if x = 1 then y = 10
if x = 2 then y = 12
if x = 3 then y = 22
if x = 4 then y = 20
ite( x < 3, left, right )
ite( x < 2, left, right )
[Figure: plot of Memory Value (10, 12, 22, 20) against Index]
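A minimal sketch of the index search tree for the four cells on this slide, assuming the tree is a balanced binary search over the bounded index range. Each internal node plays the role of one `ite(x < pivot, left, right)` expression. The names `cells`, `ite_tree`, and `lookup` are illustrative.

```c
#include <stddef.h>

/* Index/value pairs from the slide: mem[1]=10, mem[2]=12, mem[3]=22, mem[4]=20. */
struct cell { unsigned index; int value; };

static const struct cell cells[] = { {1, 10}, {2, 12}, {3, 22}, {4, 20} };

/* Evaluate the tree over cells[lo..hi). Each recursion level corresponds to
 * one ite(x < pivot, left, right) node; a single remaining cell is a leaf.
 * Assumption: x already lies within the bounds found in Step 1. */
static int ite_tree(unsigned x, size_t lo, size_t hi)
{
    if (hi - lo == 1)
        return cells[lo].value;
    size_t mid = lo + (hi - lo) / 2;
    return (x < cells[mid].index)
        ? ite_tree(x, lo, mid)     /* left subtree  */
        : ite_tree(x, mid, hi);    /* right subtree */
}

/* y = mem[x] expressed through the search tree. */
int lookup(unsigned x)
{
    return ite_tree(x, 0, sizeof cells / sizeof cells[0]);
}
```

With four cells the root tests `x < 3` and its left child tests `x < 2`, matching the two `ite` nodes shown above. The tree depth grows logarithmically with the number of cells.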
Exploit Generation
[Figure: bar chart of the 29 programs on a log-scale axis (1 to 100000)]
Linux (22): a2ps, aeon, aspell, atphttpd, freeradius, ghostscript, glftpd, gnugol, htget, htpasswd, iwconfig, mbse-bbs, nCompress, orzHttpd, psUtils, rsync, sharutils, socat, squirrel mail, tipxd, xgalaga, xtokkaetama
Windows (7): coolplayer, destiny, dizzy, galan, gsplayer, muse, soritong
2 Unknown Bugs: FreeRadius, GnuGol
Limitations
We do not claim to find all exploitable bugs
Given an exploitable bug, we do not guarantee we will always find an exploit
Lots of room for improving symbolic execution, generating other types of exploits (e.g., info leaks), etc.
We do not consider defenses, which may defend against otherwise exploitable bugs
◦ Q [Schwartz et al., USENIX 2011]
But Every Report is Actionable
Related Work
APEG [Brumley et al., IEEE S&P 2008]
◦ Uses patch to locate bug, no shellcode executed
Automatic Generation of Control Flow Hijacking Exploits for Software Vulnerabilities [Heelan, MS Thesis, U. of Oxford 2009]
◦ Creates control flow hijack from crashing input
AEG [Avgerinos et al., NDSS 2011]
◦ Find and generate exploits from source code
BitBlaze, KLEE, Sage, S2E, etc.
◦ Symbolic execution frameworks
Conclusion
Mayhem automatically generated 29 exploits against Windows and Linux programs
Hybrid Execution
◦ Efficient resource management for symbolic execution
Index-based Memory Modeling
◦ Handle symbolic memory in real-world applications
Backup Slides
Algorithm Overview
Pre-process
◦ Disassemble binary
◦ Convert to an intermediate representation (IR)
Chop
◦ A chop is a partial program P' that starts at T0 and ends at the exploit point
◦ Call-graph level
Compute the sig
◦ Get TM sig
◦ TM -> Symbolic constraint
◦ Symbolic constraint -> RegEx
Chopping
Chopping reduces the size of the program to be analyzed
Performed on call-graph level
No function pointer support yet
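A minimal sketch of a call-graph-level chop, assuming the chop is the set of functions that are reachable from the entry point and from which the vulnerability point is reachable. The fixed-size adjacency matrix, `MAX_FUNCS`, and the function names are illustrative assumptions. Indirect calls are not modeled, in line with the lack of function pointer support noted above.

```c
#include <stdbool.h>

#define MAX_FUNCS 16

/* Depth-first reachability over a call graph where calls[a][b] is true when
 * function a calls function b. With reverse set, edges are followed backwards. */
static void reach(bool calls[][MAX_FUNCS], int n, int node, bool reverse, bool *seen)
{
    if (seen[node])
        return;
    seen[node] = true;
    for (int next = 0; next < n; next++) {
        bool edge = reverse ? calls[next][node] : calls[node][next];
        if (edge)
            reach(calls, n, next, reverse, seen);
    }
}

/* in_chop[f] is set when f is reachable from entry and vuln is reachable from f. */
void compute_chop(bool calls[][MAX_FUNCS], int n, int entry, int vuln, bool *in_chop)
{
    bool fwd[MAX_FUNCS] = { false };
    bool bwd[MAX_FUNCS] = { false };

    reach(calls, n, entry, false, fwd);   /* forward from the entry point       */
    reach(calls, n, vuln, true, bwd);     /* backward from the vulnerability    */

    for (int f = 0; f < n; f++)
        in_chop[f] = fwd[f] && bwd[f];
}
```

Functions outside the chop cannot lie on any path from the entry to the vulnerability point, so later signature generation steps can ignore them.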
Get TM Sig
Replace outgoing JMP with RET BENIGN
Symbolic Constraint -> RegEx
Solution 1: Solve constraint system S and OR together all members
Solution 2: Data-flow analysis optimization
How Mayhem Works: Symbolic Execution
[Figure: execution tree with f/t edges at each branch: x = input(); if x > 42; if x*x = 0xffffffff; if x < 100; vuln()]
Path annotations:
x can be anything
x > 42
(x > 42) ∧ (x*x != 0xffffffff)
(x > 42) ∧ (x*x != 0xffffffff) ∧ (x >= 100)
One Cause: Overwritten Pointers
… assert(*ptr==42); return;
[Figure: stack layout (arg, ret addr, ptr, buf) with user input written into buf; ptr address 11223344]
ptr = 0x11223344
mem[0x11223344]
mem[input]
42
Index Search Tree Optimization: Piecewise Linear Approximation
y = 2*x + 10
y = - 2*x + 28
[Figure: plot of Memory Value against Index]
Piecewise Linear Approximation
[Figure: bar chart of Time (axis 0 to 10000) for Fully Symbolic, Index-based, and Piecewise Opt. on atphttpd v0.4b; annotated "2x faster"]