unleashing mayhem on binary code

43
Unleashing Mayhem on Binary Code Sang Kil Cha Thanassis Avgerinos Alexandre Rebert David Brumley Carnegie Mellon University

Upload: neylan

Post on 22-Mar-2016

71 views

Category:

Documents


1 download

DESCRIPTION

Unleashing Mayhem on Binary Code. Sang Kil Cha Thanassis Avgerinos Alexandre Rebert David Brumley Carnegie Mellon University. Automatic Exploit Generation Challenge. Automatically F ind B ugs & G enerate Exploits. I = input(); i f (I < 42) vuln (); else safe();. AEG. Exploits. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Unleashing Mayhem on Binary Code

Unleashing Mayhem on Binary Code

Sang Kil ChaThanassis Avgerinos

Alexandre RebertDavid Brumley

Carnegie Mellon University

Page 2: Unleashing Mayhem on Binary Code

2

Automatic Exploit Generation Challenge

Automatically Find Bugs & Generate Exploits

AEG

Program Exploits

I = input();if (I < 42) vuln();else safe();

Page 3: Unleashing Mayhem on Binary Code

3

Automatic Exploit Generation Challenge

Automatically Find Bugs & Generate Exploits

Explore Program

Page 4: Unleashing Mayhem on Binary Code

4

Ghostscript v8.62 Bugint outprintf( const char *fmt, … ){ int count; char buf[1024]; va_list args; va_start( args, fmt ); count = vsprintf( buf, fmt, args ); outwrite( buf, count ); // print out}int main( int argc, char* argv[] ){ const char *arg; while( (arg = *argv++) != 0 ) { switch ( arg[0] ) { case ‘-’: { switch ( arg[1] ) { case 0: … default: outprintf( “unknown switch %s\n”, arg[1] ); } } default: … } …

Reading user input from command line

Buffer overflow

CVE-2009-4270

Page 5: Unleashing Mayhem on Binary Code

5

Multiple Pathsint outprintf( const char *fmt, … ){ int count; char buf[1024]; va_list args; va_start( args, fmt ); count = vsprintf( buf, fmt, args ); outwrite( buf, count ); // print out}int main( int argc, char* argv[] ){ const char *arg; while( (arg = *argv++) != 0 ) { switch ( arg[0] ) { case ‘-’: { switch ( arg[1] ) { case 0: … default: outprintf( “unknown switch %s\n”, arg[1] ); } } default: … } …

ManyBranches!

Page 6: Unleashing Mayhem on Binary Code

6

Automatic Exploit Generation Challenge

Automatically Find Bugs & Generate Exploits

Transfer Control to Attacker Code

(exec “/bin/sh”)

Page 7: Unleashing Mayhem on Binary Code

7

Generating Exploitsint outprintf( const char *fmt, … ){ int count; char buf[1024]; va_list args; va_start( args, fmt ); count = vsprintf( buf, fmt, args ); outwrite( buf, count ); // print out}int main( int argc, char* argv[] ){ const char *arg; while( (arg = *argv++) != 0 ) { switch ( arg[0] ) { case ‘-’: { switch ( arg[1] ) { case 0: … default: outprintf( “unknown switch %s\n”, arg[1] ); } } default: … } …

outp

rint

f

fmt

ret addr

count

args

bufuser

inpu

t

mai

n

esp

Page 8: Unleashing Mayhem on Binary Code

8

Generating Exploitsint outprintf( const char *fmt, … ){ int count; char buf[1024]; va_list args; va_start( args, fmt ); count = vsprintf( buf, fmt, args ); outwrite( buf, count ); // print out}int main( int argc, char* argv[] ){ const char *arg; while( (arg = *argv++) != 0 ) { switch ( arg[0] ) { case ‘-’: { switch ( arg[1] ) { case 0: … default: outprintf( “unknown switch %s\n”, arg[1] ); } } default: … } …

Read Return Address from Stack Pointer (esp)

8

outp

rint

f

fmt

ret addr

count

args

bufuser

inpu

t

mai

n

esp

Control Hijack Possible

Page 9: Unleashing Mayhem on Binary Code

9Source

int main( int argc, char* argv[] ){ const char *arg; while( (arg = *argv++) != 0 ) {…Executables (Binary)

01010010101010100101010010101010100101010101010101000100001000101001001001001000000010100010010101010010101001001010101001010101001010000110010101010111011001010101010101010100101010111110100101010101010101001010101010101010101010

Unleashing MayhemAutomatically Find Bugs & Generate Exploits

for Executables

Page 10: Unleashing Mayhem on Binary Code

10

Demo

Page 11: Unleashing Mayhem on Binary Code

11

if x < 100

if x*x = 0xffffffff

x = input()

How Mayhem Works:Symbolic Execution

if x > 42

vuln()

x can be anything

x > 42

(x > 42) ∧ (x*x != 0xffffffff)

(x > 42) ∧ (x*x != 0xffffffff)

∧ (x >= 100)

f t

f t

f t

Page 12: Unleashing Mayhem on Binary Code

12

f t

f t

f t

x = input()

How Mayhem Works:Symbolic Execution

if x > 42

if x*x = 0xffffffff

vuln()

x can be anything

x > 42

(x > 42) ∧ (x*x == 0xffffffff)

if x < 100

Page 13: Unleashing Mayhem on Binary Code

13

f t

f t

f t

x = input()

if x > 42

if x*x = 0xffffffff

vuln()

Path Predicate = Π

x can be anything

x > 42

(x > 42) ∧ (x*x == 0xffffffff)Π =

if x < 100

Page 14: Unleashing Mayhem on Binary Code

14

f t

f t

f t

x = input()

How Mayhem Works:Symbolic Execution

if x > 42

if x*x = 0xffffffff

vuln()

x can be anything

x > 42

(x > 42) ∧ (x*x == 0xffffffff)

ViolatesSafety Policyif x < 100

Page 15: Unleashing Mayhem on Binary Code

15

int outprintf( const char *fmt, … ){ int count; char buf[1024]; va_list args; va_start( args, fmt ); count = vsprintf( buf, fmt, args ); outwrite( buf, count ); // print out}

Safety Policy in Mayhem

outp

rint

f

fmt

ret addr

count

args

bufuser

inpu

t

mai

n

esp

Return to user-controlled address

EIP not affected by user input

Instruction Pointer (EIP) level:

Page 16: Unleashing Mayhem on Binary Code

16

Exploit Generation

Π∧

input[0-31] = attack code∧

input[1038-1042] = attack code address

Exploit is an input that satisfies the predicate:

Exploit PredicateCan transfer

control to attack code?

Can position attack code?

Page 17: Unleashing Mayhem on Binary Code

17

Challenges

Symbolic Execution Exploit Generation

Efficient Resource Management

Symbolic IndexChallenge

Hybrid ExecutionIndex-based Memory

Model

Page 18: Unleashing Mayhem on Binary Code

18

Challenge 1: Resource Management inSymbolic Execution

Page 19: Unleashing Mayhem on Binary Code

19

Current Resource Management in Symbolic Execution

Online Symbolic Execution

Offline Symbolic Execution

(a.k.a. Concolic)

Page 20: Unleashing Mayhem on Binary Code

20

Offline ExecutionOne pathat a time Method 1:

Re-run from scratch Inefficient⟹

Re-executedevery time

Page 21: Unleashing Mayhem on Binary Code

21

Online Execution

Method 2:Stop forking

Miss paths⟹

Method 3: Snapshot process

Huge disk ⟹image

Hit Resource Cap

Fork at branches

Page 22: Unleashing Mayhem on Binary Code

22

Mayhem: Hybrid Execution

Our Method:Don’t snapshot state; use path predicate to recreate state

9.4M 500K

Hit Resource Cap

Fork at branches

Ghostscript 8.62

“Checkpoint”

Page 23: Unleashing Mayhem on Binary Code

23

Hybrid Execution

Manage #executorsin memory within resource cap✓

Minimize duplicated work✓

Lightweight checkpoints✓

Page 24: Unleashing Mayhem on Binary Code

24

Challenge 2: Symbolic Indices

Page 25: Unleashing Mayhem on Binary Code

25

Symbolic Indices

x = user_input();y = mem[x];assert (y == 42);

x can be anything

Which memory cell contains 42?

232 cells to check

Memory0 232 -1

Page 26: Unleashing Mayhem on Binary Code

26

One Cause: Overwritten Pointers

42

mem[0x11223344]

mem[input]

arg

ret addr

ptr

buf

user

inpu

t

… assert(*ptr==42); return;

ptr address 11223344ptr = 0x11223344

Page 27: Unleashing Mayhem on Binary Code

27

Another Cause: Table LookupsTable lookups in standard APIs:• Parsing: sscanf, vfprintf, etc.• Character test: isspace, isalpha, etc.• Conversion: toupper, tolower, mbtowc, etc.• …

Page 28: Unleashing Mayhem on Binary Code

28

Method 1: Concretization

Over-constrained• Misses 40% of exploits in our experiments

Π∧ mem[x] = 42 ∧ ’Π

Π ∧ x = 17∧ mem[x] = 42 ∧ ’Π

✓ Solvable✗ Exploits

Page 29: Unleashing Mayhem on Binary Code

29

Method 2: Fully Symbolic

Π ∧ mem[x] = 42 ∧ ’Π

✗ Solvable✓ Exploits

Π ∧ mem[x] = 42 ∧ mem[0] = v0 ∧…∧ mem[232-1] = v232-1

∧ ’Π

Page 30: Unleashing Mayhem on Binary Code

30

Our ObservationPath predicate (Π)constrains rangeof symbolic memoryaccesses

y = mem[x]

f t

x <= 42

x can be anything

f

t

x >= 50

Use symbolic execution state to:Step 1: Bound memory addresses referencedStep 2: Make search tree for memory address values

42 < x < 50Π

Page 31: Unleashing Mayhem on Binary Code

31

Step 1 — Find Boundsmem[ x & 0xff ]

1. Value Set Analysis1 provides initial bounds• Over-approximation

2. Query solver to refine bounds

Lowerbound = 0, Upperbound = 0xff

[1] Balakrishnan et al., Analyzing memory accesses in x86 executables, ICCC 2004

Page 32: Unleashing Mayhem on Binary Code

32

Step 2 — Index Search Tree Construction

y = mem[x]if x = 1 then y = 10

Index

MemoryValue

1012

22

20

if x = 2 then y = 12

if x = 3 then y = 22

if x = 4 then y = 20

ite( x < 3, left, right )

ite( x < 2, left, right )

Page 33: Unleashing Mayhem on Binary Code

33

Fully Symbolic vs.Index-based Memory Modeling

Fully Symbolic Index-based Piecewise Opt.0

5000

10000Time Timeout atphttpd

v0.4b

Page 34: Unleashing Mayhem on Binary Code

34

Index Search Tree Optimization:Piecewise Linear Approximation

y = 2*x + 10

y = - 2*x + 28

Index

MemoryValue

Page 35: Unleashing Mayhem on Binary Code

35

Piecewise Linear Approximation

Fully Symbolic Index-based Piecewise Opt.0

5000

10000Time

2x faster

atphttpd v0.4b

Page 36: Unleashing Mayhem on Binary Code

36

Exploit Generation

Page 37: Unleashing Mayhem on Binary Code

37

a2ps

aeon

aspell

atphttpd

freeradius

ghostscript

glftpd

gnugol

htget

htpasswd

iwconfig

mbse-bbs

nCompress

orzHttpd

psUtils

rsync

sharutils

socat

squirrel mail

tipxd

xgalaga

xtokkaetama

coolplayer

destiny

dizzy

galan

gsplayer

muse

soritong

1 10 100 1000 10000 100000

Linux

(22)

Windows

(7)

Page 38: Unleashing Mayhem on Binary Code

38

a2ps

aeon

aspell

atphttpd

freeradius

ghostscript

glftpd

gnugol

htget

htpasswd

iwconfig

mbse-bbs

nCompress

orzHttpd

psUtils

rsync

sharutils

socat

squirrel mail

tipxd

xgalaga

xtokkaetama

coolplayer

destiny

dizzy

galan

gsplayer

muse

soritong

1 10 100 1000 10000 100000

2 Unknown Bugs:FreeRadius,GnuGol

Page 39: Unleashing Mayhem on Binary Code

39

Limitations• We do not claim to find all exploitable bugs

• Given an exploitable bug, we do not guarantee we will always find an exploit

• Lots of room for improving symbolic execution, generating other types of exploits (e.g., info leaks), etc.

• We do not consider defenses, which may defend against otherwise exploitable bugs– Q [Schwartz et al., USENIX 2011]But Every Report is Actionable

Page 40: Unleashing Mayhem on Binary Code

40

Related Work• APEG [Brumley et al., IEEE S&P 2008]

– Uses patch to locate bug, no shellcode executed

• Automatic Generation of Control Flow Hijacking Exploits for Software Vulnerabilities

[Heelan, MS Thesis, U. of Oxford 2009]– Creates control flow hijack from crashing input

• AEG [Avgerinos et al., NDSS 2011]

– Find and generate exploits from source code

• BitBlaze, KLEE, Sage, S2E, etc.– Symbolic execution frameworks

Page 41: Unleashing Mayhem on Binary Code

41

Conclusion• Mayhem automatically generated 29 exploits

against Windows and Linux programs

• Hybrid Execution– Efficient resource management for symbolic

execution

• Index-based Memory Modeling– Handle symbolic memory in real-world

applications

Page 42: Unleashing Mayhem on Binary Code

42

Thank You• Our shepherd: Cristian Cadar• Anonymous reviewers• Maverick Woo, Spencer Whitman

Page 43: Unleashing Mayhem on Binary Code

43

Q&ASang Kil Cha ([email protected])http://www.ece.cmu.edu/~sangkilc/