optimizing your dyninst program

28
Optimizing Your Dyninst Program Matthew LeGendre [email protected]

Upload: marah-avery

Post on 04-Jan-2016

34 views

Category:

Documents


0 download

DESCRIPTION

Optimizing Your Dyninst Program. Matthew LeGendre [email protected]. Optimizing Dyninst. Dyninst is being used to insert more instrumentation into bigger programs. For example: Instrumenting every memory instruction Working with binaries 200MB in size Performance is a big consideration - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Optimizing Your Dyninst Program

Optimizing Your Dyninst Program

Matthew [email protected]

Page 2: Optimizing Your Dyninst Program

-2- Optimizing Your Dyninst Program

Optimizing Dyninst• Dyninst is being used to insert more

instrumentation into bigger programs. For example:– Instrumenting every memory instruction– Working with binaries 200MB in size

• Performance is a big consideration

• What can we do, and what can you do to help?

Page 3: Optimizing Your Dyninst Program

-3- Optimizing Your Dyninst Program

Performance Problem Areas• Parsing: Analyzing the binary and

reading debug information.

• Instrumenting: Rewriting the binary to insert instrumentation.

• Runtime: Instrumentation slows down a mutatee at runtime.

Page 4: Optimizing Your Dyninst Program

-4- Optimizing Your Dyninst Program

Optimizing Dyninst

• Programmer Optimizations– Telling Dyninst not to output tramp guards.

• Dyninst Optimizations– Reducing the number of registers saved

around instrumentation.

Page 5: Optimizing Your Dyninst Program

-5- Optimizing Your Dyninst Program

Parsing Overview• Control Flow

– Identifies executable regions• Data Flow

– Analyzes code prior to instrumentation• Debug

– Reads debugging information, e.g. line info• Symbol

– Reads from the symbol table

• Lazy Parsing: Not parsed until it is needed

Page 6: Optimizing Your Dyninst Program

-6- Optimizing Your Dyninst Program

Control Flow• Dyninst needs to analyze a binary

before it can instrument it.– Identifies functions and basic blocks

• Granularity– Parses all of a module at once.

• Triggers– Operating on a BPatch_function– Requesting BPatch_instPoint objects– Performing Stackwalks (on x86)

Page 7: Optimizing Your Dyninst Program

-7- Optimizing Your Dyninst Program

Data Flow• Dyninst analyzes a function before

instrumenting it.– Live register analysis– Reaching allocs on IA-64

• Granularity– Analyzes a function at a time.

• Triggers– The first time a function is instrumented

Page 8: Optimizing Your Dyninst Program

-8- Optimizing Your Dyninst Program

Debug Information• Reads debug information from a

binary.

• Granularity– Parses all of a module at once.

• Triggers – Line number information– Type information– Local variable information

Page 9: Optimizing Your Dyninst Program

-9- Optimizing Your Dyninst Program

Symbol Table• Extracts function and data information

from the symbol

• Granularity– Parses all of a module at once.

• Triggers– Not done lazily. At module load.

Page 10: Optimizing Your Dyninst Program

-10- Optimizing Your Dyninst Program

Lazy Parsing OverviewGranularity

Triggered By

Control Flow

Module BPatch_function Queries

Data Flow Function Instrumentation

Debug Module Debug Info Queries

Symbol Module Automatically

Lazy parsing allows you to avoid or defer costs.

Page 11: Optimizing Your Dyninst Program

-11- Optimizing Your Dyninst Program

foo:0x1000: push ebp0x1001: movl esp,ebp0x1002: push $10x1004: call bar0x1005: leave0x1006: ret

foo:0x1000: push ebp0x1001: movl esp,ebp0x1002: push $10x1004: call bar0x1005: leave0x1006: ret

foo:0x2000: jmp entry_instr0x2005: push ebp0x2006: movl esp,ebp0x2007: push $10x2009: call bar0x200F: leave0x2010: ret

foo:0x2000: jmp entry_instr0x2005: push ebp0x2006: movl esp,ebp0x2007: push $10x2009: call bar0x200F: leave0x2010: ret

foo:0x3000: jmp entry_instr0x3005: push ebp0x3006: movl esp,ebp0x3007: push $10x3009: jmp call_instr0x300F: call bar0x3014: leave0x3015: ret

foo:0x3000: jmp entry_instr0x3005: push ebp0x3006: movl esp,ebp0x3007: push $10x3009: jmp call_instr0x300F: call bar0x3014: leave0x3015: ret

foo:0x4000: jmp entry_instr0x4005: push ebp0x4006: movl esp,ebp0x4007: push $10x4009: jmp call_instr0x400F: call bar0x4014: leave0x4015: jmp exit_instr0x401A: ret

foo:0x4000: jmp entry_instr0x4005: push ebp0x4006: movl esp,ebp0x4007: push $10x4009: jmp call_instr0x400F: call bar0x4014: leave0x4015: jmp exit_instr0x401A: ret

Inserting Instrumentation• What happens when we re-instrument a

function?

Page 12: Optimizing Your Dyninst Program

-12- Optimizing Your Dyninst Program

Inserting Instrumentation• Bracket instrumentation requests with:

beginInsertionSet()...

endInsertionSet()

• Batches instrumentation– Allows for transactional instrumentation– Improves efficiency (rewrite)

Page 13: Optimizing Your Dyninst Program

-13- Optimizing Your Dyninst Program

Runtime Overhead• Two factors determine how

instrumentation slows down a program.– What does the instrumentation cost?

• Increment a variable• Call a cache simulator

– How often does instrumentation run? • Every time read/write are called• Every memory instruction

• Additional Dyninst overhead on each instrumentation point.

Page 14: Optimizing Your Dyninst Program

-14- Optimizing Your Dyninst Program

Runtime Overhead - Basetramps

save all GPRsave all FPRt = DYNINSTthreadIndex()if (!guards[t]) { guards[t] = true jump to minitramps guards[t] = false}restore all FPRrestore all GPR

save all GPRsave all FPRt = DYNINSTthreadIndex()if (!guards[t]) { guards[t] = true jump to minitramps guards[t] = false}restore all FPRrestore all GPR

Save Registers

Restore Registers

Calculate Thread IndexCheck Tramp GuardsRun All Minitramps

A Basetramp

Page 15: Optimizing Your Dyninst Program

-15- Optimizing Your Dyninst Program

Runtime Overhead – Registers

save all GPRsave all FPRt = DYNINSTthreadIndex()if (!guards[t]) { guards[t] = true jump to minitramps guards[t] = false}restore all FPRrestore all GPR

save all GPRsave all FPRt = DYNINSTthreadIndex()if (!guards[t]) { guards[t] = true jump to minitramps guards[t] = false}restore all FPRrestore all GPR

A Basetramp

•Analyzes minitramps for register usage.

•Analyzes functions for register liveness.

•Only saves what is live and used.

Page 16: Optimizing Your Dyninst Program

-16- Optimizing Your Dyninst Program

Runtime Overhead – Registers

save all GPRsave all FPRt = DYNINSTthreadIndex()if (!guards[t]) { guards[t] = true jump to minitramps guards[t] = false}restore all FPRrestore all GPR

save all GPRsave all FPRt = DYNINSTthreadIndex()if (!guards[t]) { guards[t] = true jump to minitramps guards[t] = false}restore all FPRrestore all GPR

A Basetramp•Called functions are recursively analyzed to a max call depth of 2.minitramp

foo()

bar()

call

call

Page 17: Optimizing Your Dyninst Program

-17- Optimizing Your Dyninst Program

Runtime Overhead – Registers

save live GPRt = DYNINSTthreadIndex()if (!guards[t]) { guards[t] = true jump to minitramps guards[t] = false}restore live GPR

save live GPRt = DYNINSTthreadIndex()if (!guards[t]) { guards[t] = true jump to minitramps guards[t] = false}restore live GPR

A Basetramp•Use shallow function call chains under instrumentation, so Dyninst can analyze all reachable code.

•Use BPatch::setSaveFPR() to disable all floating point saves.

Page 18: Optimizing Your Dyninst Program

-18- Optimizing Your Dyninst Program

Runtime Overhead – Tramp Guards

save live GPRt = DYNINSTthreadIndex()if (!guards[t]) { guards[t] = true jump to minitramps guards[t] = false}Restore live GPR

save live GPRt = DYNINSTthreadIndex()if (!guards[t]) { guards[t] = true jump to minitramps guards[t] = false}Restore live GPR

A Basetramp•Prevents recursive instrumentation.

•Needs to be thread aware.

Page 19: Optimizing Your Dyninst Program

-19- Optimizing Your Dyninst Program

Runtime Overhead – Tramp Guards

A Basetramp•Build instrumentation that doesn’t make function calls (no BPatch_funcCallExpr snippets)

•Use setTrampRecursive() if you’re sure instrumentation won’t recurse.

save live GPRt = DYNINSTthreadIndex()

jump to minitramps

restore live GPR

save live GPRt = DYNINSTthreadIndex()

jump to minitramps

restore live GPR

Page 20: Optimizing Your Dyninst Program

-20- Optimizing Your Dyninst Program

Runtime Overhead – ThreadsA Basetramp

•Returns an index value (0..N) unique to the current thread.

•Used by tramp guards and for thread local storage by instrumentation

•Expensive

save live GPRt = DYNINSTthreadIndex()

jump to minitramps

restore live GPR

save live GPRt = DYNINSTthreadIndex()

jump to minitramps

restore live GPR

Page 21: Optimizing Your Dyninst Program

-21- Optimizing Your Dyninst Program

Runtime Overhead – ThreadsA Basetramp

•Not needed if there are no tramp guards.

•Only used on mutatees linked with a threading library (e.g. libpthread)

save live GPR

jump to minitramps

restore live GPR

save live GPR

jump to minitramps

restore live GPR

Page 22: Optimizing Your Dyninst Program

-22- Optimizing Your Dyninst Program

Runtime Overhead – MinitrampsA Basetramp

•Minitramps contain the actual instrumentation.

•What can we do with minitramps?

save live GPR

jump to minitramps

restore live GPR

save live GPR

jump to minitramps

restore live GPR

Page 23: Optimizing Your Dyninst Program

-23- Optimizing Your Dyninst Program

Runtime Overhead – Minitramps

//Increment var by 1load var -> regreg = reg + 1store reg -> varjmp Minitramp B

//Increment var by 1load var -> regreg = reg + 1store reg -> varjmp Minitramp B

Minitramp A•Created by our code generator, which assumes a RISC like architecture.

•Instrumentation linked by jumps.

//Call foo(arg1)push arg1call foojmp BaseTramp

//Call foo(arg1)push arg1call foojmp BaseTramp

Minitramp B

Page 24: Optimizing Your Dyninst Program

-24- Optimizing Your Dyninst Program

//Increment var by 1load var -> regreg = reg + 1store reg -> varjmp Minitramp B

//Increment var by 1load var -> regreg = reg + 1store reg -> varjmp Minitramp B

//Increment var by 1

inc var

jmp Minitramp B

//Increment var by 1

inc var

jmp Minitramp B

Runtime Overhead – MinitrampsMinitramp A

//Call foo(arg1)push arg1call foojmp BaseTramp

//Call foo(arg1)push arg1call foojmp BaseTramp

Minitramp B

•New code generator recognizes common instrumentation snippets and outputs CISC instructions.

• Works on simple arithmetic, and stores.

Page 25: Optimizing Your Dyninst Program

-25- Optimizing Your Dyninst Program

//Increment var by 1

inc var

jmp Minitramp B

Runtime Overhead – Minitramps

//Increment var by 1load var -> regreg = reg + 1store reg -> varjmp Minitramp B

Minitramp A

Minitramp B

//Increment var by 1

inc var

jmp Minitramp B

//Increment var by 1

inc var

jmp Minitramp B

Mergetramp

//Call foo(arg1)push arg1call foojmp BaseTramp

•New merge tramps combine minitramps together with basetramp.

•Faster execution, slower re-instrumentation.

•Change behavior with BPatch::setMergeTramp

Page 26: Optimizing Your Dyninst Program

-26- Optimizing Your Dyninst Program

Runtime Overhead

• Where does the Dyninst’s runtime overhead go?– 87% Calculating thread indexes– 12% Saving registers– 1% Trampoline Guards

• Dyninst allows inexpensive instrumentation to be inexpensive.

Page 27: Optimizing Your Dyninst Program

-27- Optimizing Your Dyninst Program

Summary

• Make use of lazy parsing

• Use insertion sets when inserting instrumentation.

• Small, easy to understand snippets are easier for Dyninst to optimize.– Try to avoid function calls in

instrumentation.

Page 28: Optimizing Your Dyninst Program

-28- Optimizing Your Dyninst Program

Questions?