Binary‐level program analysis: A discussion of x86‐64
Gang Tan
CSE 597, Spring 2019
Penn State University
* These slides follow Sec. 3.13 of CSAPP (“Computer Systems: A Programmer’s Perspective”); figures and slides are borrowed or adapted from that book
Intel’s 64‐Bit History
• 2001: Intel Attempts Radical Shift from IA32 to IA64
  – Totally different architecture (Itanium)
  – Executes IA32 code only as legacy
  – Performance disappointing
• 2003: AMD Steps in with Evolutionary Solution
  – x86‐64 (now called “AMD64”)
• Intel Felt Obligated to Focus on IA64
  – Hard to admit mistake or that AMD is better
• 2004: Intel Announces EM64T extension to IA32
  – Extended Memory 64‐bit Technology
  – Almost identical to x86‐64!
• All but low‐end x86 processors support x86‐64
  – But, lots of code still runs in 32‐bit mode
Overview of x86‐64
• Pointers and long integers are 64 bits long
  – Integer arithmetic operations support 8, 16, 32, and 64 bits
• 16 general‐purpose registers, each 64 bits long
• Calling conventions pass more parameters via registers
  – System V AMD64 ABI: passes the first 6 integer or pointer parameters in registers
  – As a result, some procedures do not need to access the stack at all
• Conditional operations are implemented using conditional move instructions when possible
  – Better performance than using branches
• Floating‐point operations are implemented using the register‐oriented instruction set in SSE version 2
  – Rather than the stack‐based approach in IA32
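The conditional‐move point can be illustrated with a small C function. This is only a sketch: the name max_long is hypothetical, and whether a compiler actually emits a cmov instruction for it depends on the compiler, the optimization level, and the target.

```c
#include <assert.h>

/* Hypothetical helper (not from the slides): returns the larger of x and y.
 * Both candidate values are already available and selecting one has no
 * side effects, so an optimizing x86-64 compiler is free to use a compare
 * followed by a conditional move instead of a conditional jump.
 * The actual instruction choice is up to the compiler. */
long max_long(long x, long y)
{
    return (x > y) ? x : y;
}

/* Usage sketch: the result is the same whichever instruction sequence
 * the compiler picks. */
void max_long_demo(void)
{
    assert(max_long(3, 7) == 7);
    assert(max_long(-5, -9) == -5);
}
```

Compiling such a function with and without optimization and inspecting the output (for example with gcc -S) is one way to see which form was chosen.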
x86‐64 Data Types
Fig 3.34 of CSAPP
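Since the figure itself is not reproduced here, the sizes it tabulates can be checked directly with sizeof. The sketch below assumes an LP64 x86‐64 target (for example Linux with gcc or clang); the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes in bytes of common C types as seen by the current compiler.
 * The comments give the matching AT&T integer instruction suffix. */
struct type_sizes {
    size_t char_size;     /* byte,        suffix b */
    size_t short_size;    /* word,        suffix w */
    size_t int_size;      /* double word, suffix l */
    size_t long_size;     /* quad word,   suffix q */
    size_t pointer_size;  /* quad word,   suffix q */
    size_t float_size;    /* single precision */
    size_t double_size;   /* double precision */
};

struct type_sizes query_type_sizes(void)
{
    struct type_sizes s = {
        sizeof(char), sizeof(short), sizeof(int), sizeof(long),
        sizeof(void *), sizeof(float), sizeof(double)
    };
    return s;
}

/* Checks that hold on an LP64 x86-64 target (an assumption);
 * they would fail on a 32-bit build, where long and pointers are 4 bytes. */
void check_lp64_sizes(void)
{
    struct type_sizes s = query_type_sizes();
    assert(s.int_size == 4);
    assert(s.long_size == 8);
    assert(s.pointer_size == 8);
}
```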
16 64‐bit GP Registers
Fig 3.35 of CSAPP
Instruction Operands
• Similar to IA32
  – Except that the base and index registers must use the 64‐bit r‐versions of the registers (for example, rax rather than eax)
• In addition, PC‐relative addressing
  – “add rax, 0x200ad1[rip]” accesses memory at address rip+0x200ad1
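The address arithmetic behind these operand forms can be sketched in C. The function names are hypothetical; the one fact added beyond the slide is that rip, when used as a base, holds the address of the next instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of an operand of the form disp(base, index, scale).
 * The hardware only allows scale factors of 1, 2, 4, or 8. */
uint64_t effective_address(uint64_t base, uint64_t index,
                           unsigned scale, int64_t disp)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    /* Unsigned arithmetic wraps modulo 2^64, like the hardware. */
    return base + index * scale + (uint64_t)disp;
}

/* Effective address of a PC-relative operand disp(%rip).
 * next_rip is the address of the instruction FOLLOWING the one that
 * contains the operand; disp is a signed 32-bit displacement. */
uint64_t rip_relative_address(uint64_t next_rip, int32_t disp)
{
    return next_rip + (uint64_t)(int64_t)disp;
}
```

For instance, with the displacement 0x200ad1 from the slide and an assumed next‐instruction address of 0x400536, rip_relative_address yields 0x601007.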
Function Calling: Argument Passing
• The following slides assume the System V AMD64 ABI
• Arguments (up to the first six) are passed to procedures via registers
  – This reduces the overhead of storing and retrieving values on the stack
• callq stores a 64‐bit return address on the stack
Example of Argument Passing
long utilfunc(long a, long b, long c);  /* defined elsewhere */

long myfunc(long a, long b, long c, long d,
            long e, long f, long g, long h)
{
    long xx = a * b * c * d * e * f * g * h;
    long yy = a + b + c + d + e + f + g + h;
    long zz = utilfunc(xx, yy, xx % yy);
    return zz + 20;
}
* Example from https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64/
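For myfunc, the rule above places a through f in registers and g and h on the stack. The following sketch makes that mapping explicit. It assumes every argument is an 8‐byte integer or pointer, as in myfunc, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Registers used for the first six integer/pointer arguments under the
 * System V AMD64 ABI, in order. */
static const char *const INT_ARG_REGS[6] = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"
};

/* Register name for the zero-based integer argument index, or NULL if
 * that argument is passed on the stack. */
const char *int_arg_register(int index)
{
    assert(index >= 0);
    return index < 6 ? INT_ARG_REGS[index] : NULL;
}

/* Byte offset from rsp, at the moment the callee is entered, of a
 * stack-passed argument. The return address pushed by call sits at
 * 0(%rsp), so the seventh argument is at 8(%rsp), the eighth at
 * 16(%rsp), and so on. Returns -1 for register-passed arguments. */
long stack_arg_offset_at_entry(int index)
{
    assert(index >= 0);
    return index < 6 ? -1 : 8 + 8L * (index - 6);
}
```

Applied to myfunc, indices 0 to 5 (a to f) map to rdi, rsi, rdx, rcx, r8, r9, while g (index 6) and h (index 7) sit at 8(%rsp) and 16(%rsp) on entry.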
Function Calling: Stack Frame
• A function may not require a stack frame if
  – all local variables can be held in registers, and
  – it has no array/structure local variables, and
  – no address‐of operator (&) is used on local variables, and
  – it does not call another function that requires argument passing on the stack, and
  – it does not need to save any callee‐saved registers
Function Calling: Red‐Zone Optimization
• Red‐zone optimization for leaf functions (functions that do not call other functions)
  – The 128 bytes below rsp can be used by a leaf function without stack allocation
  – The red zone will not be asynchronously clobbered by signal or interrupt handlers, so a leaf function can use it for scratch data
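When reading disassembly, a store to a negative offset from rsp with no preceding stack allocation is the usual sign of red‐zone use. A minimal sketch of the bounds check follows; the function name is hypothetical and the 128‐byte size is the one given above.

```c
#include <assert.h>

/* Size of the red zone in bytes, as stated on the slide. */
#define RED_ZONE_SIZE 128

/* Returns 1 if an access at disp(%rsp) falls inside the red zone,
 * that is, in the range rsp-128 .. rsp-1, and 0 otherwise.
 * Only the starting address is checked; the access width is ignored
 * in this sketch. */
int disp_in_red_zone(long disp_from_rsp)
{
    return disp_from_rsp >= -RED_ZONE_SIZE && disp_from_rsp < 0;
}
```

For example, -8(%rsp) and -128(%rsp) start inside the red zone, while -129(%rsp) and 0(%rsp) do not.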
Function Calling: the Base Pointer Optimization
• Two options for functions that need a stack frame
• Option 1: the traditional approach (default for gcc without optimizations)
  – Function prologue: save the base pointer; create the new base pointer
  – Function body: references to stack locations are made relative to the base pointer
  – Function epilogue: restore the base pointer
• Option 2: faster (default for gcc with optimizations)
  – Do not save/restore the base pointer; rbp is used as a GP register
  – References to stack locations are made relative to the stack pointer
  – Stack allocation happens at the beginning; rsp then stays at a fixed position for the rest of the function's execution
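The two addressing styles name the same stack slots with different displacements. As a sketch, assume an option‐1 prologue of the form "pushq %rbp; movq %rsp, %rbp; subq $N, %rsp", so that rbp equals rsp + N throughout the function body; the helper name below is hypothetical.

```c
#include <assert.h>

/* Converts an rbp-relative displacement into the equivalent
 * rsp-relative displacement.
 * Assumption: the prologue is
 *     pushq %rbp; movq %rsp, %rbp; subq $frame_size, %rsp
 * so rbp == rsp + frame_size inside the function body. */
long rbp_to_rsp_disp(long rbp_disp, long frame_size)
{
    assert(frame_size >= 0);
    return rbp_disp + frame_size;
}
```

With a 32‐byte allocation, the local at -8(%rbp) is the same slot as 24(%rsp), and the return address at 8(%rbp) is at 40(%rsp).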
Example
long int simple_l(long int *xp, long int y)
{
    long int t = *xp + y;
    *xp = t;
    return t;
}
C source code
Example
simple_l:
    pushl %ebp              # Save frame pointer
    movl  %esp, %ebp        # New frame pointer
    movl  8(%ebp), %edx     # Retrieve xp
    movl  12(%ebp), %eax    # Retrieve y
    addl  (%edx), %eax      # Add *xp to get t
    movl  %eax, (%edx)      # Store t at xp
    popl  %ebp              # Restore frame pointer
    ret
Optimized x86‐32 Assembly
Example
Optimized x86‐64 Assembly
simple_l:
    movq %rsi, %rax         # Copy y
    addq (%rdi), %rax       # Add *xp to get t
    movq %rax, (%rdi)       # Store t at xp
    ret
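Unoptimized x86‐64 Assembly
simple_l:
    pushq %rbp              # Save frame pointer
    movq  %rsp, %rbp        # New frame pointer
    movq  %rdi, -24(%rbp)   # Spill xp (no subq: locals live in the red zone)
    movq  %rsi, -32(%rbp)   # Spill y
    movq  -24(%rbp), %rax   # Load xp
    movq  (%rax), %rax      # Load *xp
    addq  -32(%rbp), %rax   # Add y to get t
    movq  %rax, -8(%rbp)    # Store t in its stack slot
    movq  -24(%rbp), %rax   # Load xp
    movq  -8(%rbp), %rdx    # Load t
    movq  %rdx, (%rax)      # Store t at xp
    movq  -8(%rbp), %rax    # Return value t
    leave                   # Restore rsp and rbp
    ret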
Function Calling: Caller/Callee‐Save Registers
• Callee‐saved regs: rbx, rbp, and r12 to r15
• Caller‐saved regs: r10 and r11, in addition to rax and the argument registers (rdi, rsi, rdx, rcx, r8, r9)
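For analysis tooling, this split is often encoded as a simple lookup. The sketch below uses only the callee‐saved list given on the slide; the table and function names are hypothetical, and rsp, which is also preserved across calls as the stack pointer, is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Callee-saved general-purpose registers as listed on the slide. */
static const char *const CALLEE_SAVED[] = {
    "rbx", "rbp", "r12", "r13", "r14", "r15"
};

/* Returns 1 if reg names a callee-saved register from the list above,
 * and 0 otherwise. Names are expected in lowercase without a '%'. */
int is_callee_saved(const char *reg)
{
    size_t count = sizeof(CALLEE_SAVED) / sizeof(CALLEE_SAVED[0]);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(reg, CALLEE_SAVED[i]) == 0) {
            return 1;
        }
    }
    return 0;
}
```

A value that must survive a call either lives in a register for which is_callee_saved returns 1 or is saved on the stack by the caller.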
x86‐64 Assembly Code Example
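C source code
long plus(long x, long y);

void sumstore(long x, long y, long *dest)
{
    long t = plus(x, y);
    *dest = t;
}
Optimized x86‐64 Assembly
sumstore:
    pushq %rbx              # Save callee-saved rbx
    movq  %rdx, %rbx        # Keep dest in rbx across the call
    call  plus              # rax = plus(x, y)
    movq  %rax, (%rbx)      # *dest = t
    popq  %rbx              # Restore rbx
    ret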
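To run sumstore, plus needs a body. The slide only declares it, so the definition below (simple addition) is an assumption made for illustration, and sumstore_demo is a hypothetical driver.

```c
#include <assert.h>

/* Assumed definition: the slide only gives the prototype of plus. */
long plus(long x, long y)
{
    return x + y;
}

/* Same function as on the slide. dest arrives in rdx (third argument),
 * which is caller-saved and may be overwritten by plus. That is why the
 * optimized assembly copies it into callee-saved rbx before the call. */
void sumstore(long x, long y, long *dest)
{
    long t = plus(x, y);
    *dest = t;
}

/* Hypothetical driver showing the observable effect. */
void sumstore_demo(void)
{
    long result = 0;
    sumstore(2, 3, &result);
    assert(result == 5);
}
```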