Compilation 0368-3133 (Semester A, 2013/14)
Lecture 8: Activation Records + Register Allocation. Noam Rinetzky. Slides credit: Roman Manevich, Mooly Sagiv and Eran Yahav.
1
Compilation 0368-3133 (Semester A, 2013/14)
Lecture 8: Activation Records + Register Allocation
Noam Rinetzky
Slides credit: Roman Manevich, Mooly Sagiv and Eran Yahav
2
What is a Compiler?
"A compiler is a computer program that transforms source code written in a programming language (source language) into another language (target language).
The most common reason for wanting to transform source code is to create an executable program."
--Wikipedia
3
Conceptual Structure of a Compiler
[Diagram: source text (txt) -> Compiler -> executable code (exe). Inside the compiler: Frontend (Lexical Analysis, Syntax Analysis / Parsing, Semantic Analysis) -> Semantic Representation / Intermediate Representation (IR) -> Backend (Code Generation)]
4
From scanning to parsing
Program text: ((23 + 7) * x)
Lexical Analyzer
Token stream: LP LP Num OP Num RP OP Id RP
Parser
Grammar: E -> ... | Id; Id -> 'a' | ... | 'z'
Abstract Syntax Tree: Op(*) with children Op(+) and Id(b); Op(+) with children Num(23) and Num(7)
Outcome: valid, or syntax error
5
Context Analysis
Abstract Syntax Tree: Op(*) with children Op(+) and Id(b); Op(+) with children Num(23) and Num(7)
Type rules: E1 : int, E2 : int  =>  E1 + E2 : int
Outcome: Semantic Error, or Valid + Symbol Table
6
Code Generation
Input: Valid Abstract Syntax Tree, Symbol Table, ...
Abstract Syntax Tree: Op(*) with children Op(+) and Id(b); Op(+) with children Num(23) and Num(7)
Verification (possible runtime): Errors/Warnings
Intermediate Representation (IR)
Executable Code: input -> output
7
Code Generation: IR
• Translating from abstract syntax (AST) to intermediate representation (IR)
  - Three-Address Code
• Primitive statements, control flow, procedure calls
[Diagram: Source code (program) -> Lexical Analysis -> Syntax Analysis / Parsing -> AST, Symbol Table, etc. -> Inter. Rep. (IR) -> Code Generation -> Target code (executable)]
8
Intermediate representation
[Diagram: source languages C++, Java and Python are lowered to a common IR, which is optimized and then passed to code generation for Pentium, Java bytecode and Sparc]
• A language that is between the source language and the target language - not specific to any machine
• Goal 1: retargeting compiler components for different source languages/target machines
• Goal 2: machine-independent optimizer
  - Narrow interface: small number of node types (instructions)
9
Three-Address Code IR
• A popular form of IR
• High-level assembly where instructions have at most three operands
Chapter 8
10
Abstract Register Machine
[Diagram: a CPU next to Main Memory. CPU: general-purpose (data) registers Register 00, Register 01, ..., Register xx, and control registers ..., Register PC. Main Memory: Code, Global Variables, Heap and Stack, with the low-address and high-address ends marked]
11
Abstract Activation Record Stack
[Diagram: stack of activation records ..., Proc k, Proc k+1, Proc k+2, ...; the entry for Proc k+1 is labelled "Stack frame for procedure Proc k+1(a1, ..., aN)"]
12
Abstract Stack Frame
[Diagram: within the stack ..., Proc k, Proc k+1, Proc k+2, ..., the stack frame for procedure Proc k+1(a1, ..., aN) holds:
  Parameters (actual arguments): Param N, Param N-1, ..., Param 1
  Locals and temporaries: _t0, ..., _tk, x, ..., y]
13
TAC generation for expressions
• cgen(atomic_expr) directly generates TAC for atomic expressions
  - Constants, identifiers, ...
• cgen(compound_expr) recursively generates TAC for compound expressions
  - binary operators, procedure calls, ...
  - use temporary variables (registers) to store values of intermediate expressions
14
cgen example
cgen(5 + x) =
    Choose a new temporary t
    Let t1 = cgen(5)
    Let t2 = cgen(x)
    Emit( t = t1 + t2 )
    Return t
15
Naive cgen for expressions
• Maintain a counter for temporaries in c
• Initially c = 0
• cgen(e1 op e2) =
    Let A = cgen(e1)
    c = c + 1
    Let B = cgen(e2)
    c = c + 1
    Emit( _tc = A op B )
    Return _tc
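The naive scheme above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: expressions are modelled as nested tuples `(op, e1, e2)` with ints and strings as atomic expressions, and `make_cgen` and `fresh` are hypothetical helper names.

```python
from itertools import count

def make_cgen():
    """Return (cgen, code); cgen appends TAC lines to the shared list `code`."""
    counter = count()              # plays the role of the counter c
    code = []

    def fresh():
        return f"_t{next(counter)}"

    def cgen(e):
        # Atomic expression: a constant (int) or an identifier (str).
        if isinstance(e, (int, str)):
            t = fresh()
            code.append(f"{t} = {e}")
            return t
        # Compound expression: (op, e1, e2), translated recursively.
        op, e1, e2 = e
        a = cgen(e1)
        b = cgen(e2)
        t = fresh()
        code.append(f"{t} = {a} {op} {b}")
        return t

    return cgen, code

cgen, code = make_cgen()
result = cgen(("+", 5, "x"))
print("\n".join(code))
```

Every sub-expression, including each leaf, gets its own temporary here, which is exactly the waste that the later slides on reusing temporaries address.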
16
TAC Generation for Control Flow Statements
• Label introduction
    _label_name:
  Indicates a point in the code that can be jumped to
• Unconditional jump: go to instruction following label L
    Goto L
• Conditional jump: test condition variable t; if 0, jump to label L
    IfZ t Goto L
• Similarly: test condition variable t; if 1, jump to label L
    IfNZ t Goto L
17
Control-flow example - conditions
Source:
    int x;
    int y;
    int z;
    if (x < y)
        z = x;
    else
        z = y;
    z = z * z;
TAC:
    _t0 = x < y
    IfZ _t0 Goto _L0
    z = x
    Goto _L1
    _L0:
    z = y
    _L1:
    z = z * z
18
cgen for if-then-else
cgen(if (e) s1 else s2) =
    Let _t = cgen(e)
    Let Lfalse be a new label
    Let Lafter be a new label
    Emit( IfZ _t Goto Lfalse )
    cgen(s1)
    Emit( Goto Lafter )
    Emit( Lfalse: )
    cgen(s2)
    Emit( Goto Lafter )
    Emit( Lafter: )
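As a rough illustration of this translation scheme, the following Python sketch emits TAC for an if-then-else. The tuple-based statement encoding and the helper names (`emit`, `cgen_expr`, `cgen_stmt`) are assumptions made for the example, not part of the slides. It also leaves out the second `Goto Lafter`, since control falls through to the label anyway, so the output matches the conditions example.

```python
from itertools import count

code = []
_labels = count()
_temps = count()

def emit(line):
    code.append(line)

def cgen_expr(e):
    """Atomic operands are returned as-is; (op, a, b) gets a fresh temporary."""
    if not isinstance(e, tuple):
        return str(e)
    op, a, b = e
    left, right = cgen_expr(a), cgen_expr(b)
    t = f"_t{next(_temps)}"
    emit(f"{t} = {left} {op} {right}")
    return t

def cgen_stmt(s):
    if s[0] == "assign":                 # ("assign", target, source_text)
        _, target, source = s
        emit(f"{target} = {source}")
    elif s[0] == "if":                   # ("if", cond, then_stmt, else_stmt)
        _, cond, s1, s2 = s
        t = cgen_expr(cond)
        l_false = f"_L{next(_labels)}"
        l_after = f"_L{next(_labels)}"
        emit(f"IfZ {t} Goto {l_false}")
        cgen_stmt(s1)
        emit(f"Goto {l_after}")
        emit(f"{l_false}:")
        cgen_stmt(s2)
        emit(f"{l_after}:")
    else:
        raise ValueError(f"unknown statement kind: {s[0]}")

cgen_stmt(("if", ("<", "x", "y"), ("assign", "z", "x"), ("assign", "z", "y")))
cgen_stmt(("assign", "z", "z * z"))
print("\n".join(code))
```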
19
Naive cgen for expressions
• Maintain a counter for temporaries in c
• Initially c = 0
• cgen(e1 op e2) =
    Let A = cgen(e1)
    c = c + 1
    Let B = cgen(e2)
    c = c + 1
    Emit( _tc = A op B )
    Return _tc
• Observation: temporaries in cgen(e1) can be reused in cgen(e2)
20
Improving cgen for expressions
• Observation - naive translation needlessly generates temporaries for leaf expressions
• Observation - temporaries used exactly once
  - Once a temporary has been read it can be reused for another sub-expression
• cgen(e1 op e2) =
    Let _t1 = cgen(e1)
    Let _t2 = cgen(e2)
    Emit( _t = _t1 op _t2 )
    Return _t
• Temporaries of cgen(e1) can be reused in cgen(e2)
21
Sethi-Ullman translation
• Algorithm by Ravi Sethi and Jeffrey D. Ullman to emit optimal TAC
  - Minimizes number of temporaries
• Main data structure in algorithm is a stack of temporaries
  - Stack corresponds to recursive invocations of _t = cgen(e)
  - All the temporaries on the stack are live
    • Live = contain a value that is needed later on
22
Weighted register allocation
• Can save registers by re-ordering subtree computations
• Label each node with its weight
  - Weight = number of registers needed
  - Leaf weight known
  - Internal node weight
    • w(left) > w(right) then w = w(left)
    • w(right) > w(left) then w = w(right)
    • w(right) = w(left) then w = w(left) + 1
• Choose heavier child as first to be translated
• WARNING: have to check that no side-effects exist before attempting to apply this optimization (pre-pass on the tree)
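The weight rules can be tried out with a small Python sketch. It assumes every leaf has weight 0 (the slides only say that leaf weights are known), models the tree as nested tuples `(op, left, right)`, and uses the hypothetical names `weight` and `evaluation_order`; the side-effect pre-pass is not modelled.

```python
def weight(node):
    """Weight of an expression tree: a leaf (str/int) or a tuple (op, left, right)."""
    if not isinstance(node, tuple):
        return 0                      # assumption: every leaf has weight 0
    _, left, right = node
    wl, wr = weight(left), weight(right)
    if wl == wr:
        return wl + 1                 # equal weights: one more register is needed
    return max(wl, wr)                # otherwise the heavier child decides

def evaluation_order(node):
    """List leaves and operators in translation order, heavier child first."""
    if not isinstance(node, tuple):
        return [node]
    op, left, right = node
    if weight(left) >= weight(right):
        first, second = left, right
    else:
        first, second = right, left
    return evaluation_order(first) + evaluation_order(second) + [op]

# a + b[5 * c], with "[]" standing for the array access node (base, index)
tree = ("+", "a", ("[]", "b", ("*", 5, "c")))
print(weight(tree))
print(evaluation_order(tree))
```

With these assumptions the multiplication is translated before `b`, and the array access before `a`, which is the same heavier-subtree-first order the worked example follows.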
23
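Weighted reg alloc example
_t0 = cgen( a+b[5*c] )
Phase 1:
  - check absence of side-effects in expression tree
  - assign weight to each AST node
[Diagram: expression tree for a+b[5*c]: + with children a and array access; array access with base b and index *; * with children 5 and c; each node annotated with its weight, w=0 or w=1]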
24
Weighted reg alloc example
_t0 = cgen( a+b[5*c] )
Phase 2: use weights to decide on order of translation
[Diagram: the same expression tree for a+b[5*c], with the heavier sub-tree marked at each node and each node labelled with the temporary (_t0 or _t1) that holds its value]
Generated code:
    _t0 = c
    _t1 = 5
    _t0 = _t1 * _t0
    _t1 = b
    _t0 = _t1[_t0]
    _t1 = a
    _t0 = _t1 + _t0
25
TAC Generation for Memory Access Instructions
• Copy instruction: a = b
• Load/store instructions: a = *b, *a = b
• Address-of instruction: a = &b
• Array accesses: a = b[i], a[i] = b
• Field accesses: a = b[f], a[f] = b
• Memory allocation instruction: a = alloc(size)
  - Sometimes left out (e.g., malloc is a procedure in C)
26
Array operations
x = y[i]:
    t1 = &y        t1 = address-of y
    t2 = t1 + i    t2 = address of y[i]
    x = *t2        value stored at y[i]
x[i] = y:
    t1 = &x        t1 = address-of x
    t2 = t1 + i    t2 = address of x[i]
    *t2 = y        store through pointer
30
Control-flow example - loops
Source:
    int x;
    int y;
    while (x < y) {
        x = x * 2;
    }
    y = x;
TAC:
    _L0:
    _t0 = x < y
    IfZ _t0 Goto _L1
    x = x * 2
    Goto _L0
    _L1:
    y = x
31
cgen for while loops
cgen(while (expr) stmt) =
    Let Lbefore be a new label
    Let Lafter be a new label
    Emit( Lbefore: )
    Let t = cgen(expr)
    Emit( IfZ t Goto Lafter )
    cgen(stmt)
    Emit( Goto Lbefore )
    Emit( Lafter: )
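The while-loop scheme can be sketched the same way. The Python below is a minimal illustration, assuming a tuple-based statement encoding and hypothetical helper names (`emit`, `cgen_expr`, `cgen_stmt`) that do not come from the slides; it reproduces the TAC of the loops example.

```python
from itertools import count

code = []
_labels = count()
_temps = count()

def emit(line):
    code.append(line)

def cgen_expr(e):
    """Atomic operands are returned as-is; (op, a, b) gets a fresh temporary."""
    if not isinstance(e, tuple):
        return str(e)
    op, a, b = e
    left, right = cgen_expr(a), cgen_expr(b)
    t = f"_t{next(_temps)}"
    emit(f"{t} = {left} {op} {right}")
    return t

def cgen_stmt(s):
    if s[0] == "assign":                 # ("assign", target, source_text)
        _, target, source = s
        emit(f"{target} = {source}")
    elif s[0] == "while":                # ("while", cond, body)
        _, cond, body = s
        l_before = f"_L{next(_labels)}"
        l_after = f"_L{next(_labels)}"
        emit(f"{l_before}:")
        t = cgen_expr(cond)              # the condition is re-evaluated on every iteration
        emit(f"IfZ {t} Goto {l_after}")
        cgen_stmt(body)
        emit(f"Goto {l_before}")
        emit(f"{l_after}:")
    else:
        raise ValueError(f"unknown statement kind: {s[0]}")

cgen_stmt(("while", ("<", "x", "y"), ("assign", "x", "x * 2")))
cgen_stmt(("assign", "y", "x"))
print("\n".join(code))
```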
32
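Optimization points
[Diagram: source code -> Front end -> IR -> Code generator -> target code, with the optimization opportunities at each stage:
  User: profile program, change algorithm
  Compiler: intraprocedural IR, interprocedural IR, IR optimizations
  Compiler: register allocation, instruction selection, peephole transformations
A "today" marker highlights this lecture's topic.]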
33
Interprocedural IR
• Compile-time generation of code for procedure invocations
• Activation Records (a.k.a. Stack Frames)
34
Supporting Procedures
• New computing environment
  - at least temporary memory for local variables
• Passing information into the new environment
  - Parameters
• Transfer of control to/from procedure
• Handling return values
35
Abstract Register Machine (High Level View)
[Diagram: a CPU next to memory. CPU: general-purpose (data) registers Register 00, Register 01, ..., Register xx, and control registers ..., Register PC. Memory: Code and Data, with the high-address and low-address ends marked]
39
Handling Procedures
• Store local variables/temporaries in a stack
• A function call instruction pushes arguments to stack and jumps to the function label
  A statement x=f(a1,...,an) looks like:
    Push a1
    ...
    Push an
    Call f
    Pop x        copy returned value
• Returning a value is done by pushing it to the stack (return x)
    Push x
• Return control to caller (and roll up stack)
    Return
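A small Python sketch of the Push/Call/Pop convention described above. The function names `cgen_call` and `cgen_return` are hypothetical, and arguments are assumed to be already-computed variables or temporaries.

```python
def cgen_call(target, func, args):
    """TAC for `target = func(args...)`: push arguments, call, pop the result."""
    code = [f"Push {a}" for a in args]
    code.append(f"Call {func}")
    code.append(f"Pop {target}")      # copy the returned value
    return code

def cgen_return(value):
    """TAC for `return value`: push the result, then hand control back."""
    return [f"Push {value}", "Return"]

print("\n".join(cgen_call("w", "_SimpleFn", ["_t0"])))
print("\n".join(cgen_return("x")))
```

For a call such as `w = SimpleFn(137)` with the constant already held in `_t0`, this yields `Push _t0`, `Call _SimpleFn`, `Pop w`.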
41
Semi-Abstract Register Machine
[Diagram: a CPU next to Main Memory. CPU: general-purpose (data) registers Register 00, Register 01, ..., Register xx; control registers ..., Register PC; stack registers ebp, esp, .... Main Memory: Global Variables, Stack and Heap, with the high-address and low-address ends marked]
42
Intro: Functions Example
Source:
    int SimpleFn(int z) {
        int x, y;
        x = x * y * z;
        return x;
    }
    void main() {
        int w;
        w = SimpleFn(137);
    }
TAC:
    _SimpleFn:
    _t0 = x * y
    _t1 = _t0 * z
    x = _t1
    Push x
    Return
    main:
    _t0 = 137
    Push _t0
    Call _SimpleFn
    Pop w
43
What Can We Do with Procedures?
• Declarations & Definitions
• Call & Return
• Jumping out of procedures
• Passing & Returning procedures as parameters
44
Design Decisions
• Scoping rules
  - Static scoping vs. dynamic scoping
• Caller/callee conventions
  - Parameters
  - Who saves register values?
• Allocating space for local variables
45
Static (lexical) Scoping
    main ( )
    {                                      /* block B0 */
        int a = 0;
        int b = 0;
        {                                  /* block B1 */
            int b = 1;
            {                              /* block B2 */
                int a = 2;
                printf ("%d %d\n", a, b);
            }
            {                              /* block B3 */
                int b = 3;
                printf ("%d %d\n", a, b);
            }
            printf ("%d %d\n", a, b);
        }
        printf ("%d %d\n", a, b);
    }

    Declaration    Scopes
    a=0            B0, B1, B3
    b=0            B0
    b=1            B1, B2
    a=2            B2
    b=3            B3

A name refers to its (closest) enclosing scope; this is known at compile time.
46
Dynamic Scoping
• Each identifier is associated with a global stack of bindings
• When entering scope where identifier is declared
  - push declaration on identifier stack
• When exiting scope where identifier is declared
  - pop identifier stack
• Evaluating the identifier in any context binds to the current top of stack
• Determined at runtime
47
Example
• What value is returned from main?
• Static scoping?
• Dynamic scoping?
    int x = 42;
    int f() { return x; }
    int g() { int x = 1; return f(); }
    int main() { return g(); }
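To make the difference concrete, here is a hedged Python simulation of the example under both rules. The dynamic-scoping half follows the "global stack of bindings per identifier" description; the helper names (`enter`, `leave`, `lookup`, and the `*_dynamic` / `*_static` functions) are invented for this sketch.

```python
from collections import defaultdict

# Dynamic scoping: one global stack of bindings per identifier.
bindings = defaultdict(list)

def enter(name, value):
    bindings[name].append(value)    # entering a scope that declares `name`

def leave(name):
    bindings[name].pop()            # exiting that scope

def lookup(name):
    return bindings[name][-1]       # current top of the identifier's stack

def f_dynamic():
    return lookup("x")

def g_dynamic():
    enter("x", 1)                   # int x = 1;
    try:
        return f_dynamic()
    finally:
        leave("x")

def main_dynamic():
    enter("x", 42)                  # global int x = 42;
    try:
        return g_dynamic()
    finally:
        leave("x")

# Static scoping: the x in f is resolved from the program text to the global x.
GLOBAL_X = 42

def f_static():
    return GLOBAL_X

def g_static():
    x = 1                           # g's local x is invisible to f
    return f_static()

def main_static():
    return g_static()

print(main_static(), main_dynamic())
```

Under static scoping `main` returns 42, because `f` always refers to the global `x`; under dynamic scoping it returns 1, because the binding pushed by `g` is on top of the stack when `f` runs.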
48
Why do we care?
• We need to generate code to access variables
• Static scoping
  - Identifier binding is known at compile time
  - Address of the variable is known at compile time
  - Assigning addresses to variables is part of code generation
  - No runtime errors of "access to undefined variable"
  - Can check types of variables
49
Variable addresses for static scoping: first attempt
    int x = 42;
    int f() { return x; }
    int g() { int x = 1; return f(); }
    int main() { return g(); }

    identifier      address
    x (global)      0x42
    x (inside g)    0x73
50
Variable addresses for static scoping: first attempt
    int a[11];
    void quicksort(int m, int n) {
        int i;
        if (n > m) {
            i = partition(m, n);
            quicksort(m, i-1);
            quicksort(i+1, n);
        }
    }
    main() {
        ...
        quicksort(1, 9);
    }
What is the address of the variable "i" in the procedure quicksort?
51
Compile-Time Information on Variables
• Name
• Type
• Scope
  - when is it recognized
• Duration
  - Until when does its value exist
• Size
  - How many bytes are required at runtime
• Address
  - Fixed
  - Relative
  - Dynamic
52
Activation Record (Stack Frames)
• separate space for each procedure invocation
• managed at runtime
  - code for managing it generated by the compiler
• desired properties
  - efficient allocation and deallocation
    • procedures are called frequently
  - variable size
    • different procedures may require different memory sizes
54
A Logical Stack Frame (Simplified)
[Diagram: stack frame for function f(a1, ..., aN):
  Parameters (actual arguments): Param N, Param N-1, ..., Param 1
  Locals and temporaries: _t0, ..., _tk, x, ..., y]
55
Runtime Stack
• Stack of activation records
• Call = push new activation record
• Return = pop activation record
• Only one "active" activation record - top of stack
• How do we handle recursion?
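One way to see why a stack of activation records handles recursion is to run a recursive function with the records made explicit. The Python sketch below is an illustration only: it models each activation record as a dictionary holding the invocation's own copy of `n`, and the function name `fact_with_explicit_stack` is hypothetical.

```python
def fact_with_explicit_stack(n):
    """Compute n! with explicit activation records; return (result, max stack depth)."""
    stack = []
    max_depth = 0

    # Call phase: every invocation pushes its own record with its own copy of n.
    while True:
        stack.append({"n": n})
        max_depth = max(max_depth, len(stack))
        if n <= 1:
            break
        n -= 1

    # Return phase: pop records in reverse order and combine the results.
    result = 1
    while stack:
        record = stack.pop()
        result *= max(record["n"], 1)
    return result, max_depth

print(fact_with_explicit_stack(5))
```

Because each invocation has a separate record, the value of `n` in an outer call is still intact when the inner calls return, and only the record on top of the stack is active at any moment.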
56
Activation Record (frame)
[Diagram: layout of one frame, listed from high addresses to low addresses; the stack grows down]
  parameter k, ..., parameter 1 (incoming parameters)
  lexical pointer, return information, dynamic link, registers & misc (administrative part)
  local variables
  temporaries
  next frame would be here
The frame pointer and the stack pointer are marked on the frame.
57
Runtime Stack
• SP - stack pointer
  - top of current frame
• FP - frame pointer
  - base of current frame
  - Sometimes called BP (base pointer)
[Diagram: the previous frame sits above the current frame; FP marks the base of the current frame and SP its top; the stack grows down]
58
Code Blocks
• Programming languages provide code blocks
    void foo() {
        int x = 8; y = 9;        // 1
        { int x = y * y; }       // 2
        { int x = y * 7; }       // 3
        x = y + 1;
    }
[Diagram: frame of foo: administrative part, x1, y1, x2, x3, ...]
59
L-Values of Local Variables
• The offset in the stack is known at compile time
• L-val(x) = FP+offset(x)
• x = 5:
    Load_Constant 5, R3
    Store R3, offset(x)(FP)
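The FP-relative addressing above can be illustrated with a short Python sketch. It assumes a downward-growing stack with locals placed directly below FP and 4-byte variables; `assign_offsets` and `l_value` are hypothetical names.

```python
def assign_offsets(locals_with_sizes):
    """Compile-time step: give each local an FP-relative offset (sizes in bytes)."""
    offsets = {}
    offset = 0
    for name, size in locals_with_sizes:
        offset -= size              # stack grows down: each local sits below the previous one
        offsets[name] = offset
    return offsets

def l_value(fp, offsets, name):
    """Runtime step: L-val(x) = FP + offset(x)."""
    return fp + offsets[name]

offsets = assign_offsets([("x", 4), ("y", 4)])
print(offsets)
print(hex(l_value(0x1000, offsets, "x")))
```

The offsets are fixed once per procedure at compile time, while the actual address differs per invocation because FP differs; this is what lets recursive calls each have their own `x`.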
60
Pentium Runtime Stack
Pentium stack registers:
    Register    Usage
    ESP         Stack pointer
    EBP         Base pointer
Pentium stack and call/ret instructions:
    Instruction         Usage
    push, pusha, ...    push on runtime stack
    pop, popa, ...      pop from runtime stack
    call                transfer control to called routine
    ret                 transfer control back to caller
61
Accessing Stack Variables
• Use offset from FP (%ebp)
• Remember - stack grows downwards
• Above FP = parameters
• Below FP = locals
• Examples
  - %ebp + 4 = return address
  - %ebp + 8 = first parameter
  - %ebp - 4 = first local
[Diagram: frame layout from high to low addresses: Param n, ..., param1 (param1 at FP+8), Return address, Previous fp (where FP points), Local 1, ..., Local n (Local 1 at FP-4); SP marks the top of the stack]
62
Factorial - fact(int n)
    fact:
        pushl %ebp              # save ebp
        movl %esp,%ebp          # ebp=esp
        pushl %ebx              # save ebx
        movl 8(%ebp),%ebx       # ebx = n
        cmpl $1,%ebx            # n <= 1 ?
        jle .lresult            # then done
        leal -1(%ebx),%eax      # eax = n-1
        pushl %eax              #
        call fact               # fact(n-1)
        imull %ebx,%eax         # eax=retv*n
        jmp .lreturn            #
    .lresult:
        movl $1,%eax            # retv
    .lreturn:
        movl -4(%ebp),%ebx      # restore ebx
        movl %ebp,%esp          # restore esp
        popl %ebp               # restore ebp
        ret                     # return to caller
[Diagram: stack at an intermediate point, from high to low addresses: n (at EBP+8), Return address, old ebp (previous fp, where EBP points), old ebx (at EBP-4); ESP marks the top of the stack]
(disclaimer: a real compiler can do better than that)
63
Call Sequences
• The processor does not save the content of registers on procedure calls
• So who will?
  - Caller saves and restores registers
  - Callee saves and restores registers
  - But can also have both save/restore some registers
64
Call Sequences
Caller push code (before the call):
    Push caller-save registers
    Push actual parameters (in reverse order)
call:
    push return address
    Jump to call address
Callee push code (prologue):
    Push current base-pointer
    bp = sp
    Push local variables
    Push callee-save registers
Callee pop code (epilogue):
    Pop callee-save registers
    Pop callee activation record
    Pop old base-pointer
return:
    pop return address
    Jump to address
Caller pop code (after the return):
    Pop parameters
    Pop caller-save registers
65
Call Sequences - Foo(42,21)
Caller push code (before the call):
    push %ecx             Push caller-save registers
    push $21              Push actual parameters (in reverse order)
    push $42
call:
    call _foo             push return address, jump to call address
Callee push code (prologue):
    push %ebp             Push current base-pointer
    mov %esp, %ebp        bp = sp
    sub $8, %esp          Push local variables
    push %ebx             Push callee-save registers
Callee pop code (epilogue):
    pop %ebx              Pop callee-save registers
    mov %ebp, %esp        Pop callee activation record
    pop %ebp              Pop old base-pointer
return:
    ret                   pop return address, jump to address
Caller pop code (after the return):
    add $8, %esp          Pop parameters
    pop %ecx              Pop caller-save registers
66
"To Callee-save or to Caller-save?"
• Callee-saved registers need only be saved when callee modifies their value
• some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registers
• Callee-Save Registers
  - Saved by the callee before modification
  - Values are automatically preserved across calls
• Caller-Save Registers
  - Saved (if needed) by the caller before calls
  - Values are not automatically preserved across calls
• Usually the architecture defines caller-save and callee-save registers
• Separate compilation
• Interoperability between code produced by different compilers/languages
• But compiler writers decide when to use caller/callee registers
68
Callee-Save Registers
• Saved by the callee before modification
• Usually at procedure prolog
• Restored at procedure epilog
• Hardware support may be available
• Values are automatically preserved across calls
Source:
    int foo(int a) {
        int b = a + 1;
        f1();
        g1(b);
        return (b + 2);
    }
Generated code:
    .global _foo
    Add_Constant -K, SP         // allocate space for foo
    Store_Local R5, -14(FP)     // save R5
    Load_Reg R5, R0
    Add_Constant R5, 1
    JSR f1
    JSR g1
    Add_Constant R5, 2
    Load_Reg R5, R0
    Load_Local -14(FP), R5      // restore R5
    Add_Constant K, SP          // deallocate
    RTS
69
Caller-Save Registers
• Saved by the caller before calls when needed
• Values are not automatically preserved across calls
Source:
    void bar(int y) {
        int x = y + 1;
        f2(x);
        g2(2);
        g2(8);
    }
Generated code:
    .global _bar
    Add_Constant -K, SP         // allocate space for bar
    Add_Constant R0, 1
    JSR f2
    Load_Constant 2, R0
    JSR g2
    Load_Constant 8, R0
    JSR g2
    Add_Constant K, SP          // deallocate space for bar
    RTS
70
Parameter Passing
• 1960s
  - In memory
    • No recursion is allowed
• 1970s
  - In stack
• 1980s
  - In registers
  - First k parameters are passed in registers (k=4 or k=6)
  - Where is time saved?
• Most procedures are leaf procedures
• Interprocedural register allocation
• Many of the registers may be dead before another invocation
• Register windows are allocated in some architectures per call (e.g., Sun Sparc)
71
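Windows Exploit(s): Buffer Overflow
    void foo (char *x) {
        char buf[2];
        strcpy(buf, x);
    }
    int main (int argc, char *argv[]) {
        foo(argv[1]);
    }

    ./a.out abracadabra
    Segmentation fault
[Diagram: frame of foo with the directions of stack growth and of increasing memory addresses marked: Previous frame, Return address, Saved FP, char* x, buf[2], ...; copying "abracadabra" into buf overruns it and overwrites the neighbouring entries]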
72
Buffer overflow
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int check_authentication(char *password) {
        int auth_flag = 0;
        char password_buffer[16];

        strcpy(password_buffer, password);   /* no bounds check */
        if (strcmp(password_buffer, "brillig") == 0)
            auth_flag = 1;
        if (strcmp(password_buffer, "outgrabe") == 0)
            auth_flag = 1;
        return auth_flag;
    }

    int main(int argc, char *argv[]) {
        if (argc < 2) {
            printf("Usage: %s <password>\n", argv[0]);
            exit(0);
        }
        if (check_authentication(argv[1])) {
            printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
            printf(" Access Granted.\n");
            printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        } else {
            printf("\nAccess Denied.\n");
        }
    }
(source: "Hacking - The Art of Exploitation, 2nd Ed.")
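The effect of the unchecked strcpy can be imitated in Python with a byte array standing in for the stack frame. This is only a simulation under an assumed layout (password_buffer at offsets 0..15 with a 4-byte little-endian auth_flag directly above it); a real compiler may lay the frame out differently, and the constants below are illustrative.

```python
BUF_SIZE = 16    # char password_buffer[16]
FLAG_SIZE = 4    # int auth_flag, assumed 4 bytes, little-endian

def check_authentication(password: bytes) -> int:
    # Hypothetical frame: buffer at offsets 0..15, auth_flag at offsets 16..19.
    frame = bytearray(BUF_SIZE + FLAG_SIZE)

    # strcpy-like copy: no bounds check against BUF_SIZE, NUL terminator included.
    # The simulation simply stops at the end of the modelled frame.
    data = (password + b"\x00")[:len(frame)]
    frame[:len(data)] = data

    # strcmp-like comparison on the NUL-terminated contents of the buffer.
    stored = bytes(frame[:BUF_SIZE]).split(b"\x00")[0]
    if stored in (b"brillig", b"outgrabe"):
        frame[BUF_SIZE:] = (1).to_bytes(FLAG_SIZE, "little")

    return int.from_bytes(frame[BUF_SIZE:], "little")

print(check_authentication(b"brillig"))
print(check_authentication(b"wrong"))
print(check_authentication(b"A" * 17))
```

A 17-character input spills one byte into the flag, so the function returns a non-zero value and the caller would grant access even though neither password matched.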
74
Buffer overflow
    int check_authentication(char *password) {
        char password_buffer[16];
        int auth_flag = 0;

        strcpy(password_buffer, password);
        if (strcmp(password_buffer, "brillig") == 0)
            auth_flag = 1;
        if (strcmp(password_buffer, "outgrabe") == 0)
            auth_flag = 1;
        return auth_flag;
    }
    (main is unchanged; only the order of the two local declarations differs)
(source: "Hacking - The Art of Exploitation, 2nd Ed.")
75
Buffer overflow
    0x08048529 <+69>:  movl $0x8048647,(%esp)
    0x08048530 <+76>:  call 0x8048394 <puts@plt>
    0x08048535 <+81>:  movl $0x8048664,(%esp)
    0x0804853c <+88>:  call 0x8048394 <puts@plt>
    0x08048541 <+93>:  movl $0x804867a,(%esp)
    0x08048548 <+100>: call 0x8048394 <puts@plt>
    0x0804854d <+105>: jmp 0x804855b <main+119>
    0x0804854f <+107>: movl $0x8048696,(%esp)
    0x08048556 <+114>: call 0x8048394 <puts@plt>
76
Nested Procedures
• For example - Pascal
• Any routine can have sub-routines
• Any sub-routine can access anything that is defined in its containing scope or inside the sub-routine itself
  - "non-local" variables
77
Example: Nested Procedures
    program p() {
        int x;
        procedure a() {
            int y;
            procedure b() { ... c() ... };
            procedure c() {
                int z;
                procedure d() {
                    y := x + z
                };
                ... b() ... d() ...
            }
            ... a() ... c() ...
        }
        a()
    }
Possible call sequence: p -> a -> a -> c -> b -> c -> d
What are the addresses of variables "x", "y" and "z" in procedure d?
78
Nested Procedures
• can call a sibling, ancestor
• when "c" uses (non-local) variables from "a", which instance of "a" is it?
• how do you find the right activation record at runtime?
Possible call sequence: p -> a -> a -> c -> b -> c -> d
[Diagram: call chain P, a, a, c, b, c, d]
79
Nested Procedures
• goal: find the closest routine in the stack from a given nesting level
• if we reached the same routine in a sequence of calls
  - routine of level k uses variables of the same nesting level, it uses its own variables
  - if it uses variables of nesting level j < k then it must be the last routine called at level j
• If a procedure is last at level j on the stack, then it must be ancestor of the current routine
Possible call sequence: p -> a -> a -> c -> b -> c -> d
[Diagram: call chain P, a, a, c, b, c, d]
80
Nested Procedures
• problem: a routine may need to access variables of another routine that contains it statically
• solution: lexical pointer (a.k.a. access link) in the activation record
• lexical pointer points to the last activation record of the nesting level above it
  - in our example, lexical pointer of d points to activation records of c
• lexical pointers created at runtime
• number of links to be traversed is known at compile time
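A hedged Python sketch of lexical pointers for the call sequence p -> a -> a -> c -> b -> c -> d. The `Frame` class, the `call` and `lookup` helpers, the nesting levels (p=0, a=1, b=c=2, d=3) and the concrete variable values are all assumptions chosen for illustration.

```python
class Frame:
    """Activation record with a lexical pointer (access link)."""
    def __init__(self, name, level, access_link, variables=None):
        self.name = name
        self.level = level                # static nesting level of the routine
        self.access_link = access_link    # frame of the statically enclosing routine
        self.vars = variables or {}

def call(caller, name, level, variables=None):
    """Runtime: build the callee frame and set its lexical pointer.

    The callee is declared inside a routine of level `level - 1`, so we walk
    caller.level - (level - 1) links up from the caller to reach it.
    """
    link = caller
    for _ in range(caller.level - (level - 1)):
        link = link.access_link
    return Frame(name, level, link, variables)

def lookup(frame, hops, name):
    """Follow a compile-time-known number of links, then read the variable."""
    for _ in range(hops):
        frame = frame.access_link
    return frame.vars[name]

p = Frame("p", 0, None, {"x": 1})
a1 = call(p, "a", 1, {"y": 10})
a2 = call(a1, "a", 1, {"y": 20})
c1 = call(a2, "c", 2, {"z": 100})
b = call(c1, "b", 2)
c2 = call(b, "c", 2, {"z": 200})
d = call(c2, "d", 3)

# Inside d: z is 1 level up, y is 2 levels up, x is 3 levels up.
print(lookup(d, 1, "z"), lookup(d, 2, "y"), lookup(d, 3, "x"))
```

The links are created at call time, but the hop counts passed to `lookup` depend only on the nesting levels in the program text, which is why the compiler can hard-code them.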
81
Lexical Pointers
Possible call sequence: p -> a -> a -> c -> b -> c -> d
[Diagram: stack of activation records a, a, c, b, c, d for the nested-procedures program p (x declared in p, y in a, z in c, and d executing y := x + z); the frames of a hold y and the frames of c hold z, and each frame's lexical pointer links it to a frame of its statically enclosing routine]
82
What is a Compiler
2
What is a Compiler
ldquoA compiler is a computer program that transforms source code written in a programming language (source language) into another language (target language)
The most common reason for wanting to transform source code is to create an executable programrdquo
--Wikipedia
3
Conceptual Structure of a Compiler
Executable code
exe
Sourcetext
txtSemantic
RepresentationBackend
CompilerFrontend
LexicalAnalysi
s
Syntax Analysi
sParsing
Semantic
Analysis
IntermediateRepresentati
on(IR)
CodeGeneration
4
From scanning to parsing((23 + 7) x)
) x ) 7 + 23 ( (RP Id OP RP Num OP Num LP LP
Lexical Analyzer
program text
token stream
ParserGrammar E | Id Id lsquoarsquo | | lsquozrsquo
Op()
Id(b)
Num(23) Num(7)
Op(+)
Abstract Syntax Tree
validsyntaxerror
5
Context Analysis
Op()
Id(b)
Num(23) Num(7)
Op(+)
Abstract Syntax Tree
E1 int E2 int
E1 + E2 int
Type rules
Semantic Error Valid + Symbol Table
6
Code Generation
Op()
Id(b)
Num(23) Num(7)
Op(+)
Valid Abstract Syntax TreeSymbol Tablehellip
Verification (possible runtime)ErrorsWarnings
Intermediate Representation (IR)
Executable Codeinput output
7
Code Generation IR
bull Translating from abstract syntax (AST) to intermediate representation (IR)ndash Three-Address Code
bull Primitive statements control flow procedure calls
Source code
(program)
LexicalAnalysis
Syntax Analysis
Parsing
AST SymbolTableetc
InterRep(IR)
CodeGeneration
Source code
(executable)
8
C++ IR
Pentium
Java bytecode
SparcPyhton
Javaoptimize
Lowering Code Gen
Intermediate representationbull A language that is between the source language and
the target language ndash not specific to any machinebull Goal 1 retargeting compiler components for
different source languagestarget machinesbull Goal 2 machine-independent optimizer
ndash Narrow interface small number of node types (instructions)
9
Three-Address Code IR
bull A popular form of IRbull High-level assembly where instructions
have at most three operands
Chapter 8
10
Heap
Abstract Register Machine
Global Variables
Stack
Lowaddresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
Code
Highaddresses
11
Abstract Activation Record Stack
Stack frame for procedure
Prock+1(a1hellipaN)
Prock
Prock+2
hellip
hellip
Prock+1
12
Abstract Stack Frame
Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Prock
Prock+2
hellip
hellip
Stack frame for procedure
Prock+1(a1hellipaN)
13
TAC generation for expressions
bull cgen(atomic_expr) directly generates TAC for atomic expressionsndash Constants identifiershellip
bull Cgen(compund_expr) recursively generates TAC for compound expressions ndash binary operators procedure calls hellipndash use temporary variables (registers) to store
values of intermediate expressions
14
cgen examplecgen(5 + x) = Choose a new temporary t Let t1 = cgen(5) Let t2 = cgen(x) Emit( t = t1 + t2 ) Return t
15
Naiumlve cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =
Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc
16
TAC Generation forControl Flow Statements
bull Label introduction_label_name
Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L
Goto Lbull Conditional jump test condition variable t
if 0 jump to label LIfZ t Goto L
bull Similarly test condition variable tif 1 jump to label L
IfNZ t Goto L
17
Control-flow example ndash conditionsint xint yint z
if (x lt y) z = xelse z = yz = z z
_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z
18
cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)
Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )
19
Naive cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =
Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc
bull Observation temporaries in cgen(e1) can be reused in cgen(e2)
20
Improving cgen for expressionsbull Observation ndash naiumlve translation needlessly generates temporaries for
leaf expressionsbull Observation ndash temporaries used exactly once
ndash Once a temporary has been read it can be reused for another sub-expressionbull cgen(e1 op e2) =
Let _t1 = cgen(e1) Let _t2 = cgen(e2) Emit( _t =_t1 op _t2 ) Return t
bull Temporaries cgen(e1) can be reused in cgen(e2)
21
Sethi-Ullman translation
bull Algorithm by Ravi Sethi and Jeffrey D Ullman to emit optimal TACndash Minimizes number of temporaries
bull Main data structure in algorithm is a stack of temporariesndash Stack corresponds to recursive invocations of _t = cgen(e)ndash All the temporaries on the stack are live
bull Live = contain a value that is needed later on
22
Weighted register allocationbull Can save registers by re-ordering subtree computationsbull Label each node with its weight
ndash Weight = number of registers neededndash Leaf weight knownndash Internal node weight
bull w(left) gt w(right) then w = leftbull w(right) gt w(left) then w = rightbull w(right) = w(left) then w = left + 1
bull Choose heavier child as first to be translatedbull WARNING have to check that no side-effects exist before
attempting to apply this optimization(pre-pass on the tree)
23
Weighted reg alloc example
b
5 c
array access
+
a w=0
w=0 w=0
w=1 w=0
w=1
w=1
Phase 1 - check absence of side-effects in expression tree - assign weight to each AST node
_t0 = cgen( a+b[5c] )
base index
24
Weighted reg alloc example
b
5 c
array access
+
abase index
w=0
w=0 w=0
w=1 w=0
w=1
w=1
_t0 = cgen( a+b[5c] )Phase 2 - use weights to decide on order of translation
_t0
_t0
_t0Heavier sub-tree
Heavier sub-tree
_t0 = _t1 _t0
_t0 = _t1[_t0]
_t0 = _t1 + _t0
_t0_t1
_t1
_t1_t0 = c_t1 = 5
_t1 = b
_t1 = a
25
TAC Generation forMemory Access Instructions
bull Copy instruction a = bbull Loadstore instructions
a = b a = bbull Address of instruction a=ampbbull Array accesses
a = b[i] a[i] = bbull Field accesses
a = b[f] a[f] = bbull Memory allocation instruction a = alloc(size)
ndash Sometimes left out (eg malloc is a procedure in C)
26
Array operations
t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]
t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer
x = y[i]
x[i] = y
27
TAC Generation forControl Flow Statements
bull Label introduction_label_name
Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L
Goto Lbull Conditional jump test condition variable t
if 0 jump to label LIfZ t Goto L
bull Similarly test condition variable tif 1 jump to label L
IfNZ t Goto L
28
Control-flow example ndash conditionsint xint yint z
if (x lt y) z = xelse z = yz = z z
_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z
29
cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)
Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )
30
Control-flow example ndash loopsint xint y
while (x lt y) x = x 2
y = x
_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1
y = x
31
cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label
Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )
32
Optimization points
sourcecode
Frontend IR Code
generatortargetcode
Userprofile program
change algorithm
Compilerintraprocedural IRInterprocedural IRIR optimizations
Compilerregister allocation
instruction selectionpeephole transformations
today
33
Interprocedural IR
bull Compile time generation of code for procedure invocations
bull Activation Records (aka Stack Frames)
34
Supporting Procedures
bull New computing environment ndash at least temporary memory for local variables
bull Passing information into the new environmentndash Parameters
bull Transfer of control tofrom procedurebull Handling return values
35
Abstract Register Machine(High Level View)
Code
Data
Highaddresses
Lowaddresses
CPU
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
36
Abstract Register Machine (High Level View)
[Figure: the same CPU (general-purpose (data) registers Register 00 … Register xx, control registers such as Register PC) next to Main Memory, which is divided into Code, Global Variables, Stack and Heap, drawn between high and low addresses.]
37
Abstract Activation Record Stack
[Figure: a stack of frames …, Proc k, Proc k+1, Proc k+2, …; the highlighted entry is the stack frame for procedure Proc k+1(a1, …, aN).]
38
Abstract Stack Frame
[Figure: the stack frame for procedure Proc k+1(a1, …, aN), located between the frames of Proc k and Proc k+2:
  Parameters (actual arguments): Param N, Param N-1, …, Param 1
  Locals and temporaries: _t0, …, _tk, x, …, y]
39
Handling Procedures
• Store local variables/temporaries in a stack
• A function call instruction pushes arguments to the stack and jumps to the function label
  A statement x = f(a1, …, an); looks like
    Push a1; … Push an;
    Call f;
    Pop x; // copy returned value
• Returning a value is done by pushing it to the stack (return x;)
    Push x;
• Return control to caller (and roll up stack)
    Return;
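The Push/Call/Pop protocol can be walked through with an explicit stack. The Python sketch below is a toy model only: `simulate_call` is a made-up name, and the choice that the return step removes the argument slots is one possible reading of "roll up stack".

```python
def simulate_call(callee, args):
    """Simulate `x = f(a1, ..., an)` with an explicit stack (toy model)."""
    stack = []
    for a in args:                      # Push a1; ... Push an;
        stack.append(a)
    # Call f: the callee finds its arguments in the top n stack slots.
    n = len(args)
    params = stack[len(stack) - n:]
    result = callee(*params)
    stack.append(result)                # Push x;   (return x)
    # Return: in this model returning also rolls up the argument slots,
    # leaving only the returned value for the caller.
    returned = stack.pop()
    del stack[len(stack) - n:]
    stack.append(returned)
    x = stack.pop()                     # Pop x;    (copy returned value)
    return x, stack


x, leftover = simulate_call(lambda a, b: a * b, [6, 7])
```

After the call the stack is empty again, which is the invariant the caller relies on.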
40
Abstract Register Machine
[Figure: CPU (general-purpose (data) registers Register 00 … Register xx, control registers such as Register PC) next to Main Memory, which is divided into Code, Global Variables, Stack and Heap, drawn between high and low addresses.]
41
Semi-Abstract Register Machine
[Figure: CPU with general-purpose (data) registers Register 00 … Register xx, control registers such as Register PC, and stack registers ebp and esp, next to Main Memory divided into Global Variables, Stack and Heap, drawn between high and low addresses.]
42
Intro: Functions Example
int SimpleFn(int z) {
    int x, y;
    x = x * y * z;
    return x;
}

void main() {
    int w;
    w = SimpleFn(137);
}

_SimpleFn:
    _t0 = x * y;
    _t1 = _t0 * z;
    x = _t1;
    Push x;
    Return;

main:
    _t0 = 137;
    Push _t0;
    Call _SimpleFn;
    Pop w;
43
What Can We Do with Procedures?

• Declarations & Definitions
• Call & Return
• Jumping out of procedures
• Passing & Returning procedures as parameters
44
Design Decisions
• Scoping rules
  – Static scoping vs. dynamic scoping
• Caller/callee conventions
  – Parameters
  – Who saves register values?
• Allocating space for local variables
45
Static (lexical) Scoping
main ( )
{                                           /* B0 */
    int a = 0; int b = 0;
    {                                       /* B1 */
        int b = 1;
        {                                   /* B2 */
            int a = 2; printf ("%d %d\n", a, b);
        }
        {                                   /* B3 */
            int b = 3; printf ("%d %d\n", a, b);
        }
        printf ("%d %d\n", a, b);
    }
    printf ("%d %d\n", a, b);
}

Declaration   Scopes
a=0           B0, B1, B3
b=0           B0
b=1           B1, B2
a=2           B2
b=3           B3

A name refers to its (closest) enclosing scope, which is known at compile time.
46
Dynamic Scoping
• Each identifier is associated with a global stack of bindings
• When entering a scope where the identifier is declared
  – push the declaration on the identifier's stack
• When exiting a scope where the identifier is declared
  – pop the identifier's stack
• Evaluating the identifier in any context binds to the current top of the stack
• Determined at runtime
47
Example
• What value is returned from main?
• Static scoping?
• Dynamic scoping?

int x = 42;

int f() { return x; }
int g() { int x = 1; return f(); }
int main() { return g(); }
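Both scoping disciplines can be simulated for this program. The Python sketch below is an illustration under assumptions: `run_static` and `run_dynamic` are invented names, and the per-identifier binding stack follows the description of dynamic scoping on the previous slide.

```python
def run_static():
    global_env = {"x": 42}

    def f():
        # Static scoping: x in f refers to the enclosing (global) declaration.
        return global_env["x"]

    def g():
        local_env = {"x": 1}    # g's own x, invisible to f
        return f()

    return g()


def run_dynamic():
    bindings = {"x": [42]}      # one stack of bindings per identifier

    def lookup(name):
        return bindings[name][-1]   # current top of the identifier's stack

    def f():
        return lookup("x")

    def g():
        bindings["x"].append(1)     # entering g: push its declaration of x
        try:
            return f()
        finally:
            bindings["x"].pop()     # exiting g: pop it again

    return g()
```

With static scoping main returns 42; with dynamic scoping it returns 1, because g's binding of x is on top of the stack while f runs.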
48
Why do we care?

• We need to generate code to access variables
• Static scoping
  – Identifier binding is known at compile time
  – Address of the variable is known at compile time
  – Assigning addresses to variables is part of code generation
  – No runtime errors of “access to undefined variable”
  – Can check types of variables
49
Variable addresses for static scoping: first attempt

int x = 42;

int f() { return x; }
int g() { int x = 1; return f(); }
int main() { return g(); }

identifier      address
x (global)      0x42
x (inside g)    0x73
50
Variable addresses for static scoping: first attempt

int a[11];

void quicksort(int m, int n) {
    int i;
    if (n > m) {
        i = partition(m, n);
        quicksort(m, i-1);
        quicksort(i+1, n);
    }
}

main() {
    ...
    quicksort(1, 9);
}

What is the address of the variable “i” in the procedure quicksort?
51
Compile-Time Information on Variables
• Name
• Type
• Scope
  – When is it recognized?
• Duration
  – Until when does its value exist?
• Size
  – How many bytes are required at runtime?
• Address
  – Fixed
  – Relative
  – Dynamic
52
Activation Record (Stack Frames)
• Separate space for each procedure invocation
• Managed at runtime
  – code for managing it is generated by the compiler
• Desired properties
  – efficient allocation and deallocation
    • procedures are called frequently
  – variable size
    • different procedures may require different memory sizes
53
Semi-Abstract Register Machine
[Figure: CPU with general-purpose (data) registers Register 00 … Register xx, control registers such as Register PC, and stack registers ebp and esp, next to Main Memory divided into Global Variables, Stack and Heap, drawn between high and low addresses.]
54
A Logical Stack Frame (Simplified)
[Figure: stack frame for function f(a1, …, aN):
  Parameters (actual arguments): Param N, Param N-1, …, Param 1
  Locals and temporaries: _t0, …, _tk, x, …, y]
55
Runtime Stack
• Stack of activation records
• Call = push new activation record
• Return = pop activation record
• Only one “active” activation record – top of stack
• How do we handle recursion?
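One way to see how a stack of activation records handles recursion is to keep the records in an explicit list. In this Python sketch the names `frames`, `deepest` and `fact` are assumptions made for the example; each invocation pushes its own record, so every active call has its own copy of `n`.

```python
frames = []     # runtime stack of activation records
deepest = []    # snapshot of the stack at the innermost call


def fact(n):
    frames.append({"n": n})             # call = push a new activation record
    if n <= 1:
        deepest[:] = [f["n"] for f in frames]
        result = 1
    else:
        result = n * fact(n - 1)        # uses this invocation's own n
    frames.pop()                        # return = pop the activation record
    return result


answer = fact(4)
```

At the innermost call the stack holds four records, one per active invocation, and only the top one is "active".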
56
Activation Record (frame)
[Figure: frame layout from high addresses to low addresses (the stack grows down):
  incoming parameters: parameter k, …, parameter 1
  administrative part: lexical pointer, return information, dynamic link, registers & misc
  local variables
  temporaries
  next frame would be here
The frame pointer and the stack pointer mark positions within this layout.]
57
Runtime Stack
• SP – stack pointer
  – top of current frame
• FP – frame pointer
  – base of current frame
  – sometimes called BP (base pointer)
[Figure: the previous frame above the current frame; FP marks the base of the current frame and SP its top; the stack grows down.]
58
Code Blocks
• Programming languages provide code blocks
void foo()
{
    int x = 8, y = 9;       // 1
    { int x = y * y; }      // 2
    { int x = y * 7; }      // 3
    x = y + 1;
}
[Figure: frame layout: administrative, x1, y1, x2, x3, …]
59
L-Values of Local Variables
• The offset in the stack is known at compile time
• L-val(x) = FP + offset(x)
• x = 5 ⇒
    Load_Constant 5, R3
    Store R3, offset(x)(FP)
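Because offsets are fixed at compile time, the compiler can keep them in a table and print them straight into the generated code. The Python sketch below assumes a 4-byte word and a layout in which the first local sits at FP-4; both choices, and the function names, are illustrative only.

```python
WORD = 4  # assumed word size in bytes


def assign_offsets(local_names):
    """Give each local a fixed offset from FP: first at FP-4, next at FP-8, ..."""
    return {name: -WORD * (i + 1) for i, name in enumerate(local_names)}


def gen_store_constant(offsets, name, value, reg="R3"):
    # Emits the two instructions shown on the slide for `name = value`.
    off = offsets[name]
    return [f"Load_Constant {value}, {reg}", f"Store {reg}, {off}(FP)"]


offsets = assign_offsets(["x", "y"])
store_code = gen_store_constant(offsets, "x", 5)
```

The same table serves every invocation of the procedure, since only FP changes between invocations.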
60
Pentium Runtime Stack
Pentium stack registers

Register    Usage
ESP         Stack pointer
EBP         Base pointer

Pentium stack and call/ret instructions

Instruction         Usage
push, pusha, …      push on runtime stack
pop, popa, …        pop from runtime stack
call                transfer control to called routine
ret                 transfer control back to caller
61
Accessing Stack Variables
• Use offset from FP (%ebp)
• Remember: the stack grows downwards
• Above FP = parameters
• Below FP = locals
• Examples
  – %ebp + 4 = return address
  – %ebp + 8 = first parameter
  – %ebp – 4 = first local
[Figure: stack from high to low addresses: Param n, …, param1 (at FP+8), Return address, Previous fp (FP points here), Local 1 (at FP-4), …, Local n (SP points here).]
62
Factorial – fact(int n)
fact:
    pushl %ebp              # save ebp
    movl  %esp, %ebp        # ebp = esp
    pushl %ebx              # save ebx
    movl  8(%ebp), %ebx     # ebx = n
    cmpl  $1, %ebx          # n <= 1 ?
    jle   .lresult          # then done
    leal  -1(%ebx), %eax    # eax = n-1
    pushl %eax              #
    call  fact              # fact(n-1)
    imull %ebx, %eax        # eax = retv*n
    jmp   .lreturn          #
.lresult:
    movl  $1, %eax          # retv
.lreturn:
    movl  -4(%ebp), %ebx    # restore ebx
    movl  %ebp, %esp        # restore esp
    popl  %ebp              # restore ebp
    ret                     # return to caller

[Figure: stack at an intermediate point: n at EBP+8, Return address, old ebp (Previous fp) at EBP, old ebx at EBP-4, which is where ESP points.]
(disclaimer: a real compiler can do better than that)
63
Call Sequences
• The processor does not save the content of registers on procedure calls
• So who will?
  – Caller saves and restores registers
  – Callee saves and restores registers
  – But can also have both save/restore some registers
64
Call Sequences
caller   Caller push code
           Push caller-save registers
           Push actual parameters (in reverse order)
call       Push return address
           Jump to call address
callee   Callee push code (prologue)
           Push current base-pointer
           bp = sp
           Push local variables
           Push callee-save registers
         Callee pop code (epilogue)
           Pop callee-save registers
           Pop callee activation record
           Pop old base-pointer
return     Pop return address
           Jump to address
caller   Caller pop code
           Pop parameters
           Pop caller-save registers
65
Call Sequences – Foo(42, 21)

caller   push %ecx          Push caller-save registers
         push $21           Push actual parameters (in reverse order)
         push $42
call     call _foo          Push return address, jump to call address
callee   push %ebp          Push current base-pointer
         mov %esp, %ebp     bp = sp
         sub $8, %esp       Push local variables
         push %ebx          Push callee-save registers
         pop %ebx           Pop callee-save registers
         mov %ebp, %esp     Pop callee activation record
         pop %ebp           Pop old base-pointer
return   ret                Pop return address, jump to address
caller   add $8, %esp       Pop parameters
         pop %ecx           Pop caller-save registers
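The Foo(42, 21) sequence can be replayed on a toy machine to check that the stack pointer ends where it started and that the parameters sit at the expected offsets from the base pointer. In this Python sketch the `Machine` class, the initial register values and the `"ret-addr"` placeholder are all assumptions made for the example.

```python
class Machine:
    """Toy 32-bit stack machine: word-addressed dict memory, a few registers."""

    def __init__(self):
        self.mem = {}
        self.reg = {"esp": 0x1000, "ebp": 0x2000, "ebx": 7, "ecx": 9}

    def push(self, value):
        self.reg["esp"] -= 4                # the stack grows down
        self.mem[self.reg["esp"]] = value

    def pop(self):
        value = self.mem[self.reg["esp"]]
        self.reg["esp"] += 4
        return value


m = Machine()
start = dict(m.reg)

# caller push code
m.push(m.reg["ecx"])            # push %ecx   (caller-save register)
m.push(21)                      # push $21    (parameters in reverse order)
m.push(42)                      # push $42
m.push("ret-addr")              # call _foo   (pushes the return address)

# callee prologue
m.push(m.reg["ebp"])            # push %ebp
m.reg["ebp"] = m.reg["esp"]     # mov %esp, %ebp
m.reg["esp"] -= 8               # sub $8, %esp (space for locals)
m.push(m.reg["ebx"])            # push %ebx   (callee-save register)

# what the callee sees relative to %ebp
return_slot = m.mem[m.reg["ebp"] + 4]
first_param = m.mem[m.reg["ebp"] + 8]
second_param = m.mem[m.reg["ebp"] + 12]

# callee epilogue
m.reg["ebx"] = m.pop()          # pop %ebx
m.reg["esp"] = m.reg["ebp"]     # mov %ebp, %esp
m.reg["ebp"] = m.pop()          # pop %ebp
returned_to = m.pop()           # ret

# caller pop code
m.reg["esp"] += 8               # add $8, %esp (pop parameters)
m.reg["ecx"] = m.pop()          # pop %ecx
```

Every push on the way in is matched by a pop or a stack-pointer adjustment on the way out, which is why all registers end up with their initial values.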
66
“To Callee-save or to Caller-save?”
• Callee-saved registers need only be saved when the callee modifies their value
• Some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registers
• Callee-Save Registers
  – Saved by the callee before modification
  – Values are automatically preserved across calls
• Caller-Save Registers
  – Saved (if needed) by the caller before calls
  – Values are not automatically preserved across calls
• Usually the architecture defines caller-save and callee-save registers
  • Separate compilation
  • Interoperability between code produced by different compilers/languages
• But compiler writers decide when to use caller/callee registers
68
Callee-Save Registers
• Saved by the callee before modification
• Usually at procedure prolog
• Restored at procedure epilog
• Hardware support may be available
• Values are automatically preserved across calls

int foo(int a) {
    int b = a + 1;
    f1();
    g1(b);
    return (b + 2);
}

.global _foo
    Add_Constant -K, SP         // allocate space for foo
    Store_Local R5, -14(FP)     // save R5
    Load_Reg R5, R0
    Add_Constant R5, 1
    JSR f1
    JSR g1
    Add_Constant R5, 2
    Load_Reg R5, R0
    Load_Local -14(FP), R5      // restore R5
    Add_Constant K, SP          // deallocate
    RTS
69
Caller-Save Registers
• Saved by the caller before calls when needed
• Values are not automatically preserved across calls

void bar(int y) {
    int x = y + 1;
    f2(x);
    g2(2);
    g2(8);
}

.global _bar
    Add_Constant -K, SP         // allocate space for bar
    Add_Constant R0, 1
    JSR f2
    Load_Constant 2, R0
    JSR g2
    Load_Constant 8, R0
    JSR g2
    Add_Constant K, SP          // deallocate space for bar
    RTS
70
Parameter Passing
• 1960s
  – In memory
    • No recursion is allowed
• 1970s
  – In stack
• 1980s
  – In registers
  – First k parameters are passed in registers (k=4 or k=6)
  – Where is time saved?
• Most procedures are leaf procedures
• Interprocedural register allocation
• Many of the registers may be dead before another invocation
• Register windows are allocated in some architectures per call (e.g., Sun SPARC)
71
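Windows Exploit(s): Buffer Overflow

void foo(char *x) {
    char buf[2];
    strcpy(buf, x);
}
int main(int argc, char *argv[]) {
    foo(argv[1]);
}

./a.out abracadabra
Segmentation fault

[Figure: stack layout along the memory addresses axis, with the direction of stack growth marked: Previous frame, Return address, Saved FP, char* x, buf[2], …; the copied string "abracadabr…" runs past buf into the neighbouring slots.]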
72
Buffer overflow
int check_authentication(char *password) {
    int auth_flag = 0;
    char password_buffer[16];

    strcpy(password_buffer, password);
    if(strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
    if(strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;
    return auth_flag;
}
int main(int argc, char *argv[]) {
    if(argc < 2) {
        printf("Usage: %s <password>\n", argv[0]);
        exit(0);
    }
    if(check_authentication(argv[1])) {
        printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        printf(" Access Granted.\n");
        printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    } else {
        printf("\nAccess Denied.\n");
    }
}
(source: “Hacking – The Art of Exploitation, 2nd Ed”)
74
Buffer overflow
int check_authentication(char *password) {
    char password_buffer[16];
    int auth_flag = 0;

    strcpy(password_buffer, password);
    if(strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
    if(strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;
    return auth_flag;
}
int main(int argc, char *argv[]) {
    if(argc < 2) {
        printf("Usage: %s <password>\n", argv[0]);
        exit(0);
    }
    if(check_authentication(argv[1])) {
        printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        printf(" Access Granted.\n");
        printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    } else {
        printf("\nAccess Denied.\n");
    }
}
(source: “Hacking – The Art of Exploitation, 2nd Ed”)
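Why the order of the two local declarations can matter is easy to see in a toy memory model. The Python sketch below is not the book's code: the 32-byte `memory`, the two candidate layouts and the helper `unsafe_strcpy` are assumptions, and a real compiler is free to place locals differently.

```python
def unsafe_strcpy(memory, dest, src):
    """Copy src plus a terminating NUL into memory at dest, with no bounds check."""
    data = src + b"\x00"
    memory[dest:dest + len(data)] = data


def check_authentication(password, flag_above_buffer=True):
    memory = bytearray(32)          # zero-filled, so auth_flag starts at 0
    if flag_above_buffer:
        buf, flag = 0, 16           # flag at higher addresses than the buffer
    else:
        buf, flag = 4, 0            # flag at lower addresses than the buffer
    unsafe_strcpy(memory, buf, password)
    stored = bytes(memory[buf:buf + 16]).split(b"\x00")[0]
    if stored in (b"brillig", b"outgrabe"):
        memory[flag:flag + 4] = (1).to_bytes(4, "little")
    return int.from_bytes(memory[flag:flag + 4], "little")


granted_by_overflow = check_authentication(b"A" * 17)
```

With the flag above the buffer, a 17-byte password spills one byte into the flag and makes it non-zero; with the flag below the buffer the same input leaves the flag untouched, although it still overwrites whatever follows the buffer.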
75
Buffer overflow
0x08048529 <+69>:  movl $0x8048647,(%esp)
0x08048530 <+76>:  call 0x8048394 <puts@plt>
0x08048535 <+81>:  movl $0x8048664,(%esp)
0x0804853c <+88>:  call 0x8048394 <puts@plt>
0x08048541 <+93>:  movl $0x804867a,(%esp)
0x08048548 <+100>: call 0x8048394 <puts@plt>
0x0804854d <+105>: jmp 0x804855b <main+119>
0x0804854f <+107>: movl $0x8048696,(%esp)
0x08048556 <+114>: call 0x8048394 <puts@plt>
76
Nested Procedures
• For example, Pascal
• Any routine can have sub-routines
• Any sub-routine can access anything that is defined in its containing scope or inside the sub-routine itself
  – “non-local” variables
77
Example: Nested Procedures
program p() {
    int x;
    procedure a() {
        int y;
        procedure b() { … c() … };
        procedure c() {
            int z;
            procedure d() {
                y = x + z
            };
            … b() … d() …
        }
        … a() … c() …
    }
    a()
}

Possible call sequence: p → a → a → c → b → c → d
What are the addresses of variables “x”, “y” and “z” in procedure d?
78
Nested Procedures
• A routine can call a sibling or an ancestor
• When “c” uses (non-local) variables from “a”, which instance of “a” is it?
• How do you find the right activation record at runtime?
Possible call sequence: p → a → a → c → b → c → d
[Figure: activation records for P, a, a, c, b, c, d]
79
Nested Procedures
• Goal: find the closest routine on the stack from a given nesting level
• If we reached the same routine in a sequence of calls
  – a routine of level k that uses variables of the same nesting level uses its own variables
  – if it uses variables of nesting level j < k, then they must belong to the last routine called at level j
• If a procedure is the last at level j on the stack, then it must be an ancestor of the current routine
Possible call sequence: p → a → a → c → b → c → d
[Figure: activation records for P, a, a, c, b, c, d]
80
Nested Procedures
• Problem: a routine may need to access variables of another routine that contains it statically
• Solution: lexical pointer (aka access link) in the activation record
• The lexical pointer points to the last activation record of the nesting level above it
  – in our example, the lexical pointer of d points to the activation record of c
• Lexical pointers are created at runtime
• The number of links to be traversed is known at compile time
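The lexical-pointer scheme can be checked on the call sequence p → a → a → c → b → c → d with a short Python sketch. The tables `PARENT` and `DECLARES`, the `Frame` class and the rule used in `call` for setting the link are assumptions made for the example; the rule is the usual one of following the caller's own links up to the callee's static parent.

```python
PARENT = {"p": None, "a": "p", "b": "a", "c": "a", "d": "c"}    # static nesting
DECLARES = {"p": ["x"], "a": ["y"], "b": [], "c": ["z"], "d": []}


def level(proc):
    return 0 if PARENT[proc] is None else 1 + level(PARENT[proc])


class Frame:
    def __init__(self, proc, lexical, ident):
        self.proc = proc
        self.lexical = lexical      # lexical pointer (access link)
        self.ident = ident          # position on the runtime stack


def call(stack, callee):
    """Push a frame for callee; its link is the frame of its static parent."""
    link = None
    if stack:
        caller = stack[-1]
        link = caller
        for _ in range(level(caller.proc) - level(callee) + 1):
            link = link.lexical
    stack.append(Frame(callee, link, len(stack)))


def resolve(frame, name):
    """Follow lexical pointers until the frame that declares name is reached."""
    hops = 0
    while name not in DECLARES[frame.proc]:
        frame = frame.lexical
        hops += 1
    return frame.ident, hops


stack = []
for proc in ["p", "a", "a", "c", "b", "c", "d"]:
    call(stack, proc)
d_frame = stack[-1]
found = {v: resolve(d_frame, v) for v in ["z", "y", "x"]}
```

From d, z is one link away (the most recent c), y two links away (the second instance of a) and x three links away (p). The hop counts equal the difference in nesting levels, which is why they are known at compile time even though the frames themselves are only known at runtime.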
81
Lexical Pointers
Possible call sequence: p → a → a → c → b → c → d
[Figure: stack of activation records a, a, c, b, c, d for the nested-procedures program p; each record of a holds y and each record of c holds z, and lexical pointers link each record to the record of its statically enclosing routine.]
82
t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer
x = y[i]
x[i] = y
27
TAC Generation forControl Flow Statements
bull Label introduction_label_name
Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L
Goto Lbull Conditional jump test condition variable t
if 0 jump to label LIfZ t Goto L
bull Similarly test condition variable tif 1 jump to label L
IfNZ t Goto L
28
Control-flow example ndash conditionsint xint yint z
if (x lt y) z = xelse z = yz = z z
_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z
29
cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)
Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )
30
Control-flow example ndash loopsint xint y
while (x lt y) x = x 2
y = x
_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1
y = x
31
cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label
Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )
32
Optimization points
sourcecode
Frontend IR Code
generatortargetcode
Userprofile program
change algorithm
Compilerintraprocedural IRInterprocedural IRIR optimizations
Compilerregister allocation
instruction selectionpeephole transformations
today
33
Interprocedural IR
bull Compile time generation of code for procedure invocations
bull Activation Records (aka Stack Frames)
34
Supporting Procedures
bull New computing environment ndash at least temporary memory for local variables
bull Passing information into the new environmentndash Parameters
bull Transfer of control tofrom procedurebull Handling return values
35
Abstract Register Machine(High Level View)
Code
Data
Highaddresses
Lowaddresses
CPU
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
36
Heap
Abstract Register Machine(High Level View)
Global Variables
Stack
Lowaddresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
Code
Highaddresses
37
Abstract Activation Record Stack
Stack frame for procedure
Prock+1(a1hellipaN)
Prock
Prock+2
hellip
hellip
Prock+1
38
Abstract Stack Frame
Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Prock
Prock+2
hellip
hellip
Stack frame for procedure
Prock+1(a1hellipaN)
39
Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to
stack and jumps to the function labelA statement x=f(a1hellipan) looks like
Push a1 hellip Push anCall fPop x copy returned value
bull Returning a value is done by pushing it to the stack (return x)
Push xbull Return control to caller (and roll up stack)
Return
40
Heap
Abstract Register Machine
Global Variables
Stack
Lowaddresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
Code
Highaddresses
41
Semi-Abstract Register Machine
Global Variables
Stack
Heap
High addresses
Low addresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
ebp
esphellipre
gist
ers
Stac
k
42
Intro Functions Exampleint SimpleFn(int z)
int x yx = x y zreturn x
void main() int ww = SimpleFunction(137)
_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn
main_t0 = 137Push _t0Call _SimpleFnPop w
43
What Can We Do with Procedures
bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as
parameters
44
Design Decisions
bull Scoping rulesndash Static scoping vs dynamic scoping
bull Callercallee conventionsndash Parametersndash Who saves register values
bull Allocating space for local variables
45
Static (lexical) Scopingmain ( )
int a = 0 int b = 0
int b = 1
int a = 2 printf (ldquod dnrdquo a b)
int b = 3 printf (ldquod dnrdquo a b)
printf (ldquod dnrdquo a b)
printf (ldquod dnrdquo a b)
B0B1
B3B3
B2
Declaration Scopes
a=0 B0B1B3
b=0 B0
b=1 B1B2
a=2 B2
b=3 B3
a name refers to its (closest)
enclosing scope
known at compile time
46
Dynamic Scoping
bull Each identifier is associated with a global stack of bindings
bull When entering scope where identifier is declaredndash push declaration on identifier stack
bull When exiting scope where identifier is declaredndash pop identifier stack
bull Evaluating the identifier in any context binds to the current top of stack
bull Determined at runtime
47
Example
bull What value is returned from mainbull Static scopingbull Dynamic scoping
int x = 42
int f() return x int g() int x = 1 return f() int main() return g()
48
Why do we care
bull We need to generate code to access variables
bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code
generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables
49
Variable addresses for static scoping first attempt
int x = 42
int f() return x int g() int x = 1 return f() int main() return g()
identifier address
x (global) 0x42
x (inside g) 0x73
50
Variable addresses for static scoping first attempt
int a [11]
void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)
main() quicksort (1 9)
what is the address of the variable ldquoirdquo in
the procedure quicksort
51
Compile-Time Information on Variablesbull Namebull Typebull Scope
ndash when is it recognizedbull Duration
ndash Until when does its value existbull Size
ndash How many bytes are required at runtime bull Address
ndash Fixedndash Relativendash Dynamic
52
Activation Record (Stack Frames)bull separate space for each procedure invocation
bull managed at runtimendash code for managing it generated by the compiler
bull desired properties ndash efficient allocation and deallocation
bull procedures are called frequentlyndash variable size
bull different procedures may require different memory sizes
53
Semi-Abstract Register Machine
Global Variables
Stack
Heap
High addresses
Low addresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
ebp
esphellipre
gist
ers
Stac
k
54
A Logical Stack Frame (Simplified)Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Stack frame for function f(a1hellipaN)
55
Runtime Stack
bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of
stackbull How do we handle recursion
56
Activation Record (frame)parameter k
parameter 1
lexical pointer
return information
dynamic link
registers amp misc
local variablestemporaries
next frame would be here
hellip
administrativepart
high addresses
lowaddresses
framepointer
stackpointer
incoming parameters
stack grows down
57
Runtime Stackbull SP ndash stack pointer
ndash top of current framebull FP ndash frame pointer
ndash base of current framendash Sometimes called BP
(base pointer)
Current frame
hellip hellip
Previous frame
SP
FPstack grows down
58
Code Blocks
bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1
adminstrativex1
y1
x2
x3
hellip
59
L-Values of Local Variables
bull The offset in the stack is known at compile time
bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3
Store R3 offset(x)(FP)
60
Pentium Runtime Stack
Pentium stack registers
Pentium stack and callret instructions
Register Usage
ESP Stack pointer
EBP Base pointer
Instruction Usage
push pushahellip push on runtime stack
poppopahellip Base pointer
call transfer control to called routine
return transfer control back to caller
61
Accessing Stack Variables
bull Use offset from FP (ebp)bull Remember ndash
stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples
ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local
hellip hellip
SP
FPReturn address
Return address
Param nhellip
param1
Local 1hellip
Local n
Previous fp
Param nhellip
param1FP+8
FP-4
62
Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp
ESP
EBPReturn address
Return address
old ebx
Previous fp
nEBP+8
EBP-4 old ebp
old ebx
(stack in intermediate point)
(disclaimer real compiler can do better than that)
63
Call Sequences
bull The processor does not save the content of registers on procedure calls
bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some
registers
64
Call Sequences
call
caller
callee
return
caller
Caller push code
Callee push code
(prologue)
Callee pop code
(epilogue)
Caller pop code
Push caller-save registersPush actual parameters (in reverse order)
push return addressJump to call address
Push current base-pointerbp = sp
Push local variablesPush callee-save registers
Pop callee-save registersPop callee activation record
Pop old base-pointer
pop return addressJump to address
Pop parametersPop caller-save registers
65
Call Sequences ndash Foo(4221)
call
caller
callee
return
caller
push ecx
push $21
push $42
call _foo
push ebp
mov esp ebp
sub 8 esp
push ebx
pop ebx
mov ebp esp
pop ebp
ret
add $8 esp
pop ecx
Push caller-save registersPush actual parameters (in reverse order)
push return addressJump to call address
Push current base-pointerbp = sp
Push local variablesPush callee-save registers
Pop callee-save registersPop callee activation record
Pop old base-pointer
pop return addressJump to address
Pop caller-save registersPop parameters
66
ldquoTo Callee-save or to Caller-saverdquo
bull Callee-saved registers need only be saved when callee modifies their value
bull some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registersbull Callee-Save Registers
ndash Saved by the callee before modificationndash Values are automatically preserved across calls
bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls
bull Usually the architecture defines caller-save and callee-save registers
bull Separate compilationbull Interoperability between code produced by different
compilerslanguages bull But compiler writers decide when to use calllercallee
registers
68
Callee-Save Registers
• Saved by the callee before modification
• Usually at procedure prolog
• Restored at procedure epilog
• Hardware support may be available
• Values are automatically preserved across calls

Source:
    int foo(int a) {
        int b = a + 1;
        f1();
        g1(b);
        return (b + 2);
    }

Generated code:
    .global _foo
    Add_Constant -K, SP        // allocate space for foo
    Store_Local R5, -14(FP)    // save R5
    Load_Reg R5, R0
    Add_Constant R5, 1
    JSR f1
    JSR g1
    Add_Constant R5, 2
    Load_Reg R5, R0
    Load_Local -14(FP), R5     // restore R5
    Add_Constant K, SP         // deallocate
    RTS
69
Caller-Save Registers
• Saved by the caller before calls when needed
• Values are not automatically preserved across calls

Source:
    void bar(int y) {
        int x = y + 1;
        f2(x);
        g2(2);
        g2(8);
    }

Generated code:
    .global _bar
    Add_Constant -K, SP        // allocate space for bar
    Add_Constant R0, 1
    JSR f2
    Load_Constant 2, R0
    JSR g2
    Load_Constant 8, R0
    JSR g2
    Add_Constant K, SP         // deallocate space for bar
    RTS
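The difference between the two conventions can be mimicked in plain C by treating two global variables as registers. This is a rough sketch, not real generated code: the variables `R0` and `R5`, the clobbering callee `g`, and both wrapper functions are hypothetical names chosen for illustration.

```c
#include <assert.h>

/* Hypothetical register file: R0 is treated as caller-save, R5 as callee-save. */
static int R0, R5;

/* A callee that is free to clobber R0 but must preserve R5. */
static void g(void) {
    int saved_r5 = R5;    /* prologue: save the callee-save register before use */
    R5 = 1000;            /* use R5 as scratch */
    R0 = R5 + 1;          /* clobber R0: nobody promised to preserve it */
    R5 = saved_r5;        /* epilogue: restore */
}

/* b lives in callee-save R5, so it survives the call with no work here. */
int foo_callee_save(int a) {
    int saved_r5 = R5;    /* foo is itself a callee, so it saves R5 first */
    R5 = a + 1;           /* b = a + 1 */
    g();
    assert(R5 == a + 1);  /* the callee preserved b */
    int result = R5 + 2;  /* return b + 2 */
    R5 = saved_r5;
    return result;
}

/* x lives in caller-save R0, so the caller spills it around the call,
   but only because x is still needed afterwards. */
int bar_caller_save(int y) {
    R0 = y + 1;           /* x = y + 1 */
    int spill = R0;       /* save before the call */
    g();
    R0 = spill;           /* restore after the call */
    return R0 * 2;
}
```

In the slide's `bar`, no save is emitted at all because `x` is dead after `f2(x)`; the spill in `bar_caller_save` only appears because this variant reads the value again after the call.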
70
Parameter Passing
• 1960s
  – In memory
    • No recursion is allowed
• 1970s
  – In stack
• 1980s
  – In registers
  – First k parameters are passed in registers (k=4 or k=6)
  – Where is time saved?
    • Most procedures are leaf procedures
    • Interprocedural register allocation
    • Many of the registers may be dead before another invocation
    • Register windows are allocated in some architectures per call (e.g., Sun SPARC)
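A compiler that passes the first k parameters in registers needs a rule mapping each parameter to a location. The following is a minimal sketch of such a rule; the `ParamLoc` type, the function name, the 4-byte word size, and the frame layout (saved FP at FP+0, return address at FP+4, first stack-passed parameter at FP+8) are assumptions made for the example.

```c
#include <assert.h>

/* Hypothetical description of where a parameter lives. */
typedef struct {
    int in_register;   /* 1 if passed in a register, 0 if passed on the stack */
    int index;         /* register number, or byte offset from the frame pointer */
} ParamLoc;

/* Location of the i-th parameter (0-based) when the first k go in registers.
   Assumed layout: 4-byte words, first stack-passed parameter at FP+8. */
ParamLoc param_location(int i, int k) {
    assert(i >= 0 && k >= 0);
    ParamLoc loc;
    if (i < k) {
        loc.in_register = 1;
        loc.index = i;                  /* i-th argument register */
    } else {
        loc.in_register = 0;
        loc.index = 8 + 4 * (i - k);    /* remaining parameters go on the stack */
    }
    return loc;
}
```

With k = 4, parameters 0 through 3 map to registers 0 through 3, and parameter 4 becomes the first stack-passed parameter.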
71
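Windows Exploit(s): Buffer Overflow

    void foo(char *x) {
        char buf[2];
        strcpy(buf, x);
    }
    int main(int argc, char *argv[]) {
        foo(argv[1]);
    }

    ./a.out abracadabra
    Segmentation fault

[Figure: stack layout, with arrows labeled "Stack grows this way" and "Memory addresses"]
    Previous frame
    Return address
    Saved FP
    char *x
    buf[2]
    …
(the input, shown as "abracadabr" in the figure, is written over these slots)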
72
Buffer overflow

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int check_authentication(char *password) {
        int auth_flag = 0;
        char password_buffer[16];

        strcpy(password_buffer, password);   /* unchecked copy: the overflow */

        if (strcmp(password_buffer, "brillig") == 0)
            auth_flag = 1;
        if (strcmp(password_buffer, "outgrabe") == 0)
            auth_flag = 1;

        return auth_flag;
    }

    int main(int argc, char *argv[]) {
        if (argc < 2) {
            printf("Usage: %s <password>\n", argv[0]);
            exit(0);
        }
        if (check_authentication(argv[1])) {
            printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
            printf(" Access Granted.\n");
            printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        } else {
            printf("\nAccess Denied.\n");
        }
    }

(source: "Hacking: The Art of Exploitation", 2nd Ed.)
74
Buffer overflow (local declarations swapped)

    int check_authentication(char *password) {
        char password_buffer[16];
        int auth_flag = 0;

        strcpy(password_buffer, password);   /* unchecked copy: the overflow */

        if (strcmp(password_buffer, "brillig") == 0)
            auth_flag = 1;
        if (strcmp(password_buffer, "outgrabe") == 0)
            auth_flag = 1;

        return auth_flag;
    }

    int main(int argc, char *argv[]) {
        if (argc < 2) {
            printf("Usage: %s <password>\n", argv[0]);
            exit(0);
        }
        if (check_authentication(argv[1])) {
            printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
            printf(" Access Granted.\n");
            printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        } else {
            printf("\nAccess Denied.\n");
        }
    }

(source: "Hacking: The Art of Exploitation", 2nd Ed.)
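For contrast with the unchecked `strcpy`, here is one possible bounded variant of the check. It is a sketch rather than a recommended authentication routine: the function name `check_authentication_bounded` is invented, and rejecting over-long input outright is just one of several reasonable policies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hardened variant: refuses input that does not fit the buffer
   instead of copying it blindly. */
int check_authentication_bounded(const char *password) {
    char password_buffer[16];
    int auth_flag = 0;

    size_t len = (password != NULL) ? strlen(password) : 0;
    if (password == NULL || len >= sizeof password_buffer)
        return 0;                        /* too long: never touch the buffer */

    assert(len + 1 <= sizeof password_buffer);
    memcpy(password_buffer, password, len + 1);   /* copy includes the '\0' */

    if (strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
    if (strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;

    return auth_flag;
}
```

Because the length is checked before any write, neither `auth_flag` nor the saved frame pointer and return address can be overwritten by the input, whichever order the locals are declared in.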
75
Buffer overflow

    0x08048529 <+69>:  movl $0x8048647,(%esp)
    0x08048530 <+76>:  call 0x8048394 <puts@plt>
    0x08048535 <+81>:  movl $0x8048664,(%esp)
    0x0804853c <+88>:  call 0x8048394 <puts@plt>
    0x08048541 <+93>:  movl $0x804867a,(%esp)
    0x08048548 <+100>: call 0x8048394 <puts@plt>
    0x0804854d <+105>: jmp 0x804855b <main+119>
    0x0804854f <+107>: movl $0x8048696,(%esp)
    0x08048556 <+114>: call 0x8048394 <puts@plt>
76
Nested Procedures

• For example – Pascal
• Any routine can have sub-routines
• Any sub-routine can access anything that is defined in its containing scope or inside the sub-routine itself
  – "non-local" variables
77
Example: Nested Procedures

    program p() {
        int x;
        procedure a() {
            int y;
            procedure b() { … c() … }
            procedure c() {
                int z;
                procedure d() {
                    y = x + z
                }
                … b() … d() …
            }
            … a() … c() …
        }
        a()
    }

Possible call sequence: p → a → a → c → b → c → d
What are the addresses of variables "x", "y" and "z" in procedure d?
78
Nested Procedures
• A routine can call a sibling or an ancestor
• When "c" uses (non-local) variables from "a", which instance of "a" is it?
• How do you find the right activation record at runtime?
[Figure: the activations p, a, a, c, b, c, d]
Possible call sequence: p → a → a → c → b → c → d
79
Nested Procedures
• Goal: find the closest routine in the stack from a given nesting level
• If we reached the same routine in a sequence of calls
  – when a routine of level k uses variables of the same nesting level, it uses its own variables
  – when it uses variables of nesting level j < k, they belong to the last routine called at level j
• If a procedure is last at level j on the stack, then it must be an ancestor of the current routine
Possible call sequence: p → a → a → c → b → c → d
[Figure: the activations p, a, a, c, b, c, d]
80
Nested Procedures
• Problem: a routine may need to access variables of another routine that contains it statically
• Solution: lexical pointer (a.k.a. access link) in the activation record
• The lexical pointer points to the last activation record of the nesting level above it
  – in our example, the lexical pointer of d points to the activation record of c
• Lexical pointers are created at runtime
• The number of links to be traversed is known at compile time
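The access-link scheme can be tried out by replaying the call sequence p → a → a → c → b → c → d with explicit frame records. The code below is a sketch under assumptions: the `Frame` struct, the `enter` and `var_at_level` helpers, the single local slot per frame, and the concrete values assigned to x and z are all invented for illustration; nesting levels are taken as p = 0, a = 1, b = c = 2, d = 3.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical activation record: one local slot plus the two links. */
typedef struct Frame {
    const char   *name;      /* routine name, for illustration only */
    int           level;     /* static nesting level */
    int           local;     /* x in p, y in a, z in c */
    struct Frame *dynamic;   /* caller's frame (dynamic link) */
    struct Frame *lexical;   /* latest frame of the enclosing routine */
} Frame;

/* Create the callee's links at call time. The enclosing routine of a callee at
   `level` sits at level - 1, so follow (caller->level - level + 1) lexical
   links starting from the caller; that count is known at compile time. */
void enter(Frame *callee, const char *name, int level, Frame *caller) {
    int hops = caller->level - level + 1;
    assert(hops >= 0);               /* the callee must be visible to the caller */
    Frame *f = caller;
    while (hops-- > 0)
        f = f->lexical;
    callee->name = name;
    callee->level = level;
    callee->local = 0;
    callee->dynamic = caller;
    callee->lexical = f;
}

/* Address of the variable declared at nesting level j, seen from `current`. */
int *var_at_level(Frame *current, int j) {
    assert(j >= 0 && j <= current->level);
    Frame *f = current;
    for (int hops = current->level - j; hops > 0; hops--)
        f = f->lexical;
    return &f->local;
}

/* Replays p -> a -> a -> c -> b -> c -> d and runs y = x + z inside d. */
int run_example(void) {
    Frame p = { "p", 0, 0, NULL, NULL };
    Frame a1, a2, c1, b, c2, d;
    p.local = 10;                                  /* x = 10 (assumed value) */
    enter(&a1, "a", 1, &p);
    enter(&a2, "a", 1, &a1);
    enter(&c1, "c", 2, &a2);  c1.local = 1;        /* z in the first c */
    enter(&b,  "b", 2, &c1);
    enter(&c2, "c", 2, &b);   c2.local = 5;        /* z in the second c */
    enter(&d,  "d", 3, &c2);

    /* In d (level 3): x is at level 0, y at level 1, z at level 2. */
    *var_at_level(&d, 1) = *var_at_level(&d, 0) + *var_at_level(&d, 2);

    assert(d.lexical == &c2 && c2.lexical == &a2 && a2.lexical == &p);
    assert(a1.local == 0);          /* the older instance of a is untouched */
    return a2.local;                /* y of the most recent a */
}
```

Note that `b` is called from `c` but its lexical pointer still leads to the second `a`, which is why the dynamic link alone cannot be used to find non-local variables.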
81
Lexical Pointers

[Figure: stack of activation records a, a, c, b, c, d for the call sequence below; each a holds y and each c holds z]
Possible call sequence: p → a → a → c → b → c → d

    program p() {
        int x;
        procedure a() {
            int y;
            procedure b() { c() }
            procedure c() {
                int z;
                procedure d() {
                    y = x + z
                }
                … b() … d() …
            }
            … a() … c() …
        }
        a()
    }
82
What is a Compiler
5
Context Analysis
Op()
Id(b)
Num(23) Num(7)
Op(+)
Abstract Syntax Tree
E1 int E2 int
E1 + E2 int
Type rules
Semantic Error Valid + Symbol Table
6
Code Generation
Op()
Id(b)
Num(23) Num(7)
Op(+)
Valid Abstract Syntax TreeSymbol Tablehellip
Verification (possible runtime)ErrorsWarnings
Intermediate Representation (IR)
Executable Codeinput output
7
Code Generation IR
bull Translating from abstract syntax (AST) to intermediate representation (IR)ndash Three-Address Code
bull Primitive statements control flow procedure calls
Source code
(program)
LexicalAnalysis
Syntax Analysis
Parsing
AST SymbolTableetc
InterRep(IR)
CodeGeneration
Source code
(executable)
8
C++ IR
Pentium
Java bytecode
SparcPyhton
Javaoptimize
Lowering Code Gen
Intermediate representationbull A language that is between the source language and
the target language ndash not specific to any machinebull Goal 1 retargeting compiler components for
different source languagestarget machinesbull Goal 2 machine-independent optimizer
ndash Narrow interface small number of node types (instructions)
9
Three-Address Code IR
bull A popular form of IRbull High-level assembly where instructions
have at most three operands
Chapter 8
10
Heap
Abstract Register Machine
Global Variables
Stack
Lowaddresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
Code
Highaddresses
11
Abstract Activation Record Stack
Stack frame for procedure
Prock+1(a1hellipaN)
Prock
Prock+2
hellip
hellip
Prock+1
12
Abstract Stack Frame
Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Prock
Prock+2
hellip
hellip
Stack frame for procedure
Prock+1(a1hellipaN)
13
TAC generation for expressions
bull cgen(atomic_expr) directly generates TAC for atomic expressionsndash Constants identifiershellip
bull Cgen(compund_expr) recursively generates TAC for compound expressions ndash binary operators procedure calls hellipndash use temporary variables (registers) to store
values of intermediate expressions
14
cgen examplecgen(5 + x) = Choose a new temporary t Let t1 = cgen(5) Let t2 = cgen(x) Emit( t = t1 + t2 ) Return t
15
Naiumlve cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =
Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc
16
TAC Generation forControl Flow Statements
bull Label introduction_label_name
Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L
Goto Lbull Conditional jump test condition variable t
if 0 jump to label LIfZ t Goto L
bull Similarly test condition variable tif 1 jump to label L
IfNZ t Goto L
17
Control-flow example ndash conditionsint xint yint z
if (x lt y) z = xelse z = yz = z z
_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z
18
cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)
Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )
19
Naive cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =
Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc
bull Observation temporaries in cgen(e1) can be reused in cgen(e2)
20
Improving cgen for expressionsbull Observation ndash naiumlve translation needlessly generates temporaries for
leaf expressionsbull Observation ndash temporaries used exactly once
ndash Once a temporary has been read it can be reused for another sub-expressionbull cgen(e1 op e2) =
Let _t1 = cgen(e1) Let _t2 = cgen(e2) Emit( _t =_t1 op _t2 ) Return t
bull Temporaries cgen(e1) can be reused in cgen(e2)
21
Sethi-Ullman translation
• Algorithm by Ravi Sethi and Jeffrey D. Ullman to emit optimal TAC
  – Minimizes number of temporaries
• Main data structure in algorithm is a stack of temporaries
  – Stack corresponds to recursive invocations of _t = cgen(e)
  – All the temporaries on the stack are live
    • Live = contain a value that is needed later on
22
Weighted register allocation
• Can save registers by re-ordering subtree computations
• Label each node with its weight
  – Weight = number of registers needed
  – Leaf weight is known
  – Internal node weight:
    • if w(left) > w(right) then w = w(left)
    • if w(right) > w(left) then w = w(right)
    • if w(right) = w(left) then w = w(left) + 1
• Choose the heavier child as the first to be translated
• WARNING: have to check that no side-effects exist before attempting to apply this optimization (pre-pass on the tree)
23
Weighted reg alloc example
_t0 = cgen( a+b[5*c] )
Phase 1: - check absence of side-effects in expression tree
         - assign weight to each AST node
[Figure: AST of a+b[5*c] annotated with weights]
+ (w=1)
  a (w=0)
  array access (w=1)
    base: b (w=0)
    index: * (w=1)
      5 (w=0)
      c (w=0)
24
Weighted reg alloc example
_t0 = cgen( a+b[5*c] )
Phase 2: - use weights to decide on order of translation
[Figure: the same AST; at each internal node the heavier sub-tree is translated first into _t0, the other child into _t1]
Code emitted, in order:
_t0 = c
_t1 = 5
_t0 = _t1 * _t0
_t1 = b
_t0 = _t1[_t0]
_t1 = a
_t0 = _t1 + _t0
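The two phases can be sketched in Python as follows. This is a hedged illustration, not the lecture's implementation: the `Node` class, the helper names, the leaf weight of 0 (chosen to match the weights in the example) and the tie-break that translates the right child first are all assumptions. The sketch also skips the side-effect pre-pass that the slides warn about.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    label: str                       # operator ("+", "*", "[]") or leaf name
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    weight: int = 0


def assign_weights(n: Node) -> int:
    """Phase 1: label every node with its weight."""
    if n.left is None or n.right is None:
        n.weight = 0                 # assumed leaf weight, as in the example
        return n.weight
    wl = assign_weights(n.left)
    wr = assign_weights(n.right)
    n.weight = wl + 1 if wl == wr else max(wl, wr)
    return n.weight


def cgen(n: Node, temps: List[str], code: List[str]) -> str:
    """Phase 2: translate n into temps[0]; temps lists the free temporaries."""
    target = temps[0]
    if n.left is None or n.right is None:
        code.append(f"{target} = {n.label}")
        return target
    # Heavier child first; on a tie the right child goes first (assumption).
    if n.left.weight > n.right.weight:
        first, second = n.left, n.right
    else:
        first, second = n.right, n.left
    r_first = cgen(first, temps, code)        # result lives in temps[0]
    r_second = cgen(second, temps[1:], code)  # temps[0] is live, use the rest
    lhs, rhs = (r_first, r_second) if first is n.left else (r_second, r_first)
    if n.label == "[]":
        code.append(f"{target} = {lhs}[{rhs}]")
    else:
        code.append(f"{target} = {lhs} {n.label} {rhs}")
    return target


def translate(root: Node) -> List[str]:
    assign_weights(root)
    temps = [f"_t{i}" for i in range(root.weight + 1)]
    code: List[str] = []
    cgen(root, temps, code)
    return code
```

With leaf weight 0, a node of weight w needs w + 1 temporaries in this sketch, which is why `translate` allocates `root.weight + 1` of them.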
25
TAC Generation for Memory Access Instructions
• Copy instruction: a = b
• Load/store instructions: a = *b, *a = b
• Address-of instruction: a = &b
• Array accesses: a = b[i], a[i] = b
• Field accesses: a = b[f], a[f] = b
• Memory allocation instruction: a = alloc(size)
  – Sometimes left out (e.g., malloc is a procedure in C)
26
Array operations
x = y[i]
  t1 = &y       ; t1 = address-of y
  t2 = t1 + i   ; t2 = address of y[i]
  x = *t2       ; value stored at y[i]
x[i] = y
  t1 = &x       ; t1 = address-of x
  t2 = t1 + i   ; t2 = address of x[i]
  *t2 = y       ; store through pointer
30
Control-flow example – loops
int x;
int y;

while (x < y) {
  x = x * 2;
}
y = x;

_L0:
_t0 = x < y;
IfZ _t0 Goto _L1;
x = x * 2;
Goto _L0;
_L1:
y = x;
31
cgen for while loops
cgen(while (expr) stmt) =
  Let Lbefore be a new label
  Let Lafter be a new label
  Emit( Lbefore: )
  Let t = cgen(expr)
  Emit( IfZ t Goto Lafter; )
  cgen(stmt)
  Emit( Goto Lbefore; )
  Emit( Lafter: )
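The loop scheme can be sketched in Python as below. The `WhileCgen` class is hypothetical; the condition is passed as ready-made TAC text and the body as a callback, both simplifications standing in for the recursive cgen(expr) and cgen(stmt) calls.

```python
from typing import Callable, List


class WhileCgen:
    def __init__(self) -> None:
        self.code: List[str] = []
        self.temps = 0
        self.labels = 0

    def emit(self, instr: str) -> None:
        self.code.append(instr)

    def fresh(self, prefix: str) -> str:
        """Return a new temporary ("_t") or label ("_L") name."""
        if prefix == "_t":
            n, self.temps = self.temps, self.temps + 1
        else:
            n, self.labels = self.labels, self.labels + 1
        return f"{prefix}{n}"

    def cgen_while(self, cond: str, body: Callable[[], None]) -> None:
        l_before = self.fresh("_L")
        l_after = self.fresh("_L")
        self.emit(f"{l_before}:")
        t = self.fresh("_t")
        self.emit(f"{t} = {cond}")            # stands in for cgen(expr)
        self.emit(f"IfZ {t} Goto {l_after}")
        body()                                # stands in for cgen(stmt)
        self.emit(f"Goto {l_before}")
        self.emit(f"{l_after}:")
```

The condition is re-evaluated after every jump back to the first label, so the test sits inside the loop rather than before it.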
32
Optimization points
[Figure: pipeline source code → Front end → IR → Code generator → target code, annotated with who can optimize at each point; one point is marked "today"]
• User: profile program, change algorithm
• Compiler: intraprocedural IR, interprocedural IR, IR optimizations
• Compiler: register allocation, instruction selection, peephole transformations
33
Interprocedural IR
• Compile-time generation of code for procedure invocations
• Activation Records (a.k.a. Stack Frames)
34
Supporting Procedures
• New computing environment
  – at least temporary memory for local variables
• Passing information into the new environment
  – Parameters
• Transfer of control to/from procedure
• Handling return values
35
Abstract Register Machine (High Level View)
[Figure: a CPU with general-purpose (data) registers (Register 00, Register 01, …, Register xx) and control registers (Register PC, …), next to a memory holding Code and Data, drawn between high and low addresses]
36
Abstract Register Machine (High Level View)
[Figure: the same CPU next to Main Memory, which holds Code, Global Variables, Stack and Heap, drawn between high and low addresses]
37
Abstract Activation Record Stack
[Figure: a stack of frames …, Proc k, Proc k+1, Proc k+2, …; the frame of Proc k+1 is labelled "Stack frame for procedure Proc k+1(a1, …, aN)"]
38
Abstract Stack Frame
Stack frame for procedure Proc k+1(a1, …, aN), shown between the frames of Proc k and Proc k+2:
  Parameters (actual arguments): Param N, Param N-1, …, Param 1
  Locals and temporaries: _t0, …, _tk, x, …, y
39
Handling Procedures
• Store local variables/temporaries in a stack
• A function call instruction pushes arguments to the stack and jumps to the function label
  A statement x=f(a1,…,an); looks like:
    Push a1; … Push an;
    Call f;
    Pop x; // copy returned value
• Returning a value is done by pushing it to the stack (return x;)
    Push x;
• Return control to caller (and roll up stack)
    Return;
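As a small illustration, the lowering of a call statement and of a return statement into this Push/Call/Pop pattern might look as follows in Python. The helper names `lower_call` and `lower_return` are hypothetical, and the arguments are assumed to be already evaluated into variables or temporaries.

```python
from typing import List


def lower_call(result: str, func: str, args: List[str]) -> List[str]:
    """Lower `result = func(args...)` to Push/Call/Pop TAC."""
    code = [f"Push {a}" for a in args]  # push a1 ... an, in the order listed above
    code.append(f"Call {func}")
    code.append(f"Pop {result}")        # copy the returned value
    return code


def lower_return(value: str) -> List[str]:
    """Lower `return value` to a push of the value followed by Return."""
    return [f"Push {value}", "Return"]
```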
40
Abstract Register Machine
[Figure: CPU with general-purpose (data) registers and control registers (Register PC, …) next to Main Memory, which holds Code, Global Variables, Stack and Heap, drawn between high and low addresses]
41
Semi-Abstract Register Machine
[Figure: CPU next to Main Memory (Global Variables, Stack, Heap, between high and low addresses); besides the general-purpose (data) registers and the control registers (Register PC, …), the CPU now has stack registers: ebp, esp, …]
42
Intro: Functions Example
int SimpleFn(int z) {
  int x, y;
  x = x * y * z;
  return x;
}

void main() {
  int w;
  w = SimpleFn(137);
}

_SimpleFn:
  _t0 = x * y;
  _t1 = _t0 * z;
  x = _t1;
  Push x;
  Return;

main:
  _t0 = 137;
  Push _t0;
  Call _SimpleFn;
  Pop w;
43
What Can We Do with Procedures
• Declarations & Definitions
• Call & Return
• Jumping out of procedures
• Passing & Returning procedures as parameters
44
Design Decisions
• Scoping rules
  – Static scoping vs. dynamic scoping
• Caller/callee conventions
  – Parameters
  – Who saves register values?
• Allocating space for local variables
45
Static (lexical) Scoping
main ( )
{                                  /* B0 */
  int a = 0 ;
  int b = 0 ;
  {                                /* B1 */
    int b = 1 ;
    {                              /* B2 */
      int a = 2 ;
      printf ("%d %d\n", a, b) ;
    }
    {                              /* B3 */
      int b = 3 ;
      printf ("%d %d\n", a, b) ;
    }
    printf ("%d %d\n", a, b) ;
  }
  printf ("%d %d\n", a, b) ;
}

Declaration   Scopes
a=0           B0, B1, B3
b=0           B0
b=1           B1, B2
a=2           B2
b=3           B3

A name refers to its (closest) enclosing scope; this is known at compile time.
46
Dynamic Scoping
• Each identifier is associated with a global stack of bindings
• When entering scope where identifier is declared
  – push declaration on identifier stack
• When exiting scope where identifier is declared
  – pop identifier stack
• Evaluating the identifier in any context binds to the current top of stack
• Determined at runtime
47
Example
• What value is returned from main?
  • Static scoping?
  • Dynamic scoping?

int x = 42;

int f() { return x; }
int g() { int x = 1; return f(); }
int main() { return g(); }
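The two answers can be checked with a small Python model of the program. This is a sketch under assumptions: `run_static` and `run_dynamic` are hypothetical names, static scoping is modelled by having f read the global binding directly, and dynamic scoping by a per-identifier stack of bindings as described on the Dynamic Scoping slide.

```python
from typing import Dict, List


def run_static() -> int:
    """Static scoping: the x in f is bound at compile time to the global x."""
    global_env = {"x": 42}

    def f() -> int:
        return global_env["x"]

    def g() -> int:
        local_env = {"x": 1}  # g's own x; f never looks at it
        assert local_env["x"] == 1
        return f()

    return g()                # body of main


def run_dynamic() -> int:
    """Dynamic scoping: x evaluates to the top of its stack of bindings."""
    bindings: Dict[str, List[int]] = {"x": []}

    def f() -> int:
        return bindings["x"][-1]

    def g() -> int:
        bindings["x"].append(1)    # entering the scope that declares x
        try:
            return f()
        finally:
            bindings["x"].pop()    # exiting that scope

    bindings["x"].append(42)       # the global declaration
    try:
        return g()                 # body of main
    finally:
        bindings["x"].pop()
```

Under this model main returns 42 with static scoping and 1 with dynamic scoping.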
48
Why do we care?
• We need to generate code to access variables
• Static scoping
  – Identifier binding is known at compile time
  – Address of the variable is known at compile time
  – Assigning addresses to variables is part of code generation
  – No runtime errors of "access to undefined variable"
  – Can check types of variables
49
Variable addresses for static scoping: first attempt
int x = 42;

int f() { return x; }
int g() { int x = 1; return f(); }
int main() { return g(); }

identifier     address
x (global)     0x42
x (inside g)   0x73
50
Variable addresses for static scoping: first attempt
int a [11] ;

void quicksort(int m, int n) {
  int i;
  if (n > m) {
    i = partition(m, n);
    quicksort (m, i-1) ;
    quicksort (i+1, n) ;
  }
}

main() {
  quicksort (1, 9) ;
}

What is the address of the variable "i" in the procedure quicksort?
51
Compile-Time Information on Variables
• Name
• Type
• Scope
  – when is it recognized
• Duration
  – Until when does its value exist
• Size
  – How many bytes are required at runtime
• Address
  – Fixed
  – Relative
  – Dynamic
52
Activation Record (Stack Frames)
• separate space for each procedure invocation
• managed at runtime
  – code for managing it generated by the compiler
• desired properties
  – efficient allocation and deallocation
    • procedures are called frequently
  – variable size
    • different procedures may require different memory sizes
53
Semi-Abstract Register Machine
[Figure: CPU next to Main Memory (Global Variables, Stack, Heap, between high and low addresses); the CPU has general-purpose (data) registers, control registers (Register PC, …) and stack registers (ebp, esp, …)]
54
A Logical Stack Frame (Simplified)
Stack frame for function f(a1, …, aN):
  Parameters (actual arguments): Param N, Param N-1, …, Param 1
  Locals and temporaries: _t0, …, _tk, x, …, y
55
Runtime Stack
• Stack of activation records
• Call = push new activation record
• Return = pop activation record
• Only one "active" activation record – top of stack
• How do we handle recursion?
56
Activation Record (frame)
[Figure: layout of one frame, from high addresses to low addresses; the stack grows down; the frame pointer and the stack pointer are marked on the frame]
high addresses
  parameter k           (incoming parameters)
  …
  parameter 1
  lexical pointer       (administrative part)
  return information
  dynamic link
  registers & misc
  local variables
  temporaries
  next frame would be here
low addresses
57
Runtime Stack
• SP – stack pointer
  – top of current frame
• FP – frame pointer
  – base of current frame
  – Sometimes called BP (base pointer)
[Figure: Previous frame followed by Current frame; FP marks the base of the current frame and SP its top; stack grows down]
58
Code Blocks
• Programming languages provide code blocks
void foo()
{
  int x = 8, y = 9 ;      /* 1 */
  { int x = y * y ; }     /* 2 */
  { int x = y * 7 ; }     /* 3 */
  x = y + 1 ;
}
[Figure: stack frame of foo: administrative part, x1, y1, x2, x3, …]
59
L-Values of Local Variables
• The offset in the stack is known at compile time
• L-val(x) = FP+offset(x)
• x = 5 ⇒ Load_Constant 5, R3
          Store R3, offset(x)(FP)
60
Pentium Runtime Stack

Pentium stack registers
Register   Usage
ESP        Stack pointer
EBP        Base pointer

Pentium stack and call/ret instructions
Instruction      Usage
push, pusha, …   push on runtime stack
pop, popa, …     pop from runtime stack
call             transfer control to called routine
ret              transfer control back to caller
61
Accessing Stack Variables
• Use offset from FP (%ebp)
• Remember – stack grows downwards
• Above FP = parameters
• Below FP = locals
• Examples
  – %ebp + 4 = return address
  – %ebp + 8 = first parameter
  – %ebp – 4 = first local
[Figure: two adjacent frames on the stack. From high to low addresses the current frame holds Param n, …, param1 (at FP+8), Return address, Previous fp (FP points here), Local 1 (at FP-4), …, Local n (SP points here)]
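The offset arithmetic behind these examples can be written down as a short Python sketch. It assumes that every parameter and every local occupies exactly one 4-byte slot, as in the examples above; the function names are hypothetical.

```python
WORD = 4  # bytes per stack slot (assumption: 32-bit x86, one slot per variable)


def param_offset(i: int) -> int:
    """Offset from ebp of the i-th parameter (1-based)."""
    if i < 1:
        raise ValueError("parameters are numbered from 1")
    # ebp+0 holds the previous frame pointer and ebp+4 the return address.
    return 2 * WORD + (i - 1) * WORD


def local_offset(j: int) -> int:
    """Offset from ebp of the j-th local (1-based); locals sit below ebp."""
    if j < 1:
        raise ValueError("locals are numbered from 1")
    return -j * WORD


def operand(offset: int) -> str:
    """Format an offset as an AT&T-style memory operand relative to ebp."""
    return f"{offset}(%ebp)"
```

Variables wider than one word, alignment padding and saved registers would shift these offsets in a real frame layout.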
62
Factorial – fact(int n)
fact:
  pushl %ebp             # save ebp
  movl %esp,%ebp         # ebp=esp
  pushl %ebx             # save ebx
  movl 8(%ebp),%ebx      # ebx = n
  cmpl $1,%ebx           # n <= 1 ?
  jle .lresult           # then done
  leal -1(%ebx),%eax     # eax = n-1
  pushl %eax             #
  call fact              # fact(n-1)
  imull %ebx,%eax        # eax=retv*n
  jmp .lreturn           #
.lresult:
  movl $1,%eax           # retv
.lreturn:
  movl -4(%ebp),%ebx     # restore ebx
  movl %ebp,%esp         # restore esp
  popl %ebp              # restore ebp
  ret                    # return to caller
[Figure: stack at an intermediate point: n at EBP+8, Return address, old ebp (EBP points here), old ebx at EBP-4 (ESP points here)]
(disclaimer: a real compiler can do better than that)
63
Call Sequences
• The processor does not save the content of registers on procedure calls
• So who will?
  – Caller saves and restores registers
  – Callee saves and restores registers
  – But can also have both save/restore some registers
64
Call Sequences
caller, before the call (caller push code):
  Push caller-save registers
  Push actual parameters (in reverse order)
call:
  push return address
  Jump to call address
callee, on entry (callee push code, prologue):
  Push current base-pointer
  bp = sp
  Push local variables
  Push callee-save registers
callee, on exit (callee pop code, epilogue):
  Pop callee-save registers
  Pop callee activation record
  Pop old base-pointer
return:
  pop return address
  Jump to address
caller, after the return (caller pop code):
  Pop parameters
  Pop caller-save registers
65
Call Sequences – Foo(42,21)
caller, before the call:
  push %ecx            Push caller-save registers
  push $21             Push actual parameters (in reverse order)
  push $42
call:
  call _foo            push return address; Jump to call address
callee, on entry (prologue):
  push %ebp            Push current base-pointer
  mov %esp, %ebp       bp = sp
  sub $8, %esp         Push local variables
  push %ebx            Push callee-save registers
callee, on exit (epilogue):
  pop %ebx             Pop callee-save registers
  mov %ebp, %esp       Pop callee activation record
  pop %ebp             Pop old base-pointer
return:
  ret                  pop return address; Jump to address
caller, after the return:
  add $8, %esp         Pop parameters
  pop %ecx             Pop caller-save registers
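To see that the sequence leaves the stack and the saved registers as it found them, the steps can be replayed on a toy machine model in Python. Everything here is a hypothetical illustration: the `Machine` class, the initial register values and the placeholder return address are assumptions, and the callee body is reduced to clobbering two registers.

```python
from typing import Dict


class Machine:
    """Toy 32-bit stack machine: word-addressed memory plus a few registers."""

    def __init__(self, stack_top: int = 0x1000) -> None:
        self.mem: Dict[int, int] = {}
        self.reg = {"esp": stack_top, "ebp": 0, "ebx": 7, "ecx": 9}

    def push(self, value: int) -> None:
        self.reg["esp"] -= 4
        self.mem[self.reg["esp"]] = value

    def pop(self) -> int:
        value = self.mem[self.reg["esp"]]
        self.reg["esp"] += 4
        return value


def call_foo(m: Machine) -> Dict[str, int]:
    return_address = 0xCAFE              # placeholder for the caller's next instruction
    # Caller push code
    m.push(m.reg["ecx"])                 # push %ecx (caller-save register)
    m.push(21)                           # push $21 (parameters in reverse order)
    m.push(42)                           # push $42
    m.push(return_address)               # call _foo
    # Callee prologue
    m.push(m.reg["ebp"])                 # push %ebp
    m.reg["ebp"] = m.reg["esp"]          # mov %esp, %ebp
    m.reg["esp"] -= 8                    # sub $8, %esp (space for locals)
    m.push(m.reg["ebx"])                 # push %ebx (callee-save register)
    # Callee body: read the frame, then clobber registers
    seen = {
        "return_address": m.mem[m.reg["ebp"] + 4],
        "first_param": m.mem[m.reg["ebp"] + 8],
        "second_param": m.mem[m.reg["ebp"] + 12],
    }
    m.reg["ebx"] = 0
    m.reg["ecx"] = 0
    # Callee epilogue
    m.reg["ebx"] = m.pop()               # pop %ebx
    m.reg["esp"] = m.reg["ebp"]          # mov %ebp, %esp
    m.reg["ebp"] = m.pop()               # pop %ebp
    m.pop()                              # ret (pops the return address)
    # Caller pop code
    m.reg["esp"] += 8                    # add $8, %esp (pop parameters)
    m.reg["ecx"] = m.pop()               # pop %ecx
    return seen
```

In this model the first parameter is found at ebp + 8 and the second at ebp + 12, and all four registers hold their original values once the caller pop code has run.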
66
"To Callee-save or to Caller-save?"
• Callee-saved registers need only be saved when callee modifies their value
• some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registers
• Callee-Save Registers
  – Saved by the callee before modification
  – Values are automatically preserved across calls
• Caller-Save Registers
  – Saved (if needed) by the caller before calls
  – Values are not automatically preserved across calls
• Usually the architecture defines caller-save and callee-save registers
  • Separate compilation
  • Interoperability between code produced by different compilers/languages
• But compiler writers decide when to use caller/callee registers
68
Callee-Save Registers
• Saved by the callee before modification
• Usually at procedure prolog
• Restored at procedure epilog
• Hardware support may be available
• Values are automatically preserved across calls

int foo(int a) {
  int b = a + 1;
  f1();
  g1(b);
  return (b + 2);
}

.global _foo
Add_Constant -K, SP         // allocate space for foo
Store_Local R5, -14(FP)     // save R5
Load_Reg R5, R0; Add_Constant R5, 1
JSR f1; JSR g1;
Add_Constant R5, 2; Load_Reg R5, R0
Load_Local -14(FP), R5      // restore R5
Add_Constant K, SP; RTS     // deallocate
69
Caller-Save Registers
• Saved by the caller before calls when needed
• Values are not automatically preserved across calls

void bar (int y) {
  int x = y + 1;
  f2(x);
  g2(2);
  g2(8);
}

.global _bar
Add_Constant -K, SP         // allocate space for bar
Add_Constant R0, 1
JSR f2
Load_Constant 2, R0; JSR g2;
Load_Constant 8, R0; JSR g2
Add_Constant K, SP          // deallocate space for bar
RTS
70
Parameter Passing
• 1960s
  – In memory
    • No recursion is allowed
• 1970s
  – In stack
• 1980s
  – In registers
  – First k parameters are passed in registers (k=4 or k=6)
  – Where is time saved?
• Most procedures are leaf procedures
• Interprocedural register allocation
• Many of the registers may be dead before another invocation
• Register windows are allocated in some architectures per call (e.g., Sun SPARC)
71
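Windows Exploit(s): Buffer Overflow
void foo (char *x) {
  char buf[2];
  strcpy(buf, x);
}
int main (int argc, char *argv[]) {
  foo(argv[1]);
}

./a.out abracadabra
Segmentation fault

[Figure: stack layout with arrows for the direction of stack growth and of memory addresses: Previous frame, Return address, Saved FP, char* x, buf[2], …; the copied string "abracadabra" spills out of buf[2] into the neighbouring slots]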
72
Buffer overflow
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int check_authentication(char *password) {
  int auth_flag = 0;
  char password_buffer[16];

  strcpy(password_buffer, password);  /* no bounds check */
  if (strcmp(password_buffer, "brillig") == 0)
    auth_flag = 1;
  if (strcmp(password_buffer, "outgrabe") == 0)
    auth_flag = 1;
  return auth_flag;
}

int main(int argc, char *argv[]) {
  if (argc < 2) {
    printf("Usage: %s <password>\n", argv[0]);
    exit(0);
  }
  if (check_authentication(argv[1])) {
    printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    printf(" Access Granted.\n");
    printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
  } else {
    printf("\nAccess Denied.\n");
  }
}
(source: "Hacking – The Art of Exploitation, 2nd Ed.")
74
Buffer overflow
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int check_authentication(char *password) {
  char password_buffer[16];  /* declaration order swapped */
  int auth_flag = 0;

  strcpy(password_buffer, password);  /* no bounds check */
  if (strcmp(password_buffer, "brillig") == 0)
    auth_flag = 1;
  if (strcmp(password_buffer, "outgrabe") == 0)
    auth_flag = 1;
  return auth_flag;
}

int main(int argc, char *argv[]) {
  if (argc < 2) {
    printf("Usage: %s <password>\n", argv[0]);
    exit(0);
  }
  if (check_authentication(argv[1])) {
    printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    printf(" Access Granted.\n");
    printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
  } else {
    printf("\nAccess Denied.\n");
  }
}
(source: "Hacking – The Art of Exploitation, 2nd Ed.")
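Why the declaration order can matter is easiest to see on a byte-level model of the frame. The Python sketch below is only a model under loud assumptions: it places auth_flag either directly above or directly below the 16-byte buffer with no padding, takes int to be 4 bytes, and adds 16 bytes of slack for the rest of the stack. Real compilers are free to lay out, pad and reorder locals differently, so the offsets are illustrative rather than those of any actual build.

```python
BUF_SIZE = 16   # char password_buffer[16]
FLAG_SIZE = 4   # int auth_flag, assumed to be 4 bytes
SLACK = 16      # assumed stand-in for whatever else lies above in the frame


def check_authentication(password: bytes, flag_above_buffer: bool) -> int:
    frame = bytearray(BUF_SIZE + FLAG_SIZE + SLACK)  # all zero, so auth_flag = 0
    if flag_above_buffer:
        buf_at, flag_at = 0, BUF_SIZE     # flag sits right after the buffer
    else:
        flag_at, buf_at = 0, FLAG_SIZE    # flag sits before the buffer

    # Model of strcpy: copy the string and its terminator, with no bounds check.
    data = password + b"\x00"
    if buf_at + len(data) > len(frame):
        raise MemoryError("write runs past the modelled frame")
    frame[buf_at:buf_at + len(data)] = data

    # Model of strcmp: read from the buffer start up to the first NUL byte.
    stored = bytes(frame[buf_at:]).split(b"\x00", 1)[0]
    if stored in (b"brillig", b"outgrabe"):
        frame[flag_at:flag_at + FLAG_SIZE] = (1).to_bytes(FLAG_SIZE, "little")
    return int.from_bytes(frame[flag_at:flag_at + FLAG_SIZE], "little")
```

In this model a 20-character wrong password makes the function return a non-zero value when the flag lies above the buffer, because the copy runs over it, while the same input leaves the flag at 0 when the flag lies below. The overflow still corrupts whatever follows the buffer in the second layout.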
75
Buffer overflow
0x08048529 <+69>:  movl $0x8048647,(%esp)
0x08048530 <+76>:  call 0x8048394 <puts@plt>
0x08048535 <+81>:  movl $0x8048664,(%esp)
0x0804853c <+88>:  call 0x8048394 <puts@plt>
0x08048541 <+93>:  movl $0x804867a,(%esp)
0x08048548 <+100>: call 0x8048394 <puts@plt>
0x0804854d <+105>: jmp 0x804855b <main+119>
0x0804854f <+107>: movl $0x8048696,(%esp)
0x08048556 <+114>: call 0x8048394 <puts@plt>
76
Nested Procedures
• For example – Pascal
• Any routine can have sub-routines
• Any sub-routine can access anything that is defined in its containing scope or inside the sub-routine itself
  – "non-local" variables
77
Example: Nested Procedures
program p() {
  int x;
  procedure a() {
    int y;
    procedure b() { … c() … };
    procedure c() {
      int z;
      procedure d() {
        y = x + z
      };
      … b() … d() …
    }
    … a() … c() …
  }
  a()
}

Possible call sequence: p → a → a → c → b → c → d
What are the addresses of variables "x", "y" and "z" in procedure d?
78
Nested Procedures
• can call a sibling, ancestor
• when "c" uses (non-local) variables from "a", which instance of "a" is it?
• how do you find the right activation record at runtime?
Possible call sequence: p → a → a → c → b → c → d
[Figure: the call sequence drawn as a chain of activations P, a, a, c, b, c, d]
79
Nested Procedures
• goal: find the closest routine in the stack from a given nesting level
• if we reached the same routine in a sequence of calls
  – a routine of level k that uses variables of the same nesting level uses its own variables
  – if it uses variables of nesting level j < k, then they belong to the last routine called at level j
• If a procedure is last at level j on the stack, then it must be an ancestor of the current routine
Possible call sequence: p → a → a → c → b → c → d
[Figure: the call sequence drawn as a chain of activations P, a, a, c, b, c, d]
80
Nested Procedures
• problem: a routine may need to access variables of another routine that contains it statically
• solution: lexical pointer (a.k.a. access link) in the activation record
• lexical pointer points to the last activation record of the nesting level above it
  – in our example, lexical pointer of d points to activation records of c
• lexical pointers created at runtime
• number of links to be traversed is known at compile time
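The mechanism can be sketched in Python for the call sequence p, a, a, c, b, c, d. The sketch is an illustration under assumptions: the `Frame` class and the helper names are hypothetical, nesting levels are numbered from 0 for p, and the rule used to set a callee's lexical pointer (follow caller level minus callee level plus one links from the caller's frame) is the usual textbook rule rather than something the slides spell out.

```python
from dataclasses import dataclass
from typing import List, Optional

# Static nesting levels of the example program (p is level 0).
LEVEL = {"p": 0, "a": 1, "b": 2, "c": 2, "d": 3}


@dataclass(eq=False)
class Frame:
    proc: str
    access_link: Optional["Frame"]   # the lexical pointer


def follow(frame: Frame, hops: int) -> Frame:
    """Traverse `hops` lexical pointers starting at `frame`."""
    for _ in range(hops):
        if frame.access_link is None:
            raise RuntimeError("ran off the static chain")
        frame = frame.access_link
    return frame


def call(stack: List[Frame], callee: str) -> Frame:
    """Push a frame for `callee`, setting its lexical pointer at runtime."""
    if not stack:
        frame = Frame(callee, None)
    else:
        caller = stack[-1]
        hops = LEVEL[caller.proc] - LEVEL[callee] + 1   # known at compile time
        frame = Frame(callee, follow(caller, hops))
    stack.append(frame)
    return frame


def frame_declaring(current: Frame, declaring_level: int) -> Frame:
    """Find the frame that holds a variable declared at `declaring_level`."""
    return follow(current, LEVEL[current.proc] - declaring_level)
```

For d, reaching y takes two links and ends at the second activation of a, while z is one link away in the most recent activation of c; both hop counts depend only on nesting levels, not on the call history.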
81
Lexical Pointers
Possible call sequence: p → a → a → c → b → c → d
[Figure: the stack of activation records a, a, c, b, c, d for this call sequence, with each record's lexical pointer drawn; the records of a hold y and the records of c hold z. The program shown beside it is program p from the nested procedures example.]
82
What is a Compiler
6
Code Generation
Op()
Id(b)
Num(23) Num(7)
Op(+)
Valid Abstract Syntax TreeSymbol Tablehellip
Verification (possible runtime)ErrorsWarnings
Intermediate Representation (IR)
Executable Codeinput output
7
Code Generation IR
bull Translating from abstract syntax (AST) to intermediate representation (IR)ndash Three-Address Code
bull Primitive statements control flow procedure calls
Source code
(program)
LexicalAnalysis
Syntax Analysis
Parsing
AST SymbolTableetc
InterRep(IR)
CodeGeneration
Source code
(executable)
8
C++ IR
Pentium
Java bytecode
SparcPyhton
Javaoptimize
Lowering Code Gen
Intermediate representationbull A language that is between the source language and
the target language ndash not specific to any machinebull Goal 1 retargeting compiler components for
different source languagestarget machinesbull Goal 2 machine-independent optimizer
ndash Narrow interface small number of node types (instructions)
9
Three-Address Code IR
bull A popular form of IRbull High-level assembly where instructions
have at most three operands
Chapter 8
10
Heap
Abstract Register Machine
Global Variables
Stack
Lowaddresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
Code
Highaddresses
11
Abstract Activation Record Stack
Stack frame for procedure
Prock+1(a1hellipaN)
Prock
Prock+2
hellip
hellip
Prock+1
12
Abstract Stack Frame
Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Prock
Prock+2
hellip
hellip
Stack frame for procedure
Prock+1(a1hellipaN)
13
TAC generation for expressions
bull cgen(atomic_expr) directly generates TAC for atomic expressionsndash Constants identifiershellip
bull Cgen(compund_expr) recursively generates TAC for compound expressions ndash binary operators procedure calls hellipndash use temporary variables (registers) to store
values of intermediate expressions
14
cgen examplecgen(5 + x) = Choose a new temporary t Let t1 = cgen(5) Let t2 = cgen(x) Emit( t = t1 + t2 ) Return t
15
Naiumlve cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =
Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc
16
TAC Generation forControl Flow Statements
bull Label introduction_label_name
Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L
Goto Lbull Conditional jump test condition variable t
if 0 jump to label LIfZ t Goto L
bull Similarly test condition variable tif 1 jump to label L
IfNZ t Goto L
17
Control-flow example ndash conditionsint xint yint z
if (x lt y) z = xelse z = yz = z z
_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z
18
cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)
Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )
19
Naive cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =
Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc
bull Observation temporaries in cgen(e1) can be reused in cgen(e2)
20
Improving cgen for expressionsbull Observation ndash naiumlve translation needlessly generates temporaries for
leaf expressionsbull Observation ndash temporaries used exactly once
ndash Once a temporary has been read it can be reused for another sub-expressionbull cgen(e1 op e2) =
Let _t1 = cgen(e1) Let _t2 = cgen(e2) Emit( _t =_t1 op _t2 ) Return t
bull Temporaries cgen(e1) can be reused in cgen(e2)
21
Sethi-Ullman translation
bull Algorithm by Ravi Sethi and Jeffrey D Ullman to emit optimal TACndash Minimizes number of temporaries
bull Main data structure in algorithm is a stack of temporariesndash Stack corresponds to recursive invocations of _t = cgen(e)ndash All the temporaries on the stack are live
bull Live = contain a value that is needed later on
22
Weighted register allocationbull Can save registers by re-ordering subtree computationsbull Label each node with its weight
ndash Weight = number of registers neededndash Leaf weight knownndash Internal node weight
bull w(left) gt w(right) then w = leftbull w(right) gt w(left) then w = rightbull w(right) = w(left) then w = left + 1
bull Choose heavier child as first to be translatedbull WARNING have to check that no side-effects exist before
attempting to apply this optimization(pre-pass on the tree)
23
Weighted reg alloc example
b
5 c
array access
+
a w=0
w=0 w=0
w=1 w=0
w=1
w=1
Phase 1 - check absence of side-effects in expression tree - assign weight to each AST node
_t0 = cgen( a+b[5c] )
base index
24
Weighted reg alloc example
b
5 c
array access
+
abase index
w=0
w=0 w=0
w=1 w=0
w=1
w=1
_t0 = cgen( a+b[5c] )Phase 2 - use weights to decide on order of translation
_t0
_t0
_t0Heavier sub-tree
Heavier sub-tree
_t0 = _t1 _t0
_t0 = _t1[_t0]
_t0 = _t1 + _t0
_t0_t1
_t1
_t1_t0 = c_t1 = 5
_t1 = b
_t1 = a
25
TAC Generation forMemory Access Instructions
bull Copy instruction a = bbull Loadstore instructions
a = b a = bbull Address of instruction a=ampbbull Array accesses
a = b[i] a[i] = bbull Field accesses
a = b[f] a[f] = bbull Memory allocation instruction a = alloc(size)
ndash Sometimes left out (eg malloc is a procedure in C)
26
Array operations
t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]
t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer
x = y[i]
x[i] = y
27
TAC Generation forControl Flow Statements
bull Label introduction_label_name
Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L
Goto Lbull Conditional jump test condition variable t
if 0 jump to label LIfZ t Goto L
bull Similarly test condition variable tif 1 jump to label L
IfNZ t Goto L
28
Control-flow example ndash conditionsint xint yint z
if (x lt y) z = xelse z = yz = z z
_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z
29
cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)
Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )
30
Control-flow example ndash loopsint xint y
while (x lt y) x = x 2
y = x
_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1
y = x
31
cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label
Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )
32
Optimization points
sourcecode
Frontend IR Code
generatortargetcode
Userprofile program
change algorithm
Compilerintraprocedural IRInterprocedural IRIR optimizations
Compilerregister allocation
instruction selectionpeephole transformations
today
33
Interprocedural IR
bull Compile time generation of code for procedure invocations
bull Activation Records (aka Stack Frames)
34
Supporting Procedures
bull New computing environment ndash at least temporary memory for local variables
bull Passing information into the new environmentndash Parameters
bull Transfer of control tofrom procedurebull Handling return values
35
Abstract Register Machine(High Level View)
Code
Data
Highaddresses
Lowaddresses
CPU
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
36
Heap
Abstract Register Machine(High Level View)
Global Variables
Stack
Lowaddresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
Code
Highaddresses
37
Abstract Activation Record Stack
Stack frame for procedure
Prock+1(a1hellipaN)
Prock
Prock+2
hellip
hellip
Prock+1
38
Abstract Stack Frame
Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Prock
Prock+2
hellip
hellip
Stack frame for procedure
Prock+1(a1hellipaN)
39
Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to
stack and jumps to the function labelA statement x=f(a1hellipan) looks like
Push a1 hellip Push anCall fPop x copy returned value
bull Returning a value is done by pushing it to the stack (return x)
Push xbull Return control to caller (and roll up stack)
Return
40
Heap
Abstract Register Machine
Global Variables
Stack
Lowaddresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
Code
Highaddresses
41
Semi-Abstract Register Machine
Global Variables
Stack
Heap
High addresses
Low addresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
ebp
esphellipre
gist
ers
Stac
k
42
Intro Functions Exampleint SimpleFn(int z)
int x yx = x y zreturn x
void main() int ww = SimpleFunction(137)
_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn
main_t0 = 137Push _t0Call _SimpleFnPop w
43
What Can We Do with Procedures
bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as
parameters
44
Design Decisions
bull Scoping rulesndash Static scoping vs dynamic scoping
bull Callercallee conventionsndash Parametersndash Who saves register values
bull Allocating space for local variables
45
Static (lexical) Scopingmain ( )
int a = 0 int b = 0
int b = 1
int a = 2 printf (ldquod dnrdquo a b)
int b = 3 printf (ldquod dnrdquo a b)
printf (ldquod dnrdquo a b)
printf (ldquod dnrdquo a b)
B0B1
B3B3
B2
Declaration Scopes
a=0 B0B1B3
b=0 B0
b=1 B1B2
a=2 B2
b=3 B3
a name refers to its (closest)
enclosing scope
known at compile time
46
Dynamic Scoping
bull Each identifier is associated with a global stack of bindings
bull When entering scope where identifier is declaredndash push declaration on identifier stack
bull When exiting scope where identifier is declaredndash pop identifier stack
bull Evaluating the identifier in any context binds to the current top of stack
bull Determined at runtime
47
Example
bull What value is returned from mainbull Static scopingbull Dynamic scoping
int x = 42
int f() return x int g() int x = 1 return f() int main() return g()
48
Why do we care
bull We need to generate code to access variables
bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code
generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables
49
Variable addresses for static scoping first attempt
int x = 42
int f() return x int g() int x = 1 return f() int main() return g()
identifier address
x (global) 0x42
x (inside g) 0x73
50
Variable addresses for static scoping first attempt
int a [11]
void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)
main() quicksort (1 9)
what is the address of the variable ldquoirdquo in
the procedure quicksort
51
Compile-Time Information on Variablesbull Namebull Typebull Scope
ndash when is it recognizedbull Duration
ndash Until when does its value existbull Size
ndash How many bytes are required at runtime bull Address
ndash Fixedndash Relativendash Dynamic
52
Activation Record (Stack Frames)bull separate space for each procedure invocation
bull managed at runtimendash code for managing it generated by the compiler
bull desired properties ndash efficient allocation and deallocation
bull procedures are called frequentlyndash variable size
bull different procedures may require different memory sizes
53
Semi-Abstract Register Machine
Global Variables
Stack
Heap
High addresses
Low addresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
ebp
esphellipre
gist
ers
Stac
k
54
A Logical Stack Frame (Simplified)Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Stack frame for function f(a1hellipaN)
55
Runtime Stack
bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of
stackbull How do we handle recursion
56
Activation Record (frame)parameter k
parameter 1
lexical pointer
return information
dynamic link
registers amp misc
local variablestemporaries
next frame would be here
hellip
administrativepart
high addresses
lowaddresses
framepointer
stackpointer
incoming parameters
stack grows down
57
Runtime Stackbull SP ndash stack pointer
ndash top of current framebull FP ndash frame pointer
ndash base of current framendash Sometimes called BP
(base pointer)
Current frame
hellip hellip
Previous frame
SP
FPstack grows down
58
Code Blocks
bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1
adminstrativex1
y1
x2
x3
hellip
59
L-Values of Local Variables
bull The offset in the stack is known at compile time
bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3
Store R3 offset(x)(FP)
60
Pentium Runtime Stack
Pentium stack registers
Pentium stack and callret instructions
Register Usage
ESP Stack pointer
EBP Base pointer
Instruction Usage
push pushahellip push on runtime stack
poppopahellip Base pointer
call transfer control to called routine
return transfer control back to caller
61
Accessing Stack Variables

• Use offset from FP (%ebp)
• Remember – stack grows downwards
• Above FP = parameters
• Below FP = locals
• Examples
  – %ebp + 4 = return address
  – %ebp + 8 = first parameter
  – %ebp – 4 = first local
[Figure: stack layout from high to low addresses: Param n, …, param 1 (at FP+8), Return address, Previous fp (at FP), Local 1 (at FP-4), …, Local n (at SP); the previous frame above has the same layout.]
62
Factorial – fact(int n)
fact:
    pushl %ebp             # save ebp
    movl  %esp,%ebp        # ebp = esp
    pushl %ebx             # save ebx
    movl  8(%ebp),%ebx     # ebx = n
    cmpl  $1,%ebx          # n <= 1 ?
    jle   .lresult         # then done
    leal  -1(%ebx),%eax    # eax = n-1
    pushl %eax
    call  fact             # fact(n-1)
    imull %ebx,%eax        # eax = retv*n
    jmp   .lreturn
.lresult:
    movl  $1,%eax          # retv
.lreturn:
    movl  -4(%ebp),%ebx    # restore ebx
    movl  %ebp,%esp        # restore esp
    popl  %ebp             # restore ebp
    ret                    # return to caller
[Figure: stack at an intermediate point: previous frame (previous fp, old ebx, return address), then n at EBP+8, return address, old ebp at EBP, old ebx at EBP-4 (ESP).]
(disclaimer: a real compiler can do better than that)
63
Call Sequences

• The processor does not save the content of registers on procedure calls
• So who will?
  – Caller saves and restores registers
  – Callee saves and restores registers
  – But can also have both save/restore some registers
64
Call Sequences

caller – Caller push code
  • Push caller-save registers
  • Push actual parameters (in reverse order)
call
  • push return address
  • Jump to call address
callee – Callee push code (prologue)
  • Push current base-pointer
  • bp = sp
  • Push local variables
  • Push callee-save registers
callee – Callee pop code (epilogue)
  • Pop callee-save registers
  • Pop callee activation record
  • Pop old base-pointer
return
  • pop return address
  • Jump to address
caller – Caller pop code
  • Pop parameters
  • Pop caller-save registers
65
Call Sequences – Foo(42,21)

caller (caller push code)
    push %ecx           Push caller-save registers
    push $21            Push actual parameters (in reverse order)
    push $42
call
    call _foo           push return address; jump to call address
callee (prologue)
    push %ebp           Push current base-pointer
    mov %esp, %ebp      bp = sp
    sub $8, %esp        Push local variables
    push %ebx           Push callee-save registers
callee (epilogue)
    pop %ebx            Pop callee-save registers
    mov %ebp, %esp      Pop callee activation record
    pop %ebp            Pop old base-pointer
return
    ret                 pop return address; jump to address
caller (caller pop code)
    add $8, %esp        Pop parameters
    pop %ecx            Pop caller-save registers
66
“To Callee-save or to Caller-save?”

• Callee-saved registers need only be saved when callee modifies their value
• Some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registers
• Callee-Save Registers
  – Saved by the callee before modification
  – Values are automatically preserved across calls
• Caller-Save Registers
  – Saved (if needed) by the caller before calls
  – Values are not automatically preserved across calls
• Usually the architecture defines caller-save and callee-save registers
• Separate compilation
• Interoperability between code produced by different compilers/languages
• But compiler writers decide when to use caller/callee registers
68
Callee-Save Registers
• Saved by the callee before modification
• Usually at procedure prolog
• Restored at procedure epilog
• Hardware support may be available
• Values are automatically preserved across calls

    int foo(int a) {
      int b = a + 1;
      f1();
      g1(b);
      return (b + 2);
    }

    .global _foo
        Add_Constant -K, SP        // allocate space for foo
        Store_Local R5, -14(FP)    // save R5
        Load_Reg R5, R0
        Add_Constant R5, 1
        JSR f1
        JSR g1
        Add_Constant R5, 2
        Load_Reg R5, R0
        Load_Local -14(FP), R5     // restore R5
        Add_Constant K, SP         // deallocate
        RTS
69
Caller-Save Registers
• Saved by the caller before calls when needed
• Values are not automatically preserved across calls

    void bar (int y) {
      int x = y + 1;
      f2(x);
      g2(2);
      g2(8);
    }

    .global _bar
        Add_Constant -K, SP        // allocate space for bar
        Add_Constant R0, 1
        JSR f2
        Load_Constant 2, R0
        JSR g2
        Load_Constant 8, R0
        JSR g2
        Add_Constant K, SP         // deallocate space for bar
        RTS
70
Parameter Passing
• 1960s
  – In memory
    • No recursion is allowed
• 1970s
  – In stack
• 1980s
  – In registers
  – First k parameters are passed in registers (k=4 or k=6)
  – Where is time saved?
• Most procedures are leaf procedures
• Interprocedural register allocation
• Many of the registers may be dead before another invocation
• Register windows are allocated in some architectures per call (e.g., Sun SPARC)
71
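Windows Exploit(s): Buffer Overflow

    void foo (char *x) {
      char buf[2];
      strcpy(buf, x);
    }
    int main (int argc, char *argv[]) {
      foo(argv[1]);
    }

    ./a.out abracadabra
    Segmentation fault

[Figure: stack contents along the memory addresses, with the direction of stack growth marked: previous frame, return address, saved FP, char* x, buf[2], …; the string "abracadabra" written from buf[2] overwrites the saved FP and the return address.]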
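For contrast, here is a hedged sketch of a bounded copy that cannot write past the destination buffer. The helper name bounded_copy is hypothetical and not part of the slides; it relies only on the standard snprintf, which writes at most cap bytes including the terminating NUL.

```c
#include <stdio.h>

/* Copy src into dst without writing past dst[cap-1].
   Returns 1 if the whole string fit, 0 if it was truncated. */
int bounded_copy(char *dst, size_t cap, const char *src) {
    if (cap == 0) return 0;
    int needed = snprintf(dst, cap, "%s", src);  /* NUL-terminates when cap > 0 */
    return needed >= 0 && (size_t)needed < cap;
}
```

With a destination of 4 bytes, the input "abracadabra" is cut to "abr" and the function reports the truncation instead of overwriting the saved FP and return address.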
72
Buffer overflow

    int check_authentication(char *password) {
      int auth_flag = 0;
      char password_buffer[16];

      strcpy(password_buffer, password);
      if(strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
      if(strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;
      return auth_flag;
    }

    int main(int argc, char *argv[]) {
      if(argc < 2) {
        printf("Usage: %s <password>\n", argv[0]);
        exit(0);
      }
      if(check_authentication(argv[1])) {
        printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        printf(" Access Granted.\n");
        printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
      } else {
        printf("\nAccess Denied.\n");
      }
    }

(source: “Hacking – The Art of Exploitation, 2nd Ed.”)
74
Buffer overflow

    int check_authentication(char *password) {
      char password_buffer[16];
      int auth_flag = 0;

      strcpy(password_buffer, password);
      if(strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
      if(strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;
      return auth_flag;
    }

    int main(int argc, char *argv[]) {
      if(argc < 2) {
        printf("Usage: %s <password>\n", argv[0]);
        exit(0);
      }
      if(check_authentication(argv[1])) {
        printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        printf(" Access Granted.\n");
        printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
      } else {
        printf("\nAccess Denied.\n");
      }
    }

(source: “Hacking – The Art of Exploitation, 2nd Ed.”)
75
Buffer overflow

    0x08048529 <+69>:   movl $0x8048647,(%esp)
    0x08048530 <+76>:   call 0x8048394 <puts@plt>
    0x08048535 <+81>:   movl $0x8048664,(%esp)
    0x0804853c <+88>:   call 0x8048394 <puts@plt>
    0x08048541 <+93>:   movl $0x804867a,(%esp)
    0x08048548 <+100>:  call 0x8048394 <puts@plt>
    0x0804854d <+105>:  jmp 0x804855b <main+119>
    0x0804854f <+107>:  movl $0x8048696,(%esp)
    0x08048556 <+114>:  call 0x8048394 <puts@plt>
76
Nested Procedures

• For example – Pascal
• Any routine can have sub-routines
• Any sub-routine can access anything that is defined in its containing scope or inside the sub-routine itself
  – “non-local” variables
77
Example: Nested Procedures

    program p() {
      int x;
      procedure a() {
        int y;
        procedure b() { … c() … };
        procedure c() {
          int z;
          procedure d() {
            y = x + z
          };
          … b() … d() …
        }
        … a() … c() …
      }
      a()
    }

Possible call sequence: p → a → a → c → b → c → d
What are the addresses of variables “x”, “y” and “z” in procedure d?
78
Nested Procedures
• can call a sibling, ancestor
• when “c” uses (non-local) variables from “a”, which instance of “a” is it?
• how do you find the right activation record at runtime?
Possible call sequence: p → a → a → c → b → c → d
[Figure: diagram of the calls P, a, a, c, b, c, d.]
79
Nested Procedures
• goal: find the closest routine in the stack from a given nesting level
• if we reached the same routine in a sequence of calls
  – if a routine of level k uses variables of the same nesting level, it uses its own variables
  – if it uses variables of nesting level j < k, then it must be the last routine called at level j
• If a procedure is last at level j on the stack, then it must be an ancestor of the current routine
Possible call sequence: p → a → a → c → b → c → d
[Figure: diagram of the calls P, a, a, c, b, c, d.]
80
Nested Procedures
• problem: a routine may need to access variables of another routine that contains it statically
• solution: lexical pointer (a.k.a. access link) in the activation record
• lexical pointer points to the last activation record of the nesting level above it
  – in our example, lexical pointer of d points to activation records of c
• lexical pointers created at runtime
• number of links to be traversed is known at compile time
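The lookup described above can be sketched as follows. The Frame layout and the function frame_at_level are assumptions made for illustration, and each frame carries a single local to keep the example short. The number of hops is the difference between two nesting levels, both known at compile time.

```c
#include <stddef.h>

/* Hypothetical activation record for a language with nested procedures. */
typedef struct Frame {
    struct Frame *lexical;  /* access link: frame of the statically enclosing routine */
    struct Frame *dynamic;  /* dynamic link: frame of the caller */
    int level;              /* static nesting level of the routine */
    int local;              /* one local variable, e.g. x, y or z */
} Frame;

/* Follow (cur->level - target_level) lexical pointers. */
Frame *frame_at_level(Frame *cur, int target_level) {
    Frame *f = cur;
    for (int hops = cur->level - target_level; hops > 0; hops--)
        f = f->lexical;
    return f;
}
```

For the call sequence p, a, a, c, b, c, d, the statement y = x + z in d follows one link for z (level 2), two links for y (level 1, reaching the most recent instance of a) and three links for x (level 0).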
81
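Lexical Pointers

    program p() {
      int x;
      procedure a() {
        int y;
        procedure b() { c() };
        procedure c() {
          int z;
          procedure d() {
            y = x + z
          };
          … b() … d() …
        }
        … a() … c() …
      }
      a()
    }

Possible call sequence: p → a → a → c → b → c → d
[Figure: runtime stack for this call sequence with frames a (y), a (y), c (z), b, c (z), d and their lexical pointers, next to the diagram of the calls P, a, a, c, b, c, d.]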
82
What is a Compiler
7
Code Generation: IR

• Translating from abstract syntax (AST) to intermediate representation (IR)
  – Three-Address Code
• Primitive statements, control flow, procedure calls
[Figure: compiler pipeline: Source code (program) → Lexical Analysis → Syntax Analysis (Parsing) → AST, Symbol Table, etc. → Inter. Rep. (IR) → Code Generation → Target code (executable).]
8
[Figure: C++, Python and Java are lowered to a common IR, which is optimized; code generation then targets Pentium, Java bytecode and Sparc.]
Intermediate representation
• A language that is between the source language and the target language – not specific to any machine
• Goal 1: retargeting compiler components for different source languages/target machines
• Goal 2: machine-independent optimizer
  – Narrow interface: small number of node types (instructions)
9
Three-Address Code IR

• A popular form of IR
• High-level assembly where instructions have at most three operands
Chapter 8
10
Abstract Register Machine
[Figure: a CPU with general-purpose (data) registers Register 00, Register 01, …, Register xx and control registers (Register PC, …), next to main memory holding Code, Global Variables, Stack and Heap between low and high addresses.]
11
Abstract Activation Record Stack
[Figure: stack of activation records …, Proc k, Proc k+1, Proc k+2, …; the highlighted record is the stack frame for procedure Proc k+1(a1, …, aN).]
12
Abstract Stack Frame
Stack frame for procedure Proc k+1(a1, …, aN), located between the frames of Proc k and Proc k+2:
• Parameters (actual arguments): Param N, Param N-1, …, Param 1
• Locals and temporaries: _t0, …, _tk, x, …, y
13
TAC generation for expressions

• cgen(atomic_expr) directly generates TAC for atomic expressions
  – Constants, identifiers, …
• cgen(compound_expr) recursively generates TAC for compound expressions
  – binary operators, procedure calls, …
  – use temporary variables (registers) to store values of intermediate expressions
14
cgen example
cgen(5 + x) = {
  Choose a new temporary t
  Let t1 = cgen(5)
  Let t2 = cgen(x)
  Emit( t = t1 + t2 )
  Return t
}
15
Naïve cgen for expressions
• Maintain a counter for temporaries in c
• Initially: c = 0
• cgen(e1 op e2) = {
    Let A = cgen(e1)
    c = c + 1
    Let B = cgen(e2)
    c = c + 1
    Emit( _tc = A op B; )
    Return _tc
  }
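A runnable sketch of the naïve scheme, under assumptions: the Expr type, the emit helper and the tac buffer are hypothetical, and the leaf rule (emit _tc = leaf and return _tc) is an assumption that is not spelled out on this slide.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical AST node: a leaf or a binary operator. */
typedef struct Expr {
    char op;                  /* 0 for a leaf, otherwise the operator character */
    const char *leaf;         /* leaf text such as "5" or "x" (unused for operators) */
    const struct Expr *left, *right;
} Expr;

int c = 0;                    /* counter for temporaries, initially 0 */
char tac[512];                /* emitted TAC, one instruction per line */

void emit(const char *line) {
    size_t used = strlen(tac);
    snprintf(tac + used, sizeof tac - used, "%s\n", line);
}

/* Returns the number k of the temporary _tk that holds the value of e. */
int cgen(const Expr *e) {
    char line[64];
    if (e->op == 0) {                         /* assumed leaf rule: _tc = leaf */
        snprintf(line, sizeof line, "_t%d = %s", c, e->leaf);
        emit(line);
        return c;
    }
    int a = cgen(e->left);
    c = c + 1;
    int b = cgen(e->right);
    c = c + 1;
    snprintf(line, sizeof line, "_t%d = _t%d %c _t%d", c, a, e->op, b);
    emit(line);
    return c;
}
```

For 5 + x this emits three instructions and uses three temporaries, which is the waste that the improved schemes are meant to reduce.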
16
TAC Generation for Control Flow Statements

• Label introduction
    _label_name:
  Indicates a point in the code that can be jumped to
• Unconditional jump: go to instruction following label L
    Goto L;
• Conditional jump: test condition variable t; if 0, jump to label L
    IfZ t Goto L;
• Similarly: test condition variable t; if nonzero, jump to label L
    IfNZ t Goto L;
17
Control-flow example – conditions
Source:
    int x;
    int y;
    int z;

    if (x < y)
      z = x;
    else
      z = y;
    z = z * z;
TAC:
    _t0 = x < y;
    IfZ _t0 Goto _L0;
    z = x;
    Goto _L1;
    _L0:
    z = y;
    _L1:
    z = z * z;
18
cgen for if-then-else
cgen(if (e) s1 else s2) = {
  Let _t = cgen(e)
  Let Lfalse be a new label
  Let Lafter be a new label
  Emit( IfZ _t Goto Lfalse; )
  cgen(s1)
  Emit( Goto Lafter; )
  Emit( Lfalse: )
  cgen(s2)
  Emit( Goto Lafter; )
  Emit( Lafter: )
}
19
Naïve cgen for expressions
• Maintain a counter for temporaries in c
• Initially: c = 0
• cgen(e1 op e2) = {
    Let A = cgen(e1)
    c = c + 1
    Let B = cgen(e2)
    c = c + 1
    Emit( _tc = A op B; )
    Return _tc
  }
• Observation: temporaries in cgen(e1) can be reused in cgen(e2)
20
Improving cgen for expressions
• Observation – naïve translation needlessly generates temporaries for leaf expressions
• Observation – temporaries used exactly once
  – Once a temporary has been read it can be reused for another sub-expression
• cgen(e1 op e2) = {
    Let _t1 = cgen(e1)
    Let _t2 = cgen(e2)
    Emit( _t = _t1 op _t2; )
    Return _t
  }
• Temporaries of cgen(e1) can be reused in cgen(e2)
21
Sethi-Ullman translation

• Algorithm by Ravi Sethi and Jeffrey D. Ullman to emit optimal TAC
  – Minimizes number of temporaries
• Main data structure in algorithm is a stack of temporaries
  – Stack corresponds to recursive invocations of _t = cgen(e)
  – All the temporaries on the stack are live
    • Live = contain a value that is needed later on
22
Weighted register allocation
• Can save registers by re-ordering subtree computations
• Label each node with its weight
  – Weight = number of registers needed
  – Leaf weight known
  – Internal node weight
    • w(left) > w(right) then w = w(left)
    • w(right) > w(left) then w = w(right)
    • w(right) = w(left) then w = w(left) + 1
• Choose heavier child as first to be translated
• WARNING: have to check that no side-effects exist before attempting to apply this optimization (pre-pass on the tree)
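The labeling rule above can be sketched as a short recursive function. The Node type and the function names are hypothetical, and the leaf weight is taken to be 0, an assumption chosen to agree with the weights shown in the example that follows; other presentations assign weight 1 to some leaves.

```c
#include <stddef.h>

/* Hypothetical binary expression-tree node; both children NULL for a leaf. */
typedef struct Node {
    const struct Node *left, *right;
} Node;

/* Weight of a node according to the rule in the bullets above. */
int weight(const Node *n) {
    if (n->left == NULL && n->right == NULL) return 0;  /* assumed leaf weight */
    int wl = weight(n->left), wr = weight(n->right);
    if (wl > wr) return wl;
    if (wr > wl) return wr;
    return wl + 1;
}

/* The heavier child is translated first; ties are broken arbitrarily
   (in favor of the left child here). */
int left_first(const Node *n) {
    return weight(n->left) >= weight(n->right);
}
```

Recomputing weights inside left_first is quadratic in the worst case; a real pass would store the weight in each node during the pre-pass.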
23
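Weighted reg. alloc. example
_t0 = cgen( a+b[5*c] )
Phase 1:
  – check absence of side-effects in expression tree
  – assign weight to each AST node
[Figure: AST with weights: + (w=1) has children a (w=0) and array access (w=1); array access has base b (w=0) and index * (w=1); * has children 5 (w=0) and c (w=0).]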
24
Weighted reg. alloc. example
_t0 = cgen( a+b[5*c] )
Phase 2:
  – use weights to decide on order of translation
[Figure: the same AST (weights as in Phase 1) annotated with the temporary that holds each node's value; at each internal node the heavier sub-tree is translated first.]
Generated code:
    _t0 = c
    _t1 = 5
    _t0 = _t1 * _t0
    _t1 = b
    _t0 = _t1[_t0]
    _t1 = a
    _t0 = _t1 + _t0
25
TAC Generation for Memory Access Instructions

• Copy instruction: a = b
• Load/store instructions: a = *b   *a = b
• Address of instruction: a = &b
• Array accesses: a = b[i]   a[i] = b
• Field accesses: a = b[f]   a[f] = b
• Memory allocation instruction: a = alloc(size)
  – Sometimes left out (e.g., malloc is a procedure in C)
26
Array operations

x = y[i]
    t1 = &y        ; t1 = address-of y
    t2 = t1 + i    ; t2 = address of y[i]
    x = *t2        ; value stored at y[i]

x[i] = y
    t1 = &x        ; t1 = address-of x
    t2 = t1 + i    ; t2 = address of x[i]
    *t2 = y        ; store through pointer
30
Control-flow example – loops
Source:
    int x;
    int y;

    while (x < y) {
      x = x * 2;
    }
    y = x;
TAC:
    _L0:
    _t0 = x < y;
    IfZ _t0 Goto _L1;
    x = x * 2;
    Goto _L0;
    _L1:
    y = x;
31
cgen for while loops
cgen(while (expr) stmt) = {
  Let Lbefore be a new label
  Let Lafter be a new label
  Emit( Lbefore: )
  Let t = cgen(expr)
  Emit( IfZ t Goto Lafter; )
  cgen(stmt)
  Emit( Goto Lbefore; )
  Emit( Lafter: )
}
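The scheme above can be sketched in C as a function that assembles the loop from TAC that was already generated for the condition and the body. The function name cgen_while, its parameters and the next_label counter are hypothetical simplifications; a real cgen would recurse over the AST instead of taking strings.

```c
#include <stdio.h>

int next_label = 0;   /* fresh-label counter (hypothetical) */

/* cond_tac:  TAC that leaves the condition's value in temporary cond_temp.
   body_tac:  TAC already generated for the loop body.
   Writes the loop's TAC into out and returns the number of characters needed. */
int cgen_while(char *out, size_t cap, const char *cond_tac,
               const char *cond_temp, const char *body_tac) {
    int before = next_label++;   /* Lbefore */
    int after  = next_label++;   /* Lafter  */
    return snprintf(out, cap,
                    "_L%d:\n%sIfZ %s Goto _L%d;\n%sGoto _L%d;\n_L%d:\n",
                    before, cond_tac, cond_temp, after, body_tac, before, after);
}
```

Drawing both labels from one counter keeps them unique when loops are nested or follow each other.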
32
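Optimization points
[Figure: source code → Front end → IR → Code generator → target code, with optimization points along the pipeline:
  – User: profile program, change algorithm
  – Compiler: intraprocedural IR, interprocedural IR, IR optimizations
  – Compiler: register allocation, instruction selection, peephole transformations
A “today” marker highlights the part covered in this lecture.]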
33
Interprocedural IR

• Compile time generation of code for procedure invocations
• Activation Records (a.k.a. Stack Frames)
34
Supporting Procedures

• New computing environment
  – at least temporary memory for local variables
• Passing information into the new environment
  – Parameters
• Transfer of control to/from procedure
• Handling return values
35
Abstract Register Machine (High Level View)
[Figure: a CPU with general-purpose (data) registers Register 00, Register 01, …, Register xx and control registers (Register PC, …), next to memory holding Code and Data between high and low addresses.]
36
Abstract Register Machine (High Level View)
[Figure: a CPU with general-purpose (data) registers Register 00, Register 01, …, Register xx and control registers (Register PC, …), next to main memory holding Code, Global Variables, Stack and Heap between low and high addresses.]
37
Abstract Activation Record Stack
[Figure: stack of activation records …, Proc k, Proc k+1, Proc k+2, …; the highlighted record is the stack frame for procedure Proc k+1(a1, …, aN).]
38
Abstract Stack Frame
Stack frame for procedure Proc k+1(a1, …, aN), located between the frames of Proc k and Proc k+2:
• Parameters (actual arguments): Param N, Param N-1, …, Param 1
• Locals and temporaries: _t0, …, _tk, x, …, y
39
Handling Procedures
• Store local variables/temporaries in a stack
• A function call instruction pushes arguments to stack and jumps to the function label
  A statement x=f(a1,…,an); looks like
    Push a1; … Push an;
    Call f;
    Pop x; // copy returned value
• Returning a value is done by pushing it to the stack (return x;)
    Push x;
• Return control to caller (and roll up stack)
    Return;
40
Abstract Register Machine
[Figure: a CPU with general-purpose (data) registers Register 00, Register 01, …, Register xx and control registers (Register PC, …), next to main memory holding Code, Global Variables, Stack and Heap between low and high addresses.]
41
Semi-Abstract Register Machine
[Figure: a CPU with general-purpose (data) registers Register 00, Register 01, …, Register xx, control registers (Register PC, …) and stack registers (ebp, esp, …), next to main memory holding Global Variables, Stack and Heap between high and low addresses.]
42
Intro: Functions Example
Source:
    int SimpleFn(int z) {
      int x, y;
      x = x * y * z;
      return x;
    }

    void main() {
      int w;
      w = SimpleFn(137);
    }
TAC:
    _SimpleFn:
    _t0 = x * y;
    _t1 = _t0 * z;
    x = _t1;
    Push x;
    Return;

    main:
    _t0 = 137;
    Push _t0;
    Call _SimpleFn;
    Pop w;
43
What Can We Do with Procedures?

• Declarations & Definitions
• Call & Return
• Jumping out of procedures
• Passing & Returning procedures as parameters
44
Design Decisions

• Scoping rules
  – Static scoping vs. dynamic scoping
• Caller/callee conventions
  – Parameters
  – Who saves register values?
• Allocating space for local variables
45
Static (lexical) Scoping

    main ( )
    {                                      /* B0 */
      int a = 0 ;
      int b = 0 ;
      {                                    /* B1 */
        int b = 1 ;
        {                                  /* B2 */
          int a = 2 ;
          printf ("%d %d\n", a, b) ;
        }
        {                                  /* B3 */
          int b = 3 ;
          printf ("%d %d\n", a, b) ;
        }
        printf ("%d %d\n", a, b) ;
      }
      printf ("%d %d\n", a, b) ;
    }

Declaration    Scopes
a=0            B0, B1, B3
b=0            B0
b=1            B1, B2
a=2            B2
b=3            B3

A name refers to its (closest) enclosing scope; this is known at compile time.
46
Dynamic Scoping

• Each identifier is associated with a global stack of bindings
• When entering scope where identifier is declared
  – push declaration on identifier stack
• When exiting scope where identifier is declared
  – pop identifier stack
• Evaluating the identifier in any context binds to the current top of stack
• Determined at runtime
47
Example

• What value is returned from main?
• Static scoping?
• Dynamic scoping?

    int x = 42;

    int f() { return x; }
    int g() { int x = 1; return f(); }
    int main() { return g(); }
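C itself is statically scoped, so the dynamic case has to be simulated. The sketch below does that with an explicit stack of bindings for x; the names x_bindings, x_top and the _static/_dynamic suffixes are hypothetical. Under static scoping the example returns 42, under dynamic scoping it returns 1.

```c
#include <assert.h>

/* Static scoping: the x in f is resolved at compile time to the global x. */
int x = 42;
int f_static(void) { return x; }
int g_static(void) { int x = 1; (void)x; return f_static(); }

/* Dynamic scoping, simulated with a global stack of bindings for x. */
int x_bindings[16] = { 42 };   /* bottom of the stack: the global declaration */
int x_top = 0;                 /* index of the current binding */

int f_dynamic(void) { return x_bindings[x_top]; }   /* binds to the top of the stack */

int g_dynamic(void) {
    assert(x_top + 1 < 16);
    x_bindings[++x_top] = 1;   /* entering g: push the declaration int x = 1 */
    int result = f_dynamic();
    x_top--;                   /* leaving g: pop it */
    return result;
}
```

The difference is entirely in where f looks up x: in the text of the program, or in whatever binding is most recent on the stack when f runs.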
48
Why do we care?

• We need to generate code to access variables
• Static scoping
  – Identifier binding is known at compile time
  – Address of the variable is known at compile time
  – Assigning addresses to variables is part of code generation
  – No runtime errors of “access to undefined variable”
  – Can check types of variables
49
Variable addresses for static scoping: first attempt

    int x = 42;

    int f() { return x; }
    int g() { int x = 1; return f(); }
    int main() { return g(); }

identifier      address
x (global)      0x42
x (inside g)    0x73
50
Variable addresses for static scoping: first attempt

    int a [11] ;

    void quicksort(int m, int n) {
      int i;
      if (n > m) {
        i = partition(m, n);
        quicksort (m, i-1) ;
        quicksort (i+1, n) ;
      }
    }

    main() {
      quicksort (1, 9) ;
    }

What is the address of the variable “i” in the procedure quicksort?
51
Compile-Time Information on Variables
• Name
• Type
• Scope
  – when is it recognized
• Duration
  – Until when does its value exist
• Size
  – How many bytes are required at runtime
• Address
  – Fixed
  – Relative
  – Dynamic
52
Activation Record (Stack Frames)
• separate space for each procedure invocation
• managed at runtime
  – code for managing it generated by the compiler
• desired properties
  – efficient allocation and deallocation
    • procedures are called frequently
  – variable size
    • different procedures may require different memory sizes
8
C++ IR
Pentium
Java bytecode
SparcPyhton
Javaoptimize
Lowering Code Gen
Intermediate representationbull A language that is between the source language and
the target language ndash not specific to any machinebull Goal 1 retargeting compiler components for
different source languagestarget machinesbull Goal 2 machine-independent optimizer
ndash Narrow interface small number of node types (instructions)
9
Three-Address Code IR
bull A popular form of IRbull High-level assembly where instructions
have at most three operands
Chapter 8
10
Heap
Abstract Register Machine
Global Variables
Stack
Lowaddresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
Code
Highaddresses
11
Abstract Activation Record Stack
Stack frame for procedure
Prock+1(a1hellipaN)
Prock
Prock+2
hellip
hellip
Prock+1
12
Abstract Stack Frame
Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Prock
Prock+2
hellip
hellip
Stack frame for procedure
Prock+1(a1hellipaN)
13
TAC generation for expressions
• cgen(atomic_expr) directly generates TAC for atomic expressions
  - Constants, identifiers, …
• cgen(compound_expr) recursively generates TAC for compound expressions
  - binary operators, procedure calls, …
  - use temporary variables (registers) to store values of intermediate expressions
14
cgen example
cgen(5 + x) = {
  Choose a new temporary t
  Let t1 = cgen(5)
  Let t2 = cgen(x)
  Emit( t = t1 + t2 )
  Return t
}
15
Naïve cgen for expressions
• Maintain a counter for temporaries in c
• Initially: c = 0
• cgen(e1 op e2) = {
    Let A = cgen(e1)
    c = c + 1
    Let B = cgen(e2)
    c = c + 1
    Emit( _tc = A op B )
    Return _tc
  }
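The naïve scheme can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the course's implementation: expressions are modeled as nested tuples `(op, e1, e2)`, atoms are plain constants or identifier names, and the helper name `make_cgen` is hypothetical. Every node, including each leaf, receives a fresh temporary from a counter.

```python
import itertools

def make_cgen():
    """Return (cgen, code); code collects the emitted TAC lines."""
    code = []
    counter = itertools.count()  # fresh temporary indices: 0, 1, 2, ...

    def cgen(expr):
        # Assumption: atoms are int/str, compound expressions are (op, e1, e2).
        if not isinstance(expr, tuple):
            t = f"_t{next(counter)}"
            code.append(f"{t} = {expr}")
            return t
        op, e1, e2 = expr
        a = cgen(e1)                      # translate the left operand first
        b = cgen(e2)                      # then the right operand
        t = f"_t{next(counter)}"          # new temporary for the result
        code.append(f"{t} = {a} {op} {b}")
        return t

    return cgen, code

cgen, code = make_cgen()
result = cgen(("+", 5, "x"))
```

For `5 + x` this uses three temporaries, one per node, which is the waste the later slides set out to remove.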
16
TAC Generation for Control Flow Statements
• Label introduction
    _label_name:
  Indicates a point in the code that can be jumped to
• Unconditional jump: go to instruction following label L
    Goto L;
• Conditional jump: test condition variable t; if 0, jump to label L
    IfZ t Goto L;
• Similarly: test condition variable t; if 1 (non-zero), jump to label L
    IfNZ t Goto L;
17
Control-flow example - conditions
int x;
int y;
int z;

if (x < y)
  z = x;
else
  z = y;
z = z * z;

_t0 = x < y;
IfZ _t0 Goto _L0;
z = x;
Goto _L1;
_L0:
z = y;
_L1:
z = z * z;
18
cgen for if-then-else
cgen(if (e) s1 else s2) = {
  Let _t = cgen(e)
  Let Lfalse be a new label
  Let Lafter be a new label
  Emit( IfZ _t Goto Lfalse; )
  cgen(s1)
  Emit( Goto Lafter; )
  Emit( Lfalse: )
  cgen(s2)
  Emit( Goto Lafter; )
  Emit( Lafter: )
}
19
Naïve cgen for expressions
• Maintain a counter for temporaries in c
• Initially: c = 0
• cgen(e1 op e2) = {
    Let A = cgen(e1)
    c = c + 1
    Let B = cgen(e2)
    c = c + 1
    Emit( _tc = A op B )
    Return _tc
  }
• Observation: temporaries in cgen(e1) can be reused in cgen(e2)
20
Improving cgen for expressions
• Observation: naïve translation needlessly generates temporaries for leaf expressions
• Observation: temporaries are used exactly once
  - Once a temporary has been read, it can be reused for another sub-expression
• cgen(e1 op e2) = {
    Let _t1 = cgen(e1)
    Let _t2 = cgen(e2)
    Emit( _t = _t1 op _t2 )
    Return _t
  }
• Temporaries of cgen(e1) can be reused in cgen(e2)
21
Sethi-Ullman translation
• Algorithm by Ravi Sethi and Jeffrey D. Ullman to emit optimal TAC
  - Minimizes number of temporaries
• Main data structure in algorithm is a stack of temporaries
  - Stack corresponds to recursive invocations of _t = cgen(e)
  - All the temporaries on the stack are live
    • Live = contain a value that is needed later on
22
Weighted register allocation
• Can save registers by re-ordering subtree computations
• Label each node with its weight
  - Weight = number of registers needed
  - Leaf weight known
  - Internal node weight:
    • w(left) > w(right) then w = w(left)
    • w(right) > w(left) then w = w(right)
    • w(right) = w(left) then w = w(left) + 1
• Choose heavier child as first to be translated
• WARNING: have to check that no side-effects exist before attempting to apply this optimization (pre-pass on the tree)
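The weight rule above can be sketched as follows. This is a minimal illustration, assuming expression trees are nested tuples `(op, left, right)` and that every leaf has weight 1; the slides only say that leaf weights are known, so that value is an assumption, and the function names are hypothetical. The sketch also assumes the side-effect check has already passed.

```python
def weight(node):
    """Weight of an expression-tree node, following the rule for internal nodes."""
    if not isinstance(node, tuple):
        return 1  # assumption: every leaf needs one register
    _, left, right = node
    wl, wr = weight(left), weight(right)
    return max(wl, wr) if wl != wr else wl + 1

def translation_order(node, out=None):
    """List the leaves in the order they would be translated: heavier child first."""
    out = [] if out is None else out
    if not isinstance(node, tuple):
        out.append(node)
        return out
    _, left, right = node
    # On a tie the left child goes first (an arbitrary choice in this sketch).
    first, second = (left, right) if weight(left) >= weight(right) else (right, left)
    translation_order(first, out)
    translation_order(second, out)
    return out

# a + b[5*c]; "[]" stands for the array access with base b and index 5*c
tree = ("+", "a", ("[]", "b", ("*", 5, "c")))
total = weight(tree)
order = translation_order(tree)
```

With these leaf weights the whole expression needs two registers, and the index sub-tree `5*c` is translated before `b` and `a`.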
23
Weighted reg alloc example
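_t0 = cgen( a+b[5*c] )
Phase 1:
 - check absence of side-effects in expression tree
 - assign weight to each AST node
[Figure: AST of a+b[5*c]. The root + has the children a and an array access whose base is b and whose index is the product of 5 and c; every node is annotated with its weight (w=0 or w=1)]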
24
Weighted reg alloc example
_t0 = cgen( a+b[5*c] )
Phase 2: use weights to decide on order of translation
[Figure: the weighted AST of a+b[5*c], with the heavier sub-tree marked at each internal node and each node labeled with the temporary (_t0 or _t1) that holds its value]
_t0 = c
_t1 = 5
_t0 = _t1 * _t0
_t1 = b
_t0 = _t1[_t0]
_t1 = a
_t0 = _t1 + _t0
25
TAC Generation for Memory Access Instructions
• Copy instruction: a = b
• Load/store instructions: a = *b, *a = b
• Address-of instruction: a = &b
• Array accesses: a = b[i], a[i] = b
• Field accesses: a = b[f], a[f] = b
• Memory allocation instruction: a = alloc(size)
  - Sometimes left out (e.g., malloc is a procedure in C)
26
Array operations
x = y[i]
  t1 = &y        ; t1 = address-of y
  t2 = t1 + i    ; t2 = address of y[i]
  x = *t2        ; value stored at y[i]

x[i] = y
  t1 = &x        ; t1 = address-of x
  t2 = t1 + i    ; t2 = address of x[i]
  *t2 = y        ; store through pointer
27
TAC Generation for Control Flow Statements
• Label introduction
    _label_name:
  Indicates a point in the code that can be jumped to
• Unconditional jump: go to instruction following label L
    Goto L;
• Conditional jump: test condition variable t; if 0, jump to label L
    IfZ t Goto L;
• Similarly: test condition variable t; if 1 (non-zero), jump to label L
    IfNZ t Goto L;
28
Control-flow example - conditions
int x;
int y;
int z;

if (x < y)
  z = x;
else
  z = y;
z = z * z;

_t0 = x < y;
IfZ _t0 Goto _L0;
z = x;
Goto _L1;
_L0:
z = y;
_L1:
z = z * z;
29
cgen for if-then-else
cgen(if (e) s1 else s2) = {
  Let _t = cgen(e)
  Let Lfalse be a new label
  Let Lafter be a new label
  Emit( IfZ _t Goto Lfalse; )
  cgen(s1)
  Emit( Goto Lafter; )
  Emit( Lfalse: )
  cgen(s2)
  Emit( Goto Lafter; )
  Emit( Lafter: )
}
30
Control-flow example - loops
int x;
int y;

while (x < y) {
  x = x * 2;
}
y = x;

_L0:
_t0 = x < y;
IfZ _t0 Goto _L1;
x = x * 2;
Goto _L0;
_L1:
y = x;
31
cgen for while loops
cgen(while (expr) stmt) = {
  Let Lbefore be a new label
  Let Lafter be a new label
  Emit( Lbefore: )
  Let t = cgen(expr)
  Emit( IfZ t Goto Lafter; )
  cgen(stmt)
  Emit( Goto Lbefore; )
  Emit( Lafter: )
}
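The two control-flow schemes can be sketched together in Python. This is a hedged illustration rather than the course's code generator: statements are modeled as tuples, a condition is assumed to arrive as a pair `(tac_line, temporary)` that has already been translated, and the names `cgen_stmt` and `new_label` are hypothetical. The sketch omits the redundant `Goto Lafter` that the if-then-else scheme emits after the else branch.

```python
import itertools

_labels = itertools.count()

def new_label():
    """Return a fresh label name: _L0, _L1, ..."""
    return f"_L{next(_labels)}"

def cgen_stmt(stmt, code):
    """Append TAC for stmt to code.

    stmt is a TAC string, ("if", cond, s1, s2) or ("while", cond, body),
    where cond = (tac_line, temp) computes the test value into temp.
    """
    if isinstance(stmt, str):
        code.append(stmt)
    elif stmt[0] == "if":
        _, (cond_code, t), s1, s2 = stmt
        l_false, l_after = new_label(), new_label()
        code.append(cond_code)
        code.append(f"IfZ {t} Goto {l_false}")
        cgen_stmt(s1, code)
        code.append(f"Goto {l_after}")
        code.append(f"{l_false}:")
        cgen_stmt(s2, code)
        code.append(f"{l_after}:")
    elif stmt[0] == "while":
        _, (cond_code, t), body = stmt
        l_before, l_after = new_label(), new_label()
        code.append(f"{l_before}:")
        code.append(cond_code)            # the test is re-evaluated on every iteration
        code.append(f"IfZ {t} Goto {l_after}")
        cgen_stmt(body, code)
        code.append(f"Goto {l_before}")
        code.append(f"{l_after}:")

loop = []
cgen_stmt(("while", ("_t0 = x < y", "_t0"), "x = x * 2"), loop)
branch = []
cgen_stmt(("if", ("_t0 = x < y", "_t0"), "z = x", "z = y"), branch)
```

The `loop` list reproduces the TAC of the loop example, up to the statement that follows the loop.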
32
Optimization points
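[Figure: pipeline source code → Front end → IR → Code generator → target code, annotated with the points where optimization can happen; one part is marked "today"]
• User: profile program, change algorithm
• Compiler (IR): intraprocedural IR, interprocedural IR, IR optimizations
• Compiler (code generation): register allocation, instruction selection, peephole transformations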
33
Interprocedural IR
• Compile time generation of code for procedure invocations
• Activation Records (a.k.a. Stack Frames)
34
Supporting Procedures
• New computing environment
  - at least temporary memory for local variables
• Passing information into the new environment
  - Parameters
• Transfer of control to/from procedure
• Handling return values
35
Abstract Register Machine (High Level View)
[Figure: a CPU with general purpose (data) registers Register 00, Register 01, …, Register xx and control registers (Register PC, …), next to a memory divided into Code and Data between low and high addresses]
36
Abstract Register Machine (High Level View)
[Figure: the same CPU, with main memory now divided into Code, Global Variables, Heap and Stack between low and high addresses]
37
Abstract Activation Record Stack
[Figure: stack of activation records for Proc_k, Proc_k+1, Proc_k+2, …; the stack frame for procedure Proc_k+1(a1, …, aN) is highlighted]
38
Abstract Stack Frame
[Figure: the stack frame for procedure Proc_k+1(a1, …, aN), shown between the frames of Proc_k and Proc_k+2. It holds the parameters (actual arguments) Param N, Param N-1, …, Param 1, followed by the locals and temporaries _t0, …, _tk, x, …, y]
39
Handling Procedures
• Store local variables/temporaries in a stack
• A function call instruction pushes arguments to stack and jumps to the function label
  A statement x=f(a1,…,an); looks like
    Push a1; … Push an;
    Call f;
    Pop x;    // copy returned value
• Returning a value is done by pushing it to the stack (return x;)
    Push x;
• Return control to caller (and roll up stack)
    Return;
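The call pattern above is mechanical enough to sketch as a small Python helper. The function name `cgen_call` is hypothetical, and the arguments are assumed to be names of temporaries or variables that have already been computed.

```python
def cgen_call(result, func, args):
    """TAC for `result = func(args...)` in the Push/Call/Pop style described above."""
    code = [f"Push {a}" for a in args]   # push a1 ... an
    code.append(f"Call {func}")          # jump to the function label
    code.append(f"Pop {result}")         # copy returned value
    return code

call = cgen_call("w", "_SimpleFn", ["_t0"])
two_args = cgen_call("x", "f", ["a1", "a2"])
```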
40
Abstract Register Machine
[Figure: a CPU with general purpose (data) registers Register 00, Register 01, …, Register xx and control registers (Register PC, …), next to main memory holding Code, Global Variables, Heap and Stack between low and high addresses]
41
Semi-Abstract Register Machine
[Figure: the register machine again, with main memory holding Global Variables, Stack and Heap between low and high addresses; in addition to the general purpose (data) registers and the control registers (Register PC, …), the CPU now has stack registers ebp, esp, …]
42
Intro: Functions Example
int SimpleFn(int z) {
  int x, y;
  x = x * y * z;
  return x;
}

void main() {
  int w;
  w = SimpleFn(137);
}

_SimpleFn:
_t0 = x * y;
_t1 = _t0 * z;
x = _t1;
Push x;
Return;

main:
_t0 = 137;
Push _t0;
Call _SimpleFn;
Pop w;
43
What Can We Do with Procedures?
• Declarations & Definitions
• Call & Return
• Jumping out of procedures
• Passing & Returning procedures as parameters
44
Design Decisions
• Scoping rules
  - Static scoping vs. dynamic scoping
• Caller/callee conventions
  - Parameters
  - Who saves register values?
• Allocating space for local variables
45
Static (lexical) Scoping
main ( )
{                                      /* B0 */
  int a = 0 ;
  int b = 0 ;
  {                                    /* B1 */
    int b = 1 ;
    {                                  /* B2 */
      int a = 2 ;
      printf ("%d %d\n", a, b) ;
    }
    {                                  /* B3 */
      int b = 3 ;
      printf ("%d %d\n", a, b) ;
    }
    printf ("%d %d\n", a, b) ;
  }
  printf ("%d %d\n", a, b) ;
}

Declaration   Scopes
a=0           B0, B1, B3
b=0           B0
b=1           B1, B2
a=2           B2
b=3           B3

A name refers to its (closest) enclosing scope; this is known at compile time.
46
Dynamic Scoping
• Each identifier is associated with a global stack of bindings
• When entering scope where identifier is declared
  - push declaration on identifier stack
• When exiting scope where identifier is declared
  - pop identifier stack
• Evaluating the identifier in any context binds to the current top of stack
• Determined at runtime
47
Example
• What value is returned from main?
• Static scoping?
• Dynamic scoping?

int x = 42;

int f() { return x; }
int g() { int x = 1; return f(); }
int main() { return g(); }
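One way to check the answer is to simulate both rules. The Python sketch below is an illustration under stated assumptions, not part of the original slides: dynamic scoping is modeled with one stack of bindings per identifier, as described on the previous slide, and static scoping by resolving the `x` in `f` to the global declaration. The function `run` and its parameter are hypothetical names.

```python
def run(dynamic):
    """Return the value of main() for the example program under either rule."""
    global_x = 42
    bindings = {"x": [42]}  # dynamic scoping: one stack of bindings per identifier

    def f():
        # Static: the x in f is bound at compile time to the global declaration.
        # Dynamic: the x in f is whatever binding is on top of x's stack right now.
        return bindings["x"][-1] if dynamic else global_x

    def g():
        bindings["x"].append(1)      # entering g: push the declaration int x = 1
        try:
            return f()
        finally:
            bindings["x"].pop()      # exiting g: pop the identifier stack

    return g()                       # main just returns g()

static_result = run(dynamic=False)
dynamic_result = run(dynamic=True)
```

Under static scoping `main` returns 42; under dynamic scoping it returns 1, because the binding pushed by `g` is on top of the stack when `f` reads `x`.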
48
Why do we care?
• We need to generate code to access variables
• Static scoping
  - Identifier binding is known at compile time
  - Address of the variable is known at compile time
  - Assigning addresses to variables is part of code generation
  - No runtime errors of "access to undefined variable"
  - Can check types of variables
49
Variable addresses for static scoping: first attempt
int x = 42;

int f() { return x; }
int g() { int x = 1; return f(); }
int main() { return g(); }

identifier      address
x (global)      0x42
x (inside g)    0x73
50
Variable addresses for static scoping: first attempt
int a [11] ;

void quicksort(int m, int n) {
  int i;
  if (n > m) {
    i = partition(m, n);
    quicksort (m, i-1) ;
    quicksort (i+1, n) ;
  }
}

main() {
  quicksort (1, 9) ;
}

What is the address of the variable "i" in the procedure quicksort?
51
Compile-Time Information on Variables
• Name
• Type
• Scope
  - when is it recognized
• Duration
  - Until when does its value exist
• Size
  - How many bytes are required at runtime
• Address
  - Fixed
  - Relative
  - Dynamic
52
Activation Record (Stack Frames)
• separate space for each procedure invocation
• managed at runtime
  - code for managing it generated by the compiler
• desired properties
  - efficient allocation and deallocation
    • procedures are called frequently
  - variable size
    • different procedures may require different memory sizes
53
Semi-Abstract Register Machine
[Figure: the register machine with main memory holding Global Variables, Stack and Heap between low and high addresses; in addition to the general purpose (data) registers and the control registers (Register PC, …), the CPU has stack registers ebp, esp, …]
54
A Logical Stack Frame (Simplified)
[Figure: stack frame for function f(a1, …, aN). It holds the parameters (actual arguments) Param N, Param N-1, …, Param 1, followed by the locals and temporaries _t0, …, _tk, x, …, y]
55
Runtime Stack
• Stack of activation records
• Call = push new activation record
• Return = pop activation record
• Only one "active" activation record - top of stack
• How do we handle recursion?
56
Activation Record (frame)
Contents of the frame, from high addresses to low addresses (the stack grows down):
  parameter k
  …
  parameter 1            (incoming parameters)
  lexical pointer
  return information
  dynamic link
  registers & misc       (administrative part)
  local variables
  temporaries
  next frame would be here
The frame pointer and the stack pointer mark positions within this layout.
57
Runtime Stack
• SP - stack pointer
  - top of current frame
• FP - frame pointer
  - base of current frame
  - Sometimes called BP (base pointer)
[Figure: stack growing down, with the previous frame above the current frame; FP marks the base of the current frame and SP its top]
58
Code Blocks
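• Programming languages provide code blocks
void foo() {
  int x = 8, y = 9;       // 1
  { int x = y * y; }      // 2
  { int x = y * 7; }      // 3
  x = y + 1;
}
[Figure: the frame of foo holds the administrative part followed by x1, y1, x2, x3, …, one slot per declaration]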
59
L-Values of Local Variables
• The offset in the stack is known at compile time
• L-val(x) = FP+offset(x)
• x = 5 is translated to
    Load_Constant 5, R3
    Store R3, offset(x)(FP)
60
Pentium Runtime Stack
Pentium stack registers
Register    Usage
ESP         Stack pointer
EBP         Base pointer

Pentium stack and call/ret instructions
Instruction       Usage
push, pusha, …    push on runtime stack
pop, popa, …      pop from runtime stack
call              transfer control to called routine
ret               transfer control back to caller
61
Accessing Stack Variables
• Use offset from FP (%ebp)
• Remember: stack grows downwards
• Above FP = parameters
• Below FP = locals
• Examples
  - %ebp + 4 = return address
  - %ebp + 8 = first parameter
  - %ebp - 4 = first local
[Figure: stack layout from high to low addresses: Param n, …, param1 (at FP+8), Return address, Previous fp (FP points here), Local 1 (at FP-4), …, Local n (SP points here)]
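The offsets in the examples follow a simple pattern, sketched below in Python. The sketch assumes 4-byte words and that every parameter and local occupies exactly one word; the function names are hypothetical.

```python
WORD = 4  # assumption: 32-bit words, as in the %ebp-based examples

def param_offset(i):
    """Offset from FP of the i-th parameter (1-based): FP+8, FP+12, ..."""
    # Skip the saved frame pointer at FP+0 and the return address at FP+4.
    return 2 * WORD + (i - 1) * WORD

def local_offset(j):
    """Offset from FP of the j-th local (1-based): FP-4, FP-8, ..."""
    return -j * WORD

def operand(offset):
    """AT&T-style memory operand relative to %ebp."""
    return f"{offset}(%ebp)"

first_param = operand(param_offset(1))
first_local = operand(local_offset(1))
```

Because these offsets depend only on the declaration order, the compiler can compute them once per procedure and emit them as constants.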
62
Factorial - fact(int n)
fact:
  pushl %ebp              # save ebp
  movl %esp,%ebp          # ebp=esp
  pushl %ebx              # save ebx
  movl 8(%ebp),%ebx       # ebx = n
  cmpl $1,%ebx            # n <= 1 ?
  jle .lresult            # then done
  leal -1(%ebx),%eax      # eax = n-1
  pushl %eax              #
  call fact               # fact(n-1)
  imull %ebx,%eax         # eax=retv*n
  jmp .lreturn            #
.lresult:
  movl $1,%eax            # retv
.lreturn:
  movl -4(%ebp),%ebx      # restore ebx
  movl %ebp,%esp          # restore esp
  popl %ebp               # restore ebp
  ret                     # return to caller
[Figure: stack at an intermediate point, showing n at EBP+8, the return address, the previous frame pointer (old ebp) at EBP, and old ebx at EBP-4, where ESP points]
(disclaimer: a real compiler can do better than that)
63
Call Sequences
• The processor does not save the content of registers on procedure calls
• So who will?
  - Caller saves and restores registers
  - Callee saves and restores registers
  - But can also have both save/restore some registers
64
Call Sequences
Caller push code:
  Push caller-save registers
  Push actual parameters (in reverse order)
call:
  push return address
  Jump to call address
Callee push code (prologue):
  Push current base-pointer
  bp = sp
  Push local variables
  Push callee-save registers
Callee pop code (epilogue):
  Pop callee-save registers
  Pop callee activation record
  Pop old base-pointer
return:
  pop return address
  Jump to address
Caller pop code:
  Pop parameters
  Pop caller-save registers
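The balance between the push code and the pop code can be checked with a small simulation. The Python sketch below replays the steps of the call sequence on a list used as a stack; it is an illustration only, and the function name, the tuple tags and the placeholder return address are assumptions of the sketch.

```python
def simulate_call(args, n_locals, caller_saved, callee_saved):
    """Replay the call sequence; return (depth inside callee, final stack, restored regs)."""
    stack = []
    bp = None  # stands for the caller's base-pointer value

    # Caller push code
    stack.extend(("caller-save", r) for r in caller_saved)
    stack.extend(("param", a) for a in reversed(args))   # reverse order
    # call
    stack.append(("return address", "after-call"))
    # Callee push code (prologue)
    stack.append(("old bp", bp))
    bp = len(stack)                                       # bp = sp
    stack.extend(("local", None) for _ in range(n_locals))
    stack.extend(("callee-save", r) for r in callee_saved)
    depth_in_callee = len(stack)
    # Callee pop code (epilogue)
    for _ in callee_saved:
        stack.pop()
    del stack[bp:]                                        # pop callee activation record
    bp = stack.pop()[1]                                   # pop old base-pointer
    # return
    stack.pop()                                           # pop return address
    # Caller pop code
    for _ in args:
        stack.pop()
    restored = [stack.pop()[1] for _ in caller_saved]
    return depth_in_callee, stack, restored[::-1]

depth, final_stack, restored = simulate_call(
    [42, 21], n_locals=2, caller_saved=["ecx"], callee_saved=["ebx"])
```

After the full sequence the stack is back to its state before the call, which is the invariant the caller relies on.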
65
Call Sequences - Foo(42,21)
Caller push code:
  push %ecx            Push caller-save registers
  push $21             Push actual parameters (in reverse order)
  push $42
call:
  call _foo            push return address, Jump to call address
Callee push code (prologue):
  push %ebp            Push current base-pointer
  mov %esp, %ebp       bp = sp
  sub $8, %esp         Push local variables
  push %ebx            Push callee-save registers
Callee pop code (epilogue):
  pop %ebx             Pop callee-save registers
  mov %ebp, %esp       Pop callee activation record
  pop %ebp             Pop old base-pointer
return:
  ret                  pop return address, Jump to address
Caller pop code:
  add $8, %esp         Pop parameters
  pop %ecx             Pop caller-save registers
66
"To Callee-save or to Caller-save?"
• Callee-saved registers need only be saved when callee modifies their value
• Some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registers
• Callee-Save Registers
  - Saved by the callee before modification
  - Values are automatically preserved across calls
• Caller-Save Registers
  - Saved (if needed) by the caller before calls
  - Values are not automatically preserved across calls
• Usually the architecture defines caller-save and callee-save registers
  - Separate compilation
  - Interoperability between code produced by different compilers/languages
• But compiler writers decide when to use caller/callee registers
68
Callee-Save Registers
• Saved by the callee before modification
• Usually at procedure prolog
• Restored at procedure epilog
• Hardware support may be available
• Values are automatically preserved across calls

int foo(int a) {
  int b=a+1;
  f1();
  g1(b);
  return(b+2);
}

.global _foo
  Add_Constant -K, SP        // allocate space for foo
  Store_Local R5, -14(FP)    // save R5
  Load_Reg R5, R0
  Add_Constant R5, 1
  JSR f1
  JSR g1
  Add_Constant R5, 2
  Load_Reg R5, R0
  Load_Local -14(FP), R5     // restore R5
  Add_Constant K, SP         // deallocate
  RTS
69
Caller-Save Registers
• Saved by the caller before calls when needed
• Values are not automatically preserved across calls

void bar (int y) {
  int x=y+1;
  f2(x);
  g2(2);
  g2(8);
}

.global _bar
  Add_Constant -K, SP     // allocate space for bar
  Add_Constant R0, 1
  JSR f2
  Load_Constant 2, R0
  JSR g2
  Load_Constant 8, R0
  JSR g2
  Add_Constant K, SP      // deallocate space for bar
  RTS
70
Parameter Passing
• 1960s
  - In memory
    • No recursion is allowed
• 1970s
  - In stack
• 1980s
  - In registers
  - First k parameters are passed in registers (k=4 or k=6)
  - Where is time saved?
• Most procedures are leaf procedures
• Interprocedural register allocation
• Many of the registers may be dead before another invocation
• Register windows are allocated in some architectures per call (e.g., Sun Sparc)
71
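Windows Exploit(s): Buffer Overflow
void foo (char *x) {
  char buf[2];
  strcpy(buf, x);
}
int main (int argc, char *argv[]) {
  foo(argv[1]);
}

./a.out abracadabra
Segmentation fault

[Figure: the stack around foo's frame, with memory addresses increasing in one direction and the stack growing in the other: Previous frame, Return address, Saved FP, char* x, buf[2]. The copied string "abracadabra" starts in buf[2] and overwrites the entries beyond it]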
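The effect of the unchecked copy can be imitated on a byte array. The Python sketch below is a simplified model, not the real stack: the frame layout (a 2-byte buffer directly below a 4-byte saved FP and a 4-byte return address), the concrete address values and the helper name are all assumptions made for illustration. Real layouts depend on the compiler and the ABI.

```python
def strcpy_into_frame(frame, buf_offset, data):
    """Copy data plus a terminating NUL into frame at buf_offset.

    Like strcpy, this does not check the size of the destination buffer.
    """
    payload = data + b"\0"
    frame[buf_offset:buf_offset + len(payload)] = payload

# Hypothetical layout, low to high addresses:
# buf[2] | saved FP (4 bytes) | return address (4 bytes) | previous frame (8 bytes)
saved_fp = (0xBFFF0010).to_bytes(4, "little")
ret_addr = (0x08048400).to_bytes(4, "little")
frame = bytearray(2) + saved_fp + ret_addr + bytearray(8)

strcpy_into_frame(frame, 0, b"abracadabra")
clobbered_return = bytes(frame[6:10])
```

After the copy the return-address slot no longer holds the original address but four characters of the input, so returning from the function would jump to an address chosen by whoever supplied the string.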
72
Buffer overflow
int check_authentication(char *password) {
  int auth_flag = 0;
  char password_buffer[16];

  strcpy(password_buffer, password);
  if(strcmp(password_buffer, "brillig") == 0)
    auth_flag = 1;
  if(strcmp(password_buffer, "outgrabe") == 0)
    auth_flag = 1;
  return auth_flag;
}
int main(int argc, char *argv[]) {
  if(argc < 2) {
    printf("Usage: %s <password>\n", argv[0]);
    exit(0);
  }
  if(check_authentication(argv[1])) {
    printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    printf("      Access Granted.\n");
    printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
  } else {
    printf("\nAccess Denied.\n");
  }
}
(source: "Hacking: The Art of Exploitation", 2nd Ed.)
74
Buffer overflow
int check_authentication(char *password) {
  char password_buffer[16];
  int auth_flag = 0;

  strcpy(password_buffer, password);
  if(strcmp(password_buffer, "brillig") == 0)
    auth_flag = 1;
  if(strcmp(password_buffer, "outgrabe") == 0)
    auth_flag = 1;
  return auth_flag;
}
int main(int argc, char *argv[]) {
  if(argc < 2) {
    printf("Usage: %s <password>\n", argv[0]);
    exit(0);
  }
  if(check_authentication(argv[1])) {
    printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    printf("      Access Granted.\n");
    printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
  } else {
    printf("\nAccess Denied.\n");
  }
}
(source: "Hacking: The Art of Exploitation", 2nd Ed.)
75
Buffer overflow
0x08048529 <+69>:   movl $0x8048647,(%esp)
0x08048530 <+76>:   call 0x8048394 <puts@plt>
0x08048535 <+81>:   movl $0x8048664,(%esp)
0x0804853c <+88>:   call 0x8048394 <puts@plt>
0x08048541 <+93>:   movl $0x804867a,(%esp)
0x08048548 <+100>:  call 0x8048394 <puts@plt>
0x0804854d <+105>:  jmp 0x804855b <main+119>
0x0804854f <+107>:  movl $0x8048696,(%esp)
0x08048556 <+114>:  call 0x8048394 <puts@plt>
76
Nested Procedures
• For example: Pascal
• Any routine can have sub-routines
• Any sub-routine can access anything that is defined in its containing scope or inside the sub-routine itself
  - "non-local" variables
77
Example: Nested Procedures
program p() {
  int x;
  procedure a() {
    int y;
    procedure b() { … c() … };
    procedure c() {
      int z;
      procedure d() {
        y = x + z
      };
      … b() … d() …
    }
    … a() … c() …
  }
  a()
}
Possible call sequence: p → a → a → c → b → c → d
What are the addresses of variables "x", "y" and "z" in procedure d?
78
Nested Procedures
• can call a sibling, ancestor
• when "c" uses (non-local) variables from "a", which instance of "a" is it?
• how do you find the right activation record at runtime?
[Figure: call tree of p with the activations a, a, c, b, c, d]
Possible call sequence: p → a → a → c → b → c → d
79
Nested Procedures
• goal: find the closest routine in the stack from a given nesting level
• if we reached the same routine in a sequence of calls
  - routine of level k uses variables of the same nesting level: it uses its own variables
  - if it uses variables of nesting level j < k, then it must be the last routine called at level j
• If a procedure is last at level j on the stack, then it must be an ancestor of the current routine
[Figure: call tree of p with the activations a, a, c, b, c, d]
Possible call sequence: p → a → a → c → b → c → d
80
Nested Procedures
• problem: a routine may need to access variables of another routine that contains it statically
• solution: lexical pointer (a.k.a. access link) in the activation record
• lexical pointer points to the last activation record of the nesting level above it
  - in our example, the lexical pointer of d points to the activation record of c
• lexical pointers are created at runtime
• the number of links to be traversed is known at compile time
81
Lexical Pointers
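[Figure: runtime stack holding the activation records a, a, c, b, c, d with their lexical pointers (the a records hold y, the c records hold z), next to the call tree of p]
Possible call sequence: p → a → a → c → b → c → d
program p() {
  int x;
  procedure a() {
    int y;
    procedure b() { … c() … };
    procedure c() {
      int z;
      procedure d() {
        y = x + z
      };
      … b() … d() …
    }
    … a() … c() …
  }
  a()
}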
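Lexical pointers can be modeled directly. The Python sketch below builds the activation records for the call sequence p, a, a, c, b, c, d and resolves x, y and z from d by following a fixed number of access links. It is an illustration under stated assumptions: the `Frame` class, the `lookup` helper and the concrete variable values are hypothetical, and each access link is set by hand to the most recent record of the statically enclosing procedure.

```python
class Frame:
    """An activation record with a lexical pointer (access link)."""
    def __init__(self, proc, level, access_link, **local_vars):
        self.proc = proc
        self.level = level              # static nesting level (p = 0)
        self.access_link = access_link  # record of the enclosing procedure
        self.vars = dict(local_vars)

def lookup(frame, name, decl_level):
    """Follow (frame.level - decl_level) links, a count known at compile time."""
    for _ in range(frame.level - decl_level):
        frame = frame.access_link
    return frame.vars[name]

# Call sequence p, a, a, c, b, c, d (variable values are made up)
p  = Frame("p", 0, None, x=10)
a1 = Frame("a", 1, p, y=1)
a2 = Frame("a", 1, p, y=2)     # a is declared in p, so its link is p, not a1
c1 = Frame("c", 2, a2, z=3)
b  = Frame("b", 2, a2)         # b is declared in a: link is the last a
c2 = Frame("c", 2, a2, z=4)
d  = Frame("d", 3, c2)         # d is declared in c: link is the last c

x_in_d = lookup(d, "x", 0)     # 3 links: c, a, p
y_in_d = lookup(d, "y", 1)     # 2 links: c, a
z_in_d = lookup(d, "z", 2)     # 1 link: c
```

The number of links depends only on the nesting levels of the use and the declaration, which is why the compiler can emit a fixed chain of loads for each variable access.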
82
What is a Compiler
9
Three-Address Code IR
bull A popular form of IRbull High-level assembly where instructions
have at most three operands
Chapter 8
10
Heap
Abstract Register Machine
Global Variables
Stack
Lowaddresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
Code
Highaddresses
11
Abstract Activation Record Stack
Stack frame for procedure
Prock+1(a1hellipaN)
Prock
Prock+2
hellip
hellip
Prock+1
12
Abstract Stack Frame
Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Prock
Prock+2
hellip
hellip
Stack frame for procedure
Prock+1(a1hellipaN)
13
TAC generation for expressions
bull cgen(atomic_expr) directly generates TAC for atomic expressionsndash Constants identifiershellip
bull Cgen(compund_expr) recursively generates TAC for compound expressions ndash binary operators procedure calls hellipndash use temporary variables (registers) to store
values of intermediate expressions
14
cgen examplecgen(5 + x) = Choose a new temporary t Let t1 = cgen(5) Let t2 = cgen(x) Emit( t = t1 + t2 ) Return t
15
Naiumlve cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =
Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc
16
TAC Generation forControl Flow Statements
bull Label introduction_label_name
Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L
Goto Lbull Conditional jump test condition variable t
if 0 jump to label LIfZ t Goto L
bull Similarly test condition variable tif 1 jump to label L
IfNZ t Goto L
17
Control-flow example ndash conditionsint xint yint z
if (x lt y) z = xelse z = yz = z z
_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z
18
cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)
Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )
19
Naive cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =
Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc
bull Observation temporaries in cgen(e1) can be reused in cgen(e2)
20
Improving cgen for expressionsbull Observation ndash naiumlve translation needlessly generates temporaries for
leaf expressionsbull Observation ndash temporaries used exactly once
ndash Once a temporary has been read it can be reused for another sub-expressionbull cgen(e1 op e2) =
Let _t1 = cgen(e1) Let _t2 = cgen(e2) Emit( _t =_t1 op _t2 ) Return t
bull Temporaries cgen(e1) can be reused in cgen(e2)
21
Sethi-Ullman translation
bull Algorithm by Ravi Sethi and Jeffrey D Ullman to emit optimal TACndash Minimizes number of temporaries
bull Main data structure in algorithm is a stack of temporariesndash Stack corresponds to recursive invocations of _t = cgen(e)ndash All the temporaries on the stack are live
bull Live = contain a value that is needed later on
22
Weighted register allocationbull Can save registers by re-ordering subtree computationsbull Label each node with its weight
ndash Weight = number of registers neededndash Leaf weight knownndash Internal node weight
bull w(left) gt w(right) then w = leftbull w(right) gt w(left) then w = rightbull w(right) = w(left) then w = left + 1
bull Choose heavier child as first to be translatedbull WARNING have to check that no side-effects exist before
attempting to apply this optimization(pre-pass on the tree)
23
Weighted reg alloc example
b
5 c
array access
+
a w=0
w=0 w=0
w=1 w=0
w=1
w=1
Phase 1 - check absence of side-effects in expression tree - assign weight to each AST node
_t0 = cgen( a+b[5c] )
base index
24
Weighted reg alloc example
b
5 c
array access
+
abase index
w=0
w=0 w=0
w=1 w=0
w=1
w=1
_t0 = cgen( a+b[5c] )Phase 2 - use weights to decide on order of translation
_t0
_t0
_t0Heavier sub-tree
Heavier sub-tree
_t0 = _t1 _t0
_t0 = _t1[_t0]
_t0 = _t1 + _t0
_t0_t1
_t1
_t1_t0 = c_t1 = 5
_t1 = b
_t1 = a
25
TAC Generation forMemory Access Instructions
bull Copy instruction a = bbull Loadstore instructions
a = b a = bbull Address of instruction a=ampbbull Array accesses
a = b[i] a[i] = bbull Field accesses
a = b[f] a[f] = bbull Memory allocation instruction a = alloc(size)
ndash Sometimes left out (eg malloc is a procedure in C)
26
Array operations
t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]
t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer
x = y[i]
x[i] = y
27
TAC Generation forControl Flow Statements
bull Label introduction_label_name
Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L
Goto Lbull Conditional jump test condition variable t
if 0 jump to label LIfZ t Goto L
bull Similarly test condition variable tif 1 jump to label L
IfNZ t Goto L
28
Control-flow example ndash conditionsint xint yint z
if (x lt y) z = xelse z = yz = z z
_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z
29
cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)
Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )
30
Control-flow example ndash loopsint xint y
while (x lt y) x = x 2
y = x
_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1
y = x
31
cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label
Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )
32
Optimization points
sourcecode
Frontend IR Code
generatortargetcode
Userprofile program
change algorithm
Compilerintraprocedural IRInterprocedural IRIR optimizations
Compilerregister allocation
instruction selectionpeephole transformations
today
33
Interprocedural IR
bull Compile time generation of code for procedure invocations
bull Activation Records (aka Stack Frames)
34
Supporting Procedures
bull New computing environment ndash at least temporary memory for local variables
bull Passing information into the new environmentndash Parameters
bull Transfer of control tofrom procedurebull Handling return values
35
Abstract Register Machine(High Level View)
Code
Data
Highaddresses
Lowaddresses
CPU
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
36
Heap
Abstract Register Machine(High Level View)
Global Variables
Stack
Lowaddresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
Code
Highaddresses
37
Abstract Activation Record Stack
Stack frame for procedure
Prock+1(a1hellipaN)
Prock
Prock+2
hellip
hellip
Prock+1
38
Abstract Stack Frame
Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Prock
Prock+2
hellip
hellip
Stack frame for procedure
Prock+1(a1hellipaN)
39
Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to
stack and jumps to the function labelA statement x=f(a1hellipan) looks like
Push a1 hellip Push anCall fPop x copy returned value
bull Returning a value is done by pushing it to the stack (return x)
Push xbull Return control to caller (and roll up stack)
Return
40
Heap
Abstract Register Machine
Global Variables
Stack
Lowaddresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
Code
Highaddresses
41
Semi-Abstract Register Machine
Global Variables
Stack
Heap
High addresses
Low addresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
ebp
esphellipre
gist
ers
Stac
k
42
Intro Functions Exampleint SimpleFn(int z)
int x yx = x y zreturn x
void main() int ww = SimpleFunction(137)
_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn
main_t0 = 137Push _t0Call _SimpleFnPop w
43
What Can We Do with Procedures
bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as
parameters
44
Design Decisions
bull Scoping rulesndash Static scoping vs dynamic scoping
bull Callercallee conventionsndash Parametersndash Who saves register values
bull Allocating space for local variables
45
Static (lexical) Scopingmain ( )
int a = 0 int b = 0
int b = 1
int a = 2 printf (ldquod dnrdquo a b)
int b = 3 printf (ldquod dnrdquo a b)
printf (ldquod dnrdquo a b)
printf (ldquod dnrdquo a b)
B0B1
B3B3
B2
Declaration Scopes
a=0 B0B1B3
b=0 B0
b=1 B1B2
a=2 B2
b=3 B3
a name refers to its (closest)
enclosing scope
known at compile time
46
Dynamic Scoping
bull Each identifier is associated with a global stack of bindings
bull When entering scope where identifier is declaredndash push declaration on identifier stack
bull When exiting scope where identifier is declaredndash pop identifier stack
bull Evaluating the identifier in any context binds to the current top of stack
bull Determined at runtime
47
Example
bull What value is returned from mainbull Static scopingbull Dynamic scoping
int x = 42
int f() return x int g() int x = 1 return f() int main() return g()
48
Why do we care
bull We need to generate code to access variables
bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code
generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables
49
Variable addresses for static scoping first attempt
int x = 42
int f() return x int g() int x = 1 return f() int main() return g()
identifier address
x (global) 0x42
x (inside g) 0x73
50
Variable addresses for static scoping first attempt
int a [11]
void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)
main() quicksort (1 9)
what is the address of the variable ldquoirdquo in
the procedure quicksort
51
Compile-Time Information on Variablesbull Namebull Typebull Scope
ndash when is it recognizedbull Duration
ndash Until when does its value existbull Size
ndash How many bytes are required at runtime bull Address
ndash Fixedndash Relativendash Dynamic
52
Activation Record (Stack Frames)bull separate space for each procedure invocation
bull managed at runtimendash code for managing it generated by the compiler
bull desired properties ndash efficient allocation and deallocation
bull procedures are called frequentlyndash variable size
bull different procedures may require different memory sizes
53
Semi-Abstract Register Machine
Global Variables
Stack
Heap
High addresses
Low addresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
ebp
esphellipre
gist
ers
Stac
k
54
A Logical Stack Frame (Simplified)Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Stack frame for function f(a1hellipaN)
55
Runtime Stack
bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of
stackbull How do we handle recursion
56
Activation Record (frame)
[Figure: layout of one frame, from high addresses to low addresses (stack grows down):
  parameter k, ..., parameter 1 (incoming parameters)
  lexical pointer, return information, dynamic link, registers & misc (administrative part)
  local variables, temporaries
  next frame would be here
 The frame pointer and the stack pointer are marked on the frame.]
57
Runtime Stack
• SP - stack pointer
  - top of current frame
• FP - frame pointer
  - base of current frame
  - sometimes called BP (base pointer)
[Figure: previous frame above the current frame; FP marks the base of the current frame and SP its top; stack grows down.]
58
Code Blocks
• Programming languages provide code blocks
void foo() {
  int x = 8, y = 9;      // 1
  { int x = y * y; }     // 2
  { int x = y * 7; }     // 3
  x = y + 1;
}
[Figure: frame layout: administrative part, x1, y1, x2, x3, ...]
59
L-Values of Local Variables
• The offset in the stack is known at compile time
• L-val(x) = FP + offset(x)
• x = 5  is translated to
    Load_Constant 5, R3
    Store R3, offset(x)(FP)
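A minimal sketch of how a compiler might assign these offsets. The helper names `assign_offsets` and `lvalue`, the variable sizes, and the absence of alignment padding are all assumptions for illustration; real compilers also account for alignment and the administrative part of the frame.

```python
def assign_offsets(local_vars):
    """Give each local a negative offset from FP, in declaration order.

    local_vars is a list of (name, size_in_bytes) pairs; locals are assumed
    to sit directly below FP with no padding.
    """
    offsets = {}
    offset = 0
    for name, size in local_vars:
        offset -= size            # stack grows down, so locals are below FP
        offsets[name] = offset
    return offsets


def lvalue(fp, offsets, name):
    """L-val(x) = FP + offset(x); only FP is a runtime value."""
    return fp + offsets[name]


offsets = assign_offsets([("x", 4), ("y", 4), ("buf", 16)])
print(offsets)
print(hex(lvalue(0x1000, offsets, "y")))
```

The offsets table is computed once at compile time; at runtime only the addition `FP + offset` remains, which is what the `offset(x)(FP)` addressing mode performs.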
60
Pentium Runtime Stack
Pentium stack registers
Register    Usage
ESP         Stack pointer
EBP         Base pointer

Pentium stack and call/ret instructions
Instruction         Usage
push, pusha, ...    push on runtime stack
pop, popa, ...      pop from runtime stack
call                transfer control to called routine
ret                 transfer control back to caller
61
Accessing Stack Variables
• Use offset from FP (ebp)
• Remember - stack grows downwards
• Above FP = parameters
• Below FP = locals
• Examples
  - ebp + 4 = return address
  - ebp + 8 = first parameter
  - ebp - 4 = first local
[Figure: stack from high to low addresses: Param n, ..., param 1 (at FP+8), Return address, Previous fp (FP points here), Local 1 (at FP-4), ..., Local n (SP points here); the caller's frame above it has the same shape.]
62
Factorial - fact(int n)
fact:
  pushl %ebp             # save ebp
  movl  %esp,%ebp        # ebp = esp
  pushl %ebx             # save ebx
  movl  8(%ebp),%ebx     # ebx = n
  cmpl  $1,%ebx          # n <= 1 ?
  jle   lresult          # then done
  leal  -1(%ebx),%eax    # eax = n-1
  pushl %eax             # push argument
  call  fact             # fact(n-1)
  imull %ebx,%eax        # eax = retv * n
  jmp   lreturn
lresult:
  movl  $1,%eax          # retv = 1
lreturn:
  movl  -4(%ebp),%ebx    # restore ebx
  movl  %ebp,%esp        # restore esp
  popl  %ebp             # restore ebp
  ret                    # return to caller
[Figure: stack at an intermediate point: n (at EBP+8), Return address, old ebp (EBP points here), old ebx (at EBP-4, ESP points here); above it the caller's frame with its own return address, previous fp and old ebx.]
(disclaimer: a real compiler can do better than that)
63
Call Sequences
• The processor does not save the content of registers on procedure calls
• So who will?
  - Caller saves and restores registers
  - Callee saves and restores registers
  - But can also have both save/restore some registers
64
Call Sequences
Caller push code (in the caller):
  Push caller-save registers
  Push actual parameters (in reverse order)
call:
  Push return address
  Jump to call address
Callee push code (prologue, in the callee):
  Push current base-pointer
  bp = sp
  Push local variables
  Push callee-save registers
Callee pop code (epilogue, in the callee):
  Pop callee-save registers
  Pop callee activation record
  Pop old base-pointer
return:
  Pop return address
  Jump to address
Caller pop code (in the caller):
  Pop parameters
  Pop caller-save registers
65
Call Sequences - Foo(42,21)
caller:   push %ecx           Push caller-save registers
          push $21            Push actual parameters (in reverse order)
          push $42
call:     call _foo           Push return address, jump to call address
callee:   push %ebp           Push current base-pointer
          mov %esp, %ebp      bp = sp
          sub $8, %esp        Push local variables
          push %ebx           Push callee-save registers
          ...
          pop %ebx            Pop callee-save registers
          mov %ebp, %esp      Pop callee activation record
          pop %ebp            Pop old base-pointer
return:   ret                 Pop return address, jump to address
caller:   add $8, %esp        Pop parameters
          pop %ecx            Pop caller-save registers
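The sequence for Foo(42,21) can be replayed on a toy model of the stack to check that the pushes and pops balance and that the callee finds its parameters at ebp+8 and ebp+12. This is a rough sketch: the `Machine` class, the `call_foo` function, the 4-byte word size and the placeholder strings are assumptions for illustration only.

```python
class Machine:
    """Tiny model of a downward-growing stack of 4-byte words."""

    def __init__(self, esp=0x1000):
        self.mem = {}      # address -> stored word
        self.esp = esp     # stack pointer
        self.ebp = 0       # base (frame) pointer

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value


def call_foo(m, ecx):
    start_esp = m.esp
    # Caller push code
    m.push(ecx)               # push %ecx  (caller-save register)
    m.push(21)                # push $21   (last argument first)
    m.push(42)                # push $42
    m.push("return addr")     # call _foo pushes the return address
    # Callee prologue
    m.push(m.ebp)             # push %ebp
    m.ebp = m.esp             # mov %esp, %ebp
    m.esp -= 8                # sub $8, %esp  (space for locals)
    m.push("old ebx")         # push %ebx  (callee-save register)
    # What the callee sees relative to ebp
    seen = {
        "return": m.mem[m.ebp + 4],
        "param1": m.mem[m.ebp + 8],
        "param2": m.mem[m.ebp + 12],
    }
    # Callee epilogue
    m.pop()                   # pop %ebx
    m.esp = m.ebp             # mov %ebp, %esp
    m.ebp = m.pop()           # pop %ebp
    m.pop()                   # ret pops the return address
    # Caller pop code
    m.esp += 8                # add $8, %esp  (pop parameters)
    ecx_after = m.pop()       # pop %ecx
    return seen, ecx_after, m.esp == start_esp


print(call_foo(Machine(), ecx=7))
```

Pushing the arguments in reverse order is what places the first parameter closest to the frame pointer, at a fixed offset regardless of how many arguments follow.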
66
"To Callee-save or to Caller-save?"
• Callee-saved registers need only be saved when callee modifies their value
• Some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registers
• Callee-Save Registers
  - Saved by the callee before modification
  - Values are automatically preserved across calls
• Caller-Save Registers
  - Saved (if needed) by the caller before calls
  - Values are not automatically preserved across calls
• Usually the architecture defines caller-save and callee-save registers
• Separate compilation
• Interoperability between code produced by different compilers/languages
• But compiler writers decide when to use caller/callee registers
68
Callee-Save Registers
• Saved by the callee before modification
• Usually at procedure prolog
• Restored at procedure epilog
• Hardware support may be available
• Values are automatically preserved across calls

.global _foo
  Add_Constant -K, SP        // allocate space for foo
  Store_Local R5, -14(FP)    // save R5
  Load_Reg R5, R0
  Add_Constant R5, 1
  JSR f1
  JSR g1
  Add_Constant R5, 2
  Load_Reg R5, R0
  Load_Local -14(FP), R5     // restore R5
  Add_Constant K, SP         // deallocate
  RTS

int foo(int a) {
  int b = a + 1;
  f1();
  g1(b);
  return (b + 2);
}
69
Caller-Save Registers
• Saved by the caller before calls when needed
• Values are not automatically preserved across calls

.global _bar
  Add_Constant -K, SP        // allocate space for bar
  Add_Constant R0, 1
  JSR f2
  Load_Constant 2, R0
  JSR g2
  Load_Constant 8, R0
  JSR g2
  Add_Constant K, SP         // deallocate space for bar
  RTS

void bar(int y) {
  int x = y + 1;
  f2(x);
  g2(2);
  g2(8);
}
70
Parameter Passing
• 1960s
  - In memory
    • No recursion is allowed
• 1970s
  - In stack
• 1980s
  - In registers
  - First k parameters are passed in registers (k=4 or k=6)
  - Where is time saved?
• Most procedures are leaf procedures
• Interprocedural register allocation
• Many of the registers may be dead before another invocation
• Register windows are allocated in some architectures per call (e.g., Sun SPARC)
71
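Windows Exploit(s): Buffer Overflow
void foo(char *x) {
  char buf[2];
  strcpy(buf, x);
}
int main(int argc, char *argv[]) {
  foo(argv[1]);
}

./a.out abracadabra
Segmentation fault
[Figure: stack frame of foo, with the directions of stack growth and of increasing memory addresses marked: previous frame, return address, saved FP, char* x, buf[2]. The copied string "abracadabra" does not fit in buf[2] and spills over the neighbouring slots.]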
72
Buffer overflow
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int check_authentication(char *password) {
  int auth_flag = 0;
  char password_buffer[16];

  strcpy(password_buffer, password);   /* no bounds check */
  if (strcmp(password_buffer, "brillig") == 0)
    auth_flag = 1;
  if (strcmp(password_buffer, "outgrabe") == 0)
    auth_flag = 1;
  return auth_flag;
}

int main(int argc, char *argv[]) {
  if (argc < 2) {
    printf("Usage: %s <password>\n", argv[0]);
    exit(0);
  }
  if (check_authentication(argv[1])) {
    printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    printf("      Access Granted.\n");
    printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
  } else {
    printf("\nAccess Denied.\n");
  }
}
(source: "Hacking - The Art of Exploitation, 2nd Ed.")
74
Buffer overflow (declaration order swapped)
int check_authentication(char *password) {
  char password_buffer[16];
  int auth_flag = 0;

  strcpy(password_buffer, password);
  if (strcmp(password_buffer, "brillig") == 0)
    auth_flag = 1;
  if (strcmp(password_buffer, "outgrabe") == 0)
    auth_flag = 1;
  return auth_flag;
}

int main(int argc, char *argv[]) {
  if (argc < 2) {
    printf("Usage: %s <password>\n", argv[0]);
    exit(0);
  }
  if (check_authentication(argv[1])) {
    printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    printf("      Access Granted.\n");
    printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
  } else {
    printf("\nAccess Denied.\n");
  }
}
(source: "Hacking - The Art of Exploitation, 2nd Ed.")
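Why the order of the two declarations matters can be shown on a simulated frame. The sketch below is not the book's code: it models the frame as a Python bytearray and assumes a particular layout (a 16-byte buffer with a 4-byte flag either directly above or directly below it). Actual layouts depend on the compiler, so the `flag_above_buffer` switch is a hypothetical stand-in for the two variants.

```python
import struct


def c_string(frame, offset):
    """Read a NUL-terminated string starting at offset, as strcmp would."""
    end = frame.index(0, offset)
    return bytes(frame[offset:end])


def check_authentication(password, flag_above_buffer=True):
    frame = bytearray(64)                       # simulated stack memory
    # Assumed layouts: buffer at 0..15 with flag at 16..19, or
    # flag at 0..3 with buffer at 4..19.
    buf_off, flag_off = (0, 16) if flag_above_buffer else (4, 0)
    struct.pack_into("<i", frame, flag_off, 0)  # auth_flag = 0
    # strcpy: copies every byte plus the terminating NUL, no bounds check
    for i, byte in enumerate(password + b"\0"):
        frame[buf_off + i] = byte
    if c_string(frame, buf_off) in (b"brillig", b"outgrabe"):
        struct.pack_into("<i", frame, flag_off, 1)
    return struct.unpack_from("<i", frame, flag_off)[0]


print(check_authentication(b"wrong"))
print(check_authentication(b"A" * 20) != 0)
print(check_authentication(b"A" * 20, flag_above_buffer=False))
```

With the flag stored above the buffer, a 20-character password overwrites it with a non-zero value and the check passes without a valid password. With the flag below the buffer the same input leaves the flag intact, although the copy still runs past the buffer into whatever lies above it.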
75
Buffer overflow
0x08048529 <+69>:   movl $0x8048647,(%esp)
0x08048530 <+76>:   call 0x8048394 <puts@plt>
0x08048535 <+81>:   movl $0x8048664,(%esp)
0x0804853c <+88>:   call 0x8048394 <puts@plt>
0x08048541 <+93>:   movl $0x804867a,(%esp)
0x08048548 <+100>:  call 0x8048394 <puts@plt>
0x0804854d <+105>:  jmp 0x804855b <main+119>
0x0804854f <+107>:  movl $0x8048696,(%esp)
0x08048556 <+114>:  call 0x8048394 <puts@plt>
76
Nested Procedures
• For example - Pascal
• Any routine can have sub-routines
• Any sub-routine can access anything that is defined in its containing scope or inside the sub-routine itself
  - "non-local" variables
77
Example: Nested Procedures
program p() {
  int x;
  procedure a() {
    int y;
    procedure b() { ... c() ... };
    procedure c() {
      int z;
      procedure d() {
        y = x + z
      };
      ... b() ... d() ...
    }
    ... a() ... c() ...
  }
  a()
}
Possible call sequence: p → a → a → c → b → c → d
What are the addresses of variables "x", "y" and "z" in procedure d?
78
Nested Procedures
• can call a sibling, ancestor
• when "c" uses (non-local) variables from "a", which instance of "a" is it?
• how do you find the right activation record at runtime?
[Figure: call chain P, a, a, c, b, c, d]
Possible call sequence: p → a → a → c → b → c → d
79
Nested Procedures
• goal: find the closest routine in the stack from a given nesting level
• if we reached the same routine in a sequence of calls
  - if a routine of level k uses variables of the same nesting level, it uses its own variables
  - if it uses variables of nesting level j < k, then it must be the last routine called at level j
• If a procedure is last at level j on the stack, then it must be an ancestor of the current routine
Possible call sequence: p → a → a → c → b → c → d
[Figure: call chain P, a, a, c, b, c, d]
80
Nested Procedures
• problem: a routine may need to access variables of another routine that contains it statically
• solution: lexical pointer (a.k.a. access link) in the activation record
• lexical pointer points to the last activation record of the nesting level above it
  - in our example, lexical pointer of d points to activation records of c
• lexical pointers created at runtime
• number of links to be traversed is known at compile time
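The rule above can be tried out on the call sequence p → a → a → c → b → c → d. The following is a sketch under assumptions: the tables `PARENT`, `LEVEL` and `DECLARES` encode the nesting of the example program by hand, and `Frame`, `run` and `resolve` are hypothetical helper names, not part of the slides.

```python
PARENT = {"p": None, "a": "p", "b": "a", "c": "a", "d": "c"}   # static nesting
LEVEL = {"p": 0, "a": 1, "b": 2, "c": 2, "d": 3}               # nesting depth
DECLARES = {"p": "x", "a": "y", "c": "z"}                      # one variable each


class Frame:
    def __init__(self, proc, index, link):
        self.proc = proc      # procedure this activation record belongs to
        self.index = index    # position in the stack (0 = bottom)
        self.link = link      # lexical pointer (access link)


def run(calls):
    """Push one frame per call, setting each lexical pointer at call time."""
    stack = []
    for proc in calls:
        link = None
        if stack:
            link = stack[-1]                              # the caller's frame
            hops = LEVEL[link.proc] - LEVEL[proc] + 1     # known at compile time
            for _ in range(hops):
                link = link.link
        assert link is None or link.proc == PARENT[proc]
        stack.append(Frame(proc, len(stack), link))
    return stack


def resolve(frame, var):
    """Follow lexical pointers until the frame declaring var is reached."""
    hops = 0
    while DECLARES.get(frame.proc) != var:
        frame = frame.link
        hops += 1
    return frame, hops


stack = run("p a a c b c d".split())
for var in "zyx":
    frame, hops = resolve(stack[-1], var)
    print(var, "is in frame", frame.index, "of", frame.proc, "after", hops, "links")
```

From d, the variable y is found in the second activation of a (the one that called the first c), not the first one. The number of links followed equals the difference in nesting levels, which is why it can be fixed at compile time.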
81
Lexical Pointers
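[Figure: stack of activation records a, a, c, b, c, d for the call sequence below; the frames of a hold y and the frames of c hold z, and each frame carries a lexical pointer.]
Possible call sequence: p → a → a → c → b → c → d
program p() {
  int x;
  procedure a() {
    int y;
    procedure b() { c() };
    procedure c() {
      int z;
      procedure d() {
        y = x + z
      };
      ... b() ... d() ...
    }
    ... a() ... c() ...
  }
  a()
}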
82
What is a Compiler
10
Abstract Register Machine
[Figure: a CPU connected to main memory. CPU: general-purpose (data) registers Register 00, Register 01, ..., Register xx; control registers (Register PC, ...). Main memory, between low and high addresses: Code, Global Variables, Heap, Stack.]
11
Abstract Activation Record Stack
[Figure: stack of frames ..., Proc k, Proc k+1, Proc k+2, ...; the frame of Proc k+1 is labelled "Stack frame for procedure Proc k+1(a1, ..., aN)".]
12
Abstract Stack Frame
[Figure: stack frame for procedure Proc k+1(a1, ..., aN), between the frames of Proc k and Proc k+2. Parameters (actual arguments): Param N, Param N-1, ..., Param 1. Locals and temporaries: _t0, ..., _tk, x, ..., y.]
13
TAC generation for expressions
• cgen(atomic_expr) directly generates TAC for atomic expressions
  - Constants, identifiers, ...
• cgen(compound_expr) recursively generates TAC for compound expressions
  - binary operators, procedure calls, ...
  - use temporary variables (registers) to store values of intermediate expressions
14
cgen example
cgen(5 + x) = {
  Choose a new temporary t
  Let t1 = cgen(5)
  Let t2 = cgen(x)
  Emit( t = t1 + t2 )
  Return t
}
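The recursive scheme in this example can be written out as a small generator. This is a sketch, not the course's implementation: expressions are assumed to be nested tuples `(op, left, right)` with constants and identifiers as leaves, and every sub-expression simply receives a fresh temporary.

```python
import itertools


def cgen(expr, code, temps):
    """Emit TAC for expr into code and return the temporary holding its value."""
    t = f"_t{next(temps)}"                 # choose a new temporary
    if isinstance(expr, tuple):            # compound expression
        op, e1, e2 = expr
        t1 = cgen(e1, code, temps)
        t2 = cgen(e2, code, temps)
        code.append(f"{t} = {t1} {op} {t2}")
    else:                                  # atomic: constant or identifier
        code.append(f"{t} = {expr}")
    return t


code = []
result = cgen(("+", 5, "x"), code, itertools.count())
print(result)
print("\n".join(code))
```

For `5 + x` this emits `_t1 = 5`, `_t2 = x` and `_t0 = _t1 + _t2`, and returns `_t0`. One temporary per sub-expression is wasteful, which is what the later slides on reusing temporaries address.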
15
Naïve cgen for expressions
• Maintain a counter for temporaries in c
• Initially: c = 0
• cgen(e1 op e2) = {
    Let A = cgen(e1)
    c = c + 1
    Let B = cgen(e2)
    c = c + 1
    Emit( _tc = A op B )
    Return _tc
  }
16
TAC Generation for Control Flow Statements
• Label introduction
    _label_name:
  Indicates a point in the code that can be jumped to
• Unconditional jump: go to instruction following label L
    Goto L;
• Conditional jump: test condition variable t; if 0, jump to label L
    IfZ t Goto L;
• Similarly: test condition variable t; if 1, jump to label L
    IfNZ t Goto L;
17
Control-flow example - conditions
int x;
int y;
int z;

if (x < y)
  z = x;
else
  z = y;
z = z * z;

_t0 = x < y;
IfZ _t0 Goto _L0;
z = x;
Goto _L1;
_L0:
z = y;
_L1:
z = z * z;
18
cgen for if-then-else
cgen(if (e) s1 else s2) = {
  Let _t = cgen(e)
  Let Lfalse be a new label
  Let Lafter be a new label
  Emit( IfZ _t Goto Lfalse; )
  cgen(s1)
  Emit( Goto Lafter; )
  Emit( Lfalse: )
  cgen(s2)
  Emit( Goto Lafter; )
  Emit( Lafter: )
}
19
Naive cgen for expressions
• Maintain a counter for temporaries in c
• Initially: c = 0
• cgen(e1 op e2) = {
    Let A = cgen(e1)
    c = c + 1
    Let B = cgen(e2)
    c = c + 1
    Emit( _tc = A op B )
    Return _tc
  }
• Observation: temporaries in cgen(e1) can be reused in cgen(e2)
20
Improving cgen for expressions
• Observation - naïve translation needlessly generates temporaries for leaf expressions
• Observation - temporaries are used exactly once
  - Once a temporary has been read it can be reused for another sub-expression
• cgen(e1 op e2) = {
    Let _t1 = cgen(e1)
    Let _t2 = cgen(e2)
    Emit( _t = _t1 op _t2 )
    Return _t
  }
• Temporaries of cgen(e1) can be reused in cgen(e2)
21
Sethi-Ullman translation
• Algorithm by Ravi Sethi and Jeffrey D. Ullman to emit optimal TAC
  - Minimizes number of temporaries
• Main data structure in algorithm is a stack of temporaries
  - Stack corresponds to recursive invocations of _t = cgen(e)
  - All the temporaries on the stack are live
    • Live = contain a value that is needed later on
22
Weighted register allocation
• Can save registers by re-ordering subtree computations
• Label each node with its weight
  - Weight = number of registers needed
  - Leaf weight known
  - Internal node weight
    • if w(left) > w(right) then w = w(left)
    • if w(right) > w(left) then w = w(right)
    • if w(right) = w(left) then w = w(left) + 1
• Choose heavier child as first to be translated
• WARNING: have to check that no side-effects exist before attempting to apply this optimization (pre-pass on the tree)
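The weight rules above translate directly into a recursive function. The sketch below assumes expressions are nested tuples `(op, left, right)`, that leaves have weight 0, and that ties are broken by translating the left child first; the tie-break and the helper names `weight` and `evaluation_order` are assumptions, and the tree is assumed to be free of side effects.

```python
def weight(node):
    """Weight of an expression tree node, following the labelling rules."""
    if not isinstance(node, tuple):
        return 0                          # leaf (assumed weight 0)
    _, left, right = node
    wl, wr = weight(left), weight(right)
    if wl == wr:
        return wl + 1                     # equal weights: one more is needed
    return max(wl, wr)                    # otherwise the heavier child decides


def evaluation_order(node, out=None):
    """List leaves and operators in the order they would be translated."""
    out = [] if out is None else out
    if isinstance(node, tuple):
        op, left, right = node
        if weight(left) >= weight(right): # heavier child first (ties: left)
            first, second = left, right
        else:
            first, second = right, left
        evaluation_order(first, out)
        evaluation_order(second, out)
        out.append(op)
    else:
        out.append(node)
    return out


tree = ("+", "a", ("[]", "b", ("*", 5, "c")))   # a + b[5*c]
print(weight(tree))
print(evaluation_order(tree))
```

For `a + b[5*c]` the index expression is translated before `b`, and the array access before `a`, so the value of the heavier sub-tree is computed while the most temporaries are still free.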
23
Weighted reg alloc example
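_t0 = cgen( a+b[5*c] )
Phase 1: - check absence of side-effects in expression tree
         - assign weight to each AST node
[Figure: AST for a+b[5*c]. Root "+" (w=1) with children a (w=0) and an array access (w=1); the array access has base b (w=0) and index "*" (w=1), whose children are 5 (w=0) and c (w=0).]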
24
Weighted reg alloc example
_t0 = cgen( a+b[5*c] )
Phase 2: use weights to decide on order of translation
[Figure: the same weighted AST; at each internal node the heavier sub-tree is translated first, each sub-tree is annotated with the temporary (_t0 or _t1) that holds its result.]
Generated code:
_t0 = c
_t1 = 5
_t0 = _t1 * _t0
_t1 = b
_t0 = _t1[_t0]
_t1 = a
_t0 = _t1 + _t0
25
TAC Generation for Memory Access Instructions
• Copy instruction: a = b
• Load/store instructions: a = *b    *a = b
• Address of instruction: a = &b
• Array accesses: a = b[i]    a[i] = b
• Field accesses: a = b[f]    a[f] = b
• Memory allocation instruction: a = alloc(size)
  - Sometimes left out (e.g., malloc is a procedure in C)
26
Array operations
x = y[i]
  t1 = &y          // t1 = address-of y
  t2 = t1 + i      // t2 = address of y[i]
  x = *t2          // value stored at y[i]

x[i] = y
  t1 = &x          // t1 = address-of x
  t2 = t1 + i      // t2 = address of x[i]
  *t2 = y          // store through pointer
30
Control-flow example - loops
int x;
int y;

while (x < y) {
  x = x * 2;
}
y = x;

_L0:
_t0 = x < y;
IfZ _t0 Goto _L1;
x = x * 2;
Goto _L0;
_L1:
y = x;
31
cgen for while loops
cgen(while (expr) stmt) = {
  Let Lbefore be a new label
  Let Lafter be a new label
  Emit( Lbefore: )
  Let t = cgen(expr)
  Emit( IfZ t Goto Lafter; )
  cgen(stmt)
  Emit( Goto Lbefore; )
  Emit( Lafter: )
}
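The while-loop rule can be exercised with a small statement generator. This is a simplified sketch: the `Gen` class and its statement encoding (`("assign", target, expr)` and `("while", cond, body)`) are assumptions for illustration, and expressions are kept as ready-made strings rather than being translated recursively.

```python
import itertools


class Gen:
    """Collects TAC lines; labels and temporaries are numbered from 0."""

    def __init__(self):
        self.code = []
        self.labels = itertools.count()
        self.temps = itertools.count()

    def new_label(self):
        return f"_L{next(self.labels)}"

    def expr(self, e):
        t = f"_t{next(self.temps)}"        # temporary holding the value of e
        self.code.append(f"{t} = {e};")
        return t

    def stmt(self, s):
        if s[0] == "assign":
            _, target, e = s
            self.code.append(f"{target} = {e};")
        elif s[0] == "while":
            _, cond, body = s
            before, after = self.new_label(), self.new_label()
            self.code.append(f"{before}:")
            t = self.expr(cond)
            self.code.append(f"IfZ {t} Goto {after};")
            for inner in body:
                self.stmt(inner)
            self.code.append(f"Goto {before};")
            self.code.append(f"{after}:")
        else:
            raise ValueError(f"unknown statement kind: {s[0]}")


g = Gen()
g.stmt(("while", "x < y", [("assign", "x", "x * 2")]))
g.stmt(("assign", "y", "x"))
print("\n".join(g.code))
```

The condition is re-evaluated after the `Lbefore` label on every iteration, which is why the label must be emitted before the code for the condition rather than after it.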
32
Optimization points
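[Figure: pipeline source code → Front end → IR → Code generator → target code, annotated with optimization points:
  User: profile program, change algorithm
  Compiler: intraprocedural IR, interprocedural IR, IR optimizations
  Compiler: register allocation, instruction selection, peephole transformations
 A "today" marker highlights the part covered in this lecture.]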
33
Interprocedural IR
• Compile time generation of code for procedure invocations
• Activation Records (a.k.a. Stack Frames)
34
Supporting Procedures
• New computing environment
  - at least temporary memory for local variables
• Passing information into the new environment
  - Parameters
• Transfer of control to/from procedure
• Handling return values
35
Abstract Register Machine (High Level View)
[Figure: a CPU with general-purpose (data) registers Register 00, Register 01, ..., Register xx and control registers (Register PC, ...), next to a memory holding Code and Data between low and high addresses.]
36
Abstract Register Machine (High Level View)
[Figure: the same CPU, with main memory now divided into Code, Global Variables, Heap and Stack between low and high addresses.]
Highaddresses
39
Handling Procedures
• Store local variables/temporaries in a stack
• A function call instruction pushes arguments to stack and jumps to the function label
  A statement x=f(a1,...,an); looks like
    Push a1; ... Push an;
    Call f;
    Pop x;   // copy returned value
• Returning a value is done by pushing it to the stack (return x;)
    Push x;
• Return control to caller (and roll up stack)
    Return;
42
Intro: Functions Example
int SimpleFn(int z) {
  int x, y;
  x = x * y * z;
  return x;
}
void main() {
  int w;
  w = SimpleFn(137);
}

_SimpleFn:
  _t0 = x * y;
  _t1 = _t0 * z;
  x = _t1;
  Push x;
  Return;

main:
  _t0 = 137;
  Push _t0;
  Call _SimpleFn;
  Pop w;
43
What Can We Do with Procedures
• Declarations & Definitions
• Call & Return
• Jumping out of procedures
• Passing & Returning procedures as parameters
44
Design Decisions
• Scoping rules
  - Static scoping vs. dynamic scoping
• Caller/callee conventions
  - Parameters
  - Who saves register values?
• Allocating space for local variables
45
Static (lexical) Scoping
main ( ) {                          /* block B0 */
  int a = 0;
  int b = 0;
  {                                 /* block B1 */
    int b = 1;
    {                               /* block B2 */
      int a = 2;
      printf("%d %d\n", a, b);
    }
    {                               /* block B3 */
      int b = 3;
      printf("%d %d\n", a, b);
    }
    printf("%d %d\n", a, b);
  }
  printf("%d %d\n", a, b);
}

Declaration    Scopes
a=0            B0, B1, B3
b=0            B0
b=1            B1, B2
a=2            B2
b=3            B3

A name refers to its (closest) enclosing scope, which is known at compile time.
46
Dynamic Scoping
• Each identifier is associated with a global stack of bindings
• When entering scope where identifier is declared
  - push declaration on identifier stack
• When exiting scope where identifier is declared
  - pop identifier stack
• Evaluating the identifier in any context binds to the current top of stack
• Determined at runtime
47
Example
• What value is returned from main?
• Static scoping?
• Dynamic scoping?

int x = 42;
int f() { return x; }
int g() { int x = 1; return f(); }
int main() { return g(); }
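Both answers can be checked by modelling the two scoping disciplines by hand. The sketch below is an illustration only: `run_static` and `run_dynamic` are hypothetical names, static scoping is modelled by letting f read the global binding directly, and dynamic scoping is modelled with the per-identifier stack of bindings described on the previous slide.

```python
def run_static():
    """Static scoping: the x in f is bound to the global x at compile time."""
    global_env = {"x": 42}

    def f():
        return global_env["x"]        # g's local x is invisible here

    def g():
        x = 1                         # a different variable, local to g
        return f()

    return g()


def run_dynamic():
    """Dynamic scoping: x refers to the most recent binding still active."""
    bindings = {"x": [42]}            # identifier -> stack of bindings

    def f():
        return bindings["x"][-1]      # top of the stack for x

    def g():
        bindings["x"].append(1)       # entering g pushes its declaration of x
        try:
            return f()
        finally:
            bindings["x"].pop()       # leaving g pops it again

    return g()


print(run_static(), run_dynamic())
```

Under static scoping main returns 42; under dynamic scoping it returns 1, because g's binding of x is still on top of the stack when f runs.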
48
Why do we care
• We need to generate code to access variables
• Static scoping
  - Identifier binding is known at compile time
  - Address of the variable is known at compile time
  - Assigning addresses to variables is part of code generation
  - No runtime errors of "access to undefined variable"
  - Can check types of variables
11
Abstract Activation Record Stack
Stack frame for procedure
Prock+1(a1hellipaN)
Prock
Prock+2
hellip
hellip
Prock+1
12
Abstract Stack Frame
Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Prock
Prock+2
hellip
hellip
Stack frame for procedure
Prock+1(a1hellipaN)
13
TAC generation for expressions
bull cgen(atomic_expr) directly generates TAC for atomic expressionsndash Constants identifiershellip
bull Cgen(compund_expr) recursively generates TAC for compound expressions ndash binary operators procedure calls hellipndash use temporary variables (registers) to store
values of intermediate expressions
14
cgen examplecgen(5 + x) = Choose a new temporary t Let t1 = cgen(5) Let t2 = cgen(x) Emit( t = t1 + t2 ) Return t
15
Naiumlve cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =
Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc
16
TAC Generation for Control Flow Statements
• Label introduction
    _label_name:
  Indicates a point in the code that can be jumped to
• Unconditional jump: go to the instruction following label L
    Goto L;
• Conditional jump: test condition variable t; if 0, jump to label L
    IfZ t Goto L;
• Similarly: test condition variable t; if 1, jump to label L
    IfNZ t Goto L;
17
Control-flow example - conditions
int x;
int y;
int z;

if (x < y)
  z = x;
else
  z = y;
z = z * z;

_t0 = x < y;
IfZ _t0 Goto _L0;
z = x;
Goto _L1;
_L0:
z = y;
_L1:
z = z * z;
18
cgen for if-then-else
cgen(if (e) s1 else s2)
  Let _t = cgen(e)
  Let Lfalse be a new label
  Let Lafter be a new label
  Emit( IfZ _t Goto Lfalse; )
  cgen(s1)
  Emit( Goto Lafter; )
  Emit( Lfalse: )
  cgen(s2)
  Emit( Goto Lafter; )
  Emit( Lafter: )
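The scheme above can be sketched in Python. This is an illustrative sketch under assumptions: `Gen`, the tuple encodings `("assign", target, expr)` and `("if", cond, s1, s2)`, and the shortcut of treating the condition as a single already-flat expression string are hypothetical and not part of the lecture.

```python
class Gen:
    """Tiny TAC emitter for assignments and if-then-else statements."""
    def __init__(self):
        self.code = []
        self.temps = 0
        self.labels = 0

    def emit(self, instr):
        self.code.append(instr)

    def new_label(self):
        name = f"_L{self.labels}"
        self.labels += 1
        return name

    def cgen_expr(self, expr):
        # Assumption: expr is already a flat expression such as "x < y".
        t = f"_t{self.temps}"
        self.temps += 1
        self.emit(f"{t} = {expr};")
        return t

    def cgen_stmt(self, s):
        if s[0] == "assign":
            _, target, expr = s
            self.emit(f"{target} = {expr};")
        elif s[0] == "if":
            _, e, s1, s2 = s
            t = self.cgen_expr(e)
            l_false = self.new_label()
            l_after = self.new_label()
            self.emit(f"IfZ {t} Goto {l_false};")
            self.cgen_stmt(s1)
            self.emit(f"Goto {l_after};")
            self.emit(f"{l_false}:")
            self.cgen_stmt(s2)
            self.emit(f"Goto {l_after};")
            self.emit(f"{l_after}:")
        else:
            raise ValueError(f"unknown statement kind: {s[0]}")
```

Translating `if (x < y) z = x; else z = y;` with this sketch yields the code of the conditions example plus one extra `Goto _L1;` right before `_L1:`. The scheme emits that jump unconditionally; it is redundant and a later cleanup pass could drop it.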
19
Naïve cgen for expressions
• Maintain a counter for temporaries in c
• Initially: c = 0
• cgen(e1 op e2) = {
    Let A = cgen(e1)
    c = c + 1
    Let B = cgen(e2)
    c = c + 1
    Emit( _tc = A op B )
    Return _tc
  }
• Observation: temporaries in cgen(e1) can be reused in cgen(e2)
20
Improving cgen for expressions
• Observation: naïve translation needlessly generates temporaries for leaf expressions
• Observation: temporaries are used exactly once
  - Once a temporary has been read, it can be reused for another sub-expression
• cgen(e1 op e2) = {
    Let _t1 = cgen(e1)
    Let _t2 = cgen(e2)
    Emit( _t = _t1 op _t2 )
    Return _t
  }
• Temporaries of cgen(e1) can be reused in cgen(e2)
21
Sethi-Ullman translation
• Algorithm by Ravi Sethi and Jeffrey D. Ullman to emit optimal TAC
  - Minimizes the number of temporaries
• Main data structure in the algorithm is a stack of temporaries
  - Stack corresponds to recursive invocations of _t = cgen(e)
  - All the temporaries on the stack are live
    • Live = contain a value that is needed later on
22
Weighted register allocation
• Can save registers by re-ordering subtree computations
• Label each node with its weight
  - Weight = number of registers needed
  - Leaf weight known
  - Internal node weight
    • if w(left) > w(right) then w = w(left)
    • if w(right) > w(left) then w = w(right)
    • if w(right) = w(left) then w = w(left) + 1
• Choose the heavier child as the first to be translated
• WARNING: have to check that no side-effects exist before attempting to apply this optimization (pre-pass on the tree)
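The weight rules above can be sketched in Python. This is a sketch under assumptions: `Node`, `weight` and `translation_order` are hypothetical names, the leaf weight is a parameter (0 here), and ties are broken by translating the left child first, which the slide does not specify. The side-effect pre-pass is not modeled.

```python
class Node:
    """Binary expression-tree node; leaves have no children."""
    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right

def is_leaf(n):
    return n.left is None and n.right is None

def weight(n, leaf_weight=0):
    """Weight of a node according to the three rules for internal nodes."""
    if is_leaf(n):
        return leaf_weight
    wl = weight(n.left, leaf_weight)
    wr = weight(n.right, leaf_weight)
    if wl > wr:
        return wl
    if wr > wl:
        return wr
    return wl + 1

def translation_order(n, leaf_weight=0):
    """Post-order listing of labels, translating the heavier child first."""
    if is_leaf(n):
        return [n.label]
    first, second = n.left, n.right
    if weight(n.right, leaf_weight) > weight(n.left, leaf_weight):
        first, second = n.right, n.left
    return (translation_order(first, leaf_weight)
            + translation_order(second, leaf_weight)
            + [n.label])

# a + b[5*c]: '+' over a and an array access with base b and index 5*c.
example = Node("+", Node("a"),
               Node("[]", Node("b"), Node("*", Node("5"), Node("c"))))
```

For `example`, the root weight comes out as 1 and the array-access subtree is translated before `a`, because it is the heavier child of `+`.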
23
Weighted reg alloc example
_t0 = cgen( a+b[5*c] )
Phase 1: - check absence of side-effects in expression tree
         - assign weight to each AST node
[Figure: AST with weights]
  +  (w=1)
    a  (w=0)
    array access  (w=1)
      base: b  (w=0)
      index: *  (w=1)
        5  (w=0)
        c  (w=0)
24
Weighted reg alloc example
_t0 = cgen( a+b[5*c] )
Phase 2: - use weights to decide on order of translation
[Figure: the same AST; at + and at the array access the heavier sub-tree is translated first, and each node is annotated with the temporary that holds its value]
  _t0 = c
  _t1 = 5
  _t0 = _t1 * _t0
  _t1 = b
  _t0 = _t1[_t0]
  _t1 = a
  _t0 = _t1 + _t0
25
TAC Generation for Memory Access Instructions
• Copy instruction: a = b
• Load/store instructions: a = *b, *a = b
• Address-of instruction: a = &b
• Array accesses: a = b[i], a[i] = b
• Field accesses: a = b[f], a[f] = b
• Memory allocation instruction: a = alloc(size)
  - Sometimes left out (e.g., malloc is a procedure in C)
26
Array operations
x = y[i]
  t1 = &y        ; t1 = address-of y
  t2 = t1 + i    ; t2 = address of y[i]
  x = *t2        ; value stored at y[i]
x[i] = y
  t1 = &x        ; t1 = address-of x
  t2 = t1 + i    ; t2 = address of x[i]
  *t2 = y        ; store through pointer
30
Control-flow example - loops
int x;
int y;

while (x < y) {
  x = x * 2;
}

y = x;

_L0:
  _t0 = x < y;
  IfZ _t0 Goto _L1;
  x = x * 2;
  Goto _L0;
_L1:
  y = x;
31
cgen for while loops
cgen(while (expr) stmt)
  Let Lbefore be a new label
  Let Lafter be a new label
  Emit( Lbefore: )
  Let t = cgen(expr)
  Emit( IfZ t Goto Lafter; )
  cgen(stmt)
  Emit( Goto Lbefore; )
  Emit( Lafter: )
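The while scheme can be sketched in Python in the same spirit. The function names, the callback-based design (`cond` and `body` emit their own TAC) and the `fresh` name generator are assumptions made for this illustration only.

```python
import itertools

def cgen_while(cond, body, emit, fresh):
    """Emit TAC for a while loop.

    cond() emits the TAC of the condition and returns its temporary;
    body() emits the TAC of the loop body; fresh(kind) returns a new
    label ("L") or temporary ("t") name.
    """
    l_before = fresh("L")
    l_after = fresh("L")
    emit(f"{l_before}:")
    t = cond()
    emit(f"IfZ {t} Goto {l_after};")
    body()
    emit(f"Goto {l_before};")
    emit(f"{l_after}:")

def demo_while():
    """Translate: while (x < y) { x = x * 2; }  y = x;"""
    code = []
    counters = {"L": itertools.count(), "t": itertools.count()}

    def fresh(kind):
        return f"_{kind}{next(counters[kind])}"

    def cond():
        t = fresh("t")
        code.append(f"{t} = x < y;")
        return t

    def body():
        code.append("x = x * 2;")

    cgen_while(cond, body, code.append, fresh)
    code.append("y = x;")
    return code
```

`demo_while()` reproduces the instruction sequence of the loops example: the condition is re-evaluated at `_L0` on every iteration, and `IfZ` leaves the loop through `_L1`.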
32
Optimization points
source code → Front end → IR → Code generator → target code
• User: profile program, change algorithm
• Compiler: intraprocedural IR, interprocedural IR, IR optimizations
• Compiler: register allocation, instruction selection, peephole transformations
[Figure marker: "today"]
33
Interprocedural IR
• Compile-time generation of code for procedure invocations
• Activation Records (a.k.a. Stack Frames)
34
Supporting Procedures
• New computing environment
  - at least temporary memory for local variables
• Passing information into the new environment
  - Parameters
• Transfer of control to/from procedure
• Handling return values
35
Abstract Register Machine (High Level View)
[Figure: a CPU with general-purpose (data) registers Register 00, Register 01, …, Register xx and control registers …, Register PC, next to a memory holding Code and Data, drawn between high addresses and low addresses]
36
Abstract Register Machine (High Level View)
[Figure: a CPU with general-purpose (data) registers Register 00, Register 01, …, Register xx and control registers …, Register PC, next to a main memory holding Code, Global Variables, Heap and Stack, drawn between high addresses and low addresses]
37
Abstract Activation Record Stack
[Figure: stack of activation records …, Proc k, Proc k+1, Proc k+2, …, with one entry labeled "Stack frame for procedure Proc k+1(a1, …, aN)"]
38
Abstract Stack Frame
[Figure: stack frame for procedure Proc k+1(a1, …, aN), shown between the frames of Proc k and Proc k+2. It holds the parameters (actual arguments) Param N, Param N-1, …, Param 1, followed by locals and temporaries _t0, …, _tk, x, …, y]
39
Handling Procedures
• Store local variables/temporaries in a stack
• A function call instruction pushes arguments to the stack and jumps to the function label
  A statement x=f(a1,…,an); looks like
    Push a1; … Push an;
    Call f;
    Pop x; // copy returned value
• Returning a value is done by pushing it to the stack (return x;)
    Push x;
• Return control to caller (and roll up stack)
    Return;
40
Abstract Register Machine
[Figure: a CPU with general-purpose (data) registers Register 00, Register 01, …, Register xx and control registers …, Register PC, next to a main memory holding Code, Global Variables, Heap and Stack, drawn between high addresses and low addresses]
41
Semi-Abstract Register Machine
[Figure: a CPU with general-purpose (data) registers Register 00, Register 01, …, Register xx, control registers …, Register PC, and stack registers ebp, esp, …, next to a main memory holding Global Variables, Stack and Heap, drawn between high addresses and low addresses]
42
Intro: Functions Example
int SimpleFn(int z) {
  int x, y;
  x = x * y * z;
  return x;
}

void main() {
  int w;
  w = SimpleFn(137);
}

_SimpleFn:
  _t0 = x * y;
  _t1 = _t0 * z;
  x = _t1;
  Push x;
  Return;

main:
  _t0 = 137;
  Push _t0;
  Call _SimpleFn;
  Pop w;
43
What Can We Do with Procedures?
• Declarations & Definitions
• Call & Return
• Jumping out of procedures
• Passing & Returning procedures as parameters
44
Design Decisions
• Scoping rules
  - Static scoping vs. dynamic scoping
• Caller/callee conventions
  - Parameters
  - Who saves register values?
• Allocating space for local variables
45
Static (lexical) Scoping
main ( )
{                                   /* B0 */
  int a = 0 ;
  int b = 0 ;
  {                                 /* B1 */
    int b = 1 ;
    {                               /* B2 */
      int a = 2 ;
      printf ("%d %d\n", a, b) ;
    }
    {                               /* B3 */
      int b = 3 ;
      printf ("%d %d\n", a, b) ;
    }
    printf ("%d %d\n", a, b) ;
  }
  printf ("%d %d\n", a, b) ;
}

Declaration   Scopes
a=0           B0, B1, B3
b=0           B0
b=1           B1, B2
a=2           B2
b=3           B3

• a name refers to its (closest) enclosing scope
• known at compile time
46
Dynamic Scoping
• Each identifier is associated with a global stack of bindings
• When entering a scope where the identifier is declared
  - push declaration on identifier stack
• When exiting a scope where the identifier is declared
  - pop identifier stack
• Evaluating the identifier in any context binds to the current top of stack
• Determined at runtime
47
Example
• What value is returned from main?
• Static scoping?
• Dynamic scoping?

int x = 42;

int f() { return x; }
int g() { int x = 1; return f(); }
int main() { return g(); }
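The two answers can be checked with a small Python simulation of the example. This is a sketch under assumptions: `run_static` and `run_dynamic` are hypothetical names, static scoping is modeled by letting `f` read the global binding directly, and dynamic scoping is modeled with the per-identifier stack of bindings described on the Dynamic Scoping slide.

```python
def run_static():
    """Static scoping: the x inside f is bound to the global x at compile time."""
    global_vars = {"x": 42}

    def f():
        return global_vars["x"]

    def g():
        x = 1  # local to g; f cannot see it under static scoping
        return f()

    return g()

def run_dynamic():
    """Dynamic scoping: each identifier has a global stack of bindings."""
    bindings = {"x": [42]}

    def f():
        return bindings["x"][-1]      # current top of the stack for x

    def g():
        bindings["x"].append(1)       # entering the scope that declares x
        try:
            return f()
        finally:
            bindings["x"].pop()       # exiting that scope

    return g()
```

Under static scoping `main` returns 42; under dynamic scoping it returns 1, because the binding pushed by `g` is on top of the stack when `f` evaluates `x`.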
48
Why do we care?
• We need to generate code to access variables
• Static scoping
  - Identifier binding is known at compile time
  - Address of the variable is known at compile time
  - Assigning addresses to variables is part of code generation
  - No runtime errors of “access to undefined variable”
  - Can check types of variables
49
Variable addresses for static scoping: first attempt
int x = 42;

int f() { return x; }
int g() { int x = 1; return f(); }
int main() { return g(); }

identifier     address
x (global)     0x42
x (inside g)   0x73
50
Variable addresses for static scoping: first attempt
int a [11] ;

void quicksort(int m, int n) {
  int i;
  if (n > m) {
    i = partition(m, n);
    quicksort (m, i-1) ;
    quicksort (i+1, n) ;
  }
}

main() {
  quicksort (1, 9) ;
}

What is the address of the variable “i” in the procedure quicksort?
51
Compile-Time Information on Variables
• Name
• Type
• Scope
  - when is it recognized
• Duration
  - Until when does its value exist
• Size
  - How many bytes are required at runtime
• Address
  - Fixed
  - Relative
  - Dynamic
52
Activation Record (Stack Frames)
• separate space for each procedure invocation
• managed at runtime
  - code for managing it generated by the compiler
• desired properties
  - efficient allocation and deallocation
    • procedures are called frequently
  - variable size
    • different procedures may require different memory sizes
53
Semi-Abstract Register Machine
[Figure: a CPU with general-purpose (data) registers Register 00, Register 01, …, Register xx, control registers …, Register PC, and stack registers ebp, esp, …, next to a main memory holding Global Variables, Stack and Heap, drawn between high addresses and low addresses]
54
A Logical Stack Frame (Simplified)
[Figure: stack frame for function f(a1, …, aN): parameters (actual arguments) Param N, Param N-1, …, Param 1, followed by locals and temporaries _t0, …, _tk, x, …, y]
55
Runtime Stack
• Stack of activation records
• Call = push new activation record
• Return = pop activation record
• Only one “active” activation record: top of stack
• How do we handle recursion?
56
Activation Record (frame)
high addresses
  parameter k                  (incoming parameters)
  …
  parameter 1
  lexical pointer              (administrative part)
  return information
  dynamic link
  registers & misc
  local variables
  temporaries
  next frame would be here
low addresses
• stack grows down
• the figure also marks the positions of the frame pointer and the stack pointer
57
Runtime Stack
• SP: stack pointer
  - top of current frame
• FP: frame pointer
  - base of current frame
  - Sometimes called BP (base pointer)
[Figure: the stack, growing down, with the previous frame above the current frame; FP marks the base of the current frame and SP its top]
58
Code Blocks
• Programming languages provide code blocks
void foo()
{
  int x = 8 ; y=9;        //1
  { int x = y * y ; }     //2
  { int x = y * 7 ; }     //3
  x = y + 1;
}
[Figure: frame layout: administrative, x1, y1, x2, x3, …]
59
L-Values of Local Variables
• The offset in the stack is known at compile time
• L-val(x) = FP + offset(x)
• x = 5 ⇒
    Load_Constant 5, R3
    Store R3, offset(x)(FP)
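How a compiler might fix these offsets can be sketched in Python. This is a sketch under assumptions: the helper names, the layout rule (locals placed directly below FP in declaration order, sizes in bytes, no alignment padding) and the sample sizes are illustrative and not taken from the lecture.

```python
def assign_offsets(locals_with_sizes):
    """Assign each local a negative offset from FP, at compile time.

    Assumes the stack grows down and locals sit directly below FP,
    in declaration order, without alignment padding.
    """
    offsets = {}
    offset = 0
    for name, size in locals_with_sizes:
        offset -= size
        offsets[name] = offset
    return offsets

def l_value(fp, offsets, name):
    """L-val(x) = FP + offset(x); only FP is a runtime quantity."""
    return fp + offsets[name]

def store_constant(offsets, name, value, reg="R3"):
    """Emit the two-instruction sequence for 'name = value'."""
    return [f"Load_Constant {value}, {reg}",
            f"Store {reg}, {offsets[name]}(FP)"]
```

For locals `x` and `y` of 4 bytes each followed by a 16-byte `buf`, the offsets come out as -4, -8 and -24, and `x = 5` becomes `Load_Constant 5, R3` followed by `Store R3, -4(FP)`. The same code works for every invocation, including recursive ones, because each invocation has its own FP.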
60
Pentium Runtime Stack

Pentium stack registers
Register   Usage
ESP        Stack pointer
EBP        Base pointer

Pentium stack and call/ret instructions
Instruction       Usage
push, pusha, …    push on runtime stack
pop, popa, …      pop from runtime stack
call              transfer control to called routine
ret               transfer control back to caller
61
Accessing Stack Variables
• Use offset from FP (%ebp)
• Remember: stack grows downwards
• Above FP = parameters
• Below FP = locals
• Examples
  - %ebp + 4 = return address
  - %ebp + 8 = first parameter
  - %ebp - 4 = first local
[Figure: stack from high to low addresses. The current frame holds Param n, …, param1 (at FP+8), Return address, Previous fp (where FP points), Local 1 (at FP-4), …, Local n (where SP points); the previous frame above it ends with its own Param n, …, param1 and Return address]
62
Factorial - fact(int n)
fact:
  pushl %ebp             # save ebp
  movl %esp,%ebp         # ebp=esp
  pushl %ebx             # save ebx
  movl 8(%ebp),%ebx      # ebx = n
  cmpl $1,%ebx           # n <= 1 ?
  jle .lresult           # then done
  leal -1(%ebx),%eax     # eax = n-1
  pushl %eax             #
  call fact              # fact(n-1)
  imull %ebx,%eax        # eax=retv*n
  jmp .lreturn           #
.lresult:
  movl $1,%eax           # retv
.lreturn:
  movl -4(%ebp),%ebx     # restore ebx
  movl %ebp,%esp         # restore esp
  popl %ebp              # restore ebp
  ret                    # return to caller
[Figure: stack at an intermediate point, from high to low addresses: n (at EBP+8), Return address, old ebp (where EBP points), old ebx (at EBP-4, where ESP points); above them the caller's Return address, Previous fp and old ebx]
(disclaimer: a real compiler can do better than that)
63
Call Sequences
• The processor does not save the content of registers on procedure calls
• So who will?
  - Caller saves and restores registers
  - Callee saves and restores registers
  - But can also have both save/restore some registers
64
Call Sequences
caller: Caller push code
  Push caller-save registers
  Push actual parameters (in reverse order)
call
  push return address
  Jump to call address
callee: Callee push code (prologue)
  Push current base-pointer
  bp = sp
  Push local variables
  Push callee-save registers
callee: Callee pop code (epilogue)
  Pop callee-save registers
  Pop callee activation record
  Pop old base-pointer
return
  pop return address
  Jump to address
caller: Caller pop code
  Pop parameters
  Pop caller-save registers
65
Call Sequences - Foo(42,21)
caller: Caller push code
  push %ecx            Push caller-save registers
  push $21             Push actual parameters (in reverse order)
  push $42
call
  call _foo            push return address; Jump to call address
callee: Callee push code (prologue)
  push %ebp            Push current base-pointer
  mov %esp, %ebp       bp = sp
  sub $8, %esp         Push local variables
  push %ebx            Push callee-save registers
callee: Callee pop code (epilogue)
  pop %ebx             Pop callee-save registers
  mov %ebp, %esp       Pop callee activation record
  pop %ebp             Pop old base-pointer
return
  ret                  pop return address; Jump to address
caller: Caller pop code
  add $8, %esp         Pop parameters
  pop %ecx             Pop caller-save registers
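The effect of this call sequence on the stack can be traced with a small Python simulation. This is a sketch under assumptions: the `Machine` class, the 4-byte word size, the initial stack address 0x1000 and the placeholder strings standing in for register contents and the return address are hypothetical; only the order of pushes and pops follows the slide.

```python
class Machine:
    """Minimal model of a downward-growing stack with esp and ebp."""
    def __init__(self, stack_top=0x1000):
        self.mem = {}
        self.esp = stack_top
        self.ebp = 0

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value

def simulate_foo_call():
    m = Machine()
    # Caller push code
    m.push("saved ecx")        # push %ecx (caller-save register)
    m.push(21)                 # push $21 (parameters in reverse order)
    m.push(42)                 # push $42
    m.push("return address")   # call _foo pushes the return address
    # Callee prologue
    m.push(m.ebp)              # push %ebp
    m.ebp = m.esp              # mov %esp, %ebp
    m.esp -= 8                 # sub $8, %esp (space for locals)
    m.push("saved ebx")        # push %ebx (callee-save register)
    # What the callee sees relative to ebp
    view = (m.mem[m.ebp + 4], m.mem[m.ebp + 8], m.mem[m.ebp + 12])
    # Callee epilogue
    m.pop()                    # pop %ebx
    m.esp = m.ebp              # mov %ebp, %esp
    m.ebp = m.pop()            # pop %ebp
    m.pop()                    # ret pops the return address
    # Caller pop code
    m.esp += 8                 # add $8, %esp (pop parameters)
    m.pop()                    # pop %ecx
    return view, m.esp
```

In this trace the callee finds the return address at ebp+4, the first parameter (42) at ebp+8 and the second (21) at ebp+12, and after the caller pop code esp is back at its initial value.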
66
“To Callee-save or to Caller-save?”
• Callee-saved registers need only be saved when the callee modifies their value
• Some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registers
• Callee-Save Registers
  - Saved by the callee before modification
  - Values are automatically preserved across calls
• Caller-Save Registers
  - Saved (if needed) by the caller before calls
  - Values are not automatically preserved across calls
• Usually the architecture defines caller-save and callee-save registers
  - Separate compilation
  - Interoperability between code produced by different compilers/languages
• But compiler writers decide when to use caller/callee registers
68
Callee-Save Registers
• Saved by the callee before modification
• Usually at procedure prolog
• Restored at procedure epilog
• Hardware support may be available
• Values are automatically preserved across calls

int foo(int a) {
  int b=a+1;
  f1();
  g1(b);
  return(b+2);
}

.global _foo
  Add_Constant -K, SP        // allocate space for foo
  Store_Local R5, -14(FP)    // save R5
  Load_Reg R5, R0; Add_Constant R5, 1
  JSR f1; JSR g1;
  Add_Constant R5, 2; Load_Reg R5, R0
  Load_Local -14(FP), R5     // restore R5
  Add_Constant K, SP; RTS    // deallocate
69
Caller-Save Registers
• Saved by the caller before calls when needed
• Values are not automatically preserved across calls

void bar (int y) {
  int x=y+1;
  f2(x);
  g2(2);
  g2(8);
}

.global _bar
  Add_Constant -K, SP        // allocate space for bar
  Add_Constant R0, 1
  JSR f2
  Load_Constant 2, R0; JSR g2;
  Load_Constant 8, R0; JSR g2
  Add_Constant K, SP         // deallocate space for bar
  RTS
70
Parameter Passing
• 1960s
  - In memory
    • No recursion is allowed
• 1970s
  - In stack
• 1980s
  - In registers
  - First k parameters are passed in registers (k=4 or k=6)
  - Where is time saved?
    • Most procedures are leaf procedures
    • Interprocedural register allocation
    • Many of the registers may be dead before another invocation
    • Register windows are allocated in some architectures per call (e.g., Sun SPARC)
71
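Windows Exploit(s): Buffer Overflow
void foo (char *x) {
  char buf[2];
  strcpy(buf, x);
}
int main (int argc, char *argv[]) {
  foo(argv[1]);
}

./a.out abracadabra
Segmentation fault

[Figure: stack layout listing Previous frame, Return address, Saved FP, char* x, buf[2], with arrows for the direction of stack growth and of increasing memory addresses; the copied string "abracadabra" runs past buf[2] into the neighboring entries]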
72
Buffer overflow
int check_authentication(char *password) {
  int auth_flag = 0;
  char password_buffer[16];

  strcpy(password_buffer, password);
  if(strcmp(password_buffer, "brillig") == 0)
    auth_flag = 1;
  if(strcmp(password_buffer, "outgrabe") == 0)
    auth_flag = 1;
  return auth_flag;
}
int main(int argc, char *argv[]) {
  if(argc < 2) {
    printf("Usage: %s <password>\n", argv[0]);
    exit(0);
  }
  if(check_authentication(argv[1])) {
    printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    printf(" Access Granted.\n");
    printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
  } else {
    printf("\nAccess Denied.\n");
  }
}
(source: “Hacking: The Art of Exploitation, 2nd Ed.”)
74
Buffer overflow
int check_authentication(char *password) {
  char password_buffer[16];
  int auth_flag = 0;

  strcpy(password_buffer, password);
  if(strcmp(password_buffer, "brillig") == 0)
    auth_flag = 1;
  if(strcmp(password_buffer, "outgrabe") == 0)
    auth_flag = 1;
  return auth_flag;
}
int main(int argc, char *argv[]) {
  if(argc < 2) {
    printf("Usage: %s <password>\n", argv[0]);
    exit(0);
  }
  if(check_authentication(argv[1])) {
    printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    printf(" Access Granted.\n");
    printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
  } else {
    printf("\nAccess Denied.\n");
  }
}
(source: “Hacking: The Art of Exploitation, 2nd Ed.”)
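Why the order of the two local declarations matters can be illustrated with a Python simulation of the frame's local area. This is a sketch under loudly stated assumptions: the 20-byte `frame`, the two candidate layouts, the function name `simulated_check` and the clipping of the copy at the end of the simulated area are all hypothetical. A real compiler chooses its own layout, and a real overflow would continue into the saved frame pointer and return address.

```python
def simulated_check(password: bytes, flag_after_buffer: bool = True) -> bool:
    """Model check_authentication with an unchecked copy into a 16-byte buffer.

    The local area is 20 bytes: a 16-byte buffer and a 4-byte flag.
    flag_after_buffer=True places the flag at the higher address,
    directly after the buffer; False places it before the buffer.
    """
    frame = bytearray(20)
    buf_off, flag_off = (0, 16) if flag_after_buffer else (4, 0)

    # strcpy-like copy: bounded only by the simulated area, not by the buffer.
    data = password + b"\0"
    n = min(len(data), len(frame) - buf_off)
    frame[buf_off:buf_off + n] = data[:n]

    # strcmp-like comparison on the buffer contents up to the first NUL.
    stored = bytes(frame[buf_off:buf_off + 16]).split(b"\0")[0]
    if stored in (b"brillig", b"outgrabe"):
        frame[flag_off:flag_off + 4] = (1).to_bytes(4, "little")

    return int.from_bytes(frame[flag_off:flag_off + 4], "little") != 0
```

With the flag placed after the buffer, a 20-byte wrong password spills into the flag and the check passes; with the flag placed before the buffer, the same input leaves the flag at zero within this simulated area.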
75
Buffer overflow
0x08048529 <+69>:  movl $0x8048647,(%esp)
0x08048530 <+76>:  call 0x8048394 <puts@plt>
0x08048535 <+81>:  movl $0x8048664,(%esp)
0x0804853c <+88>:  call 0x8048394 <puts@plt>
0x08048541 <+93>:  movl $0x804867a,(%esp)
0x08048548 <+100>: call 0x8048394 <puts@plt>
0x0804854d <+105>: jmp 0x804855b <main+119>
0x0804854f <+107>: movl $0x8048696,(%esp)
0x08048556 <+114>: call 0x8048394 <puts@plt>
76
Nested Procedures
• For example: Pascal
• Any routine can have sub-routines
• Any sub-routine can access anything that is defined in its containing scope or inside the sub-routine itself
  - “non-local” variables
77
Example: Nested Procedures
program p() {
  int x;
  procedure a() {
    int y;
    procedure b() { … c() … };
    procedure c() {
      int z;
      procedure d() {
        y = x + z
      };
      … b() … d() …
    }
    … a() … c() …
  }
  a()
}
Possible call sequence: p → a → a → c → b → c → d
What are the addresses of variables “x”, “y” and “z” in procedure d?
78
Nested Procedures
• can call a sibling, ancestor
• when “c” uses (non-local) variables from “a”, which instance of “a” is it?
• how do you find the right activation record at runtime?
Possible call sequence: p → a → a → c → b → c → d
[Figure: the activations p, a, a, c, b, c, d of this call sequence]
79
Nested Procedures
• goal: find the closest routine in the stack from a given nesting level
• if we reached the same routine in a sequence of calls
  - if a routine of level k uses variables of the same nesting level, it uses its own variables
  - if it uses variables of nesting level j < k, then it must be the last routine called at level j
• If a procedure is last at level j on the stack, then it must be an ancestor of the current routine
Possible call sequence: p → a → a → c → b → c → d
[Figure: the activations p, a, a, c, b, c, d of this call sequence]
12
Abstract Stack Frame
Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Prock
Prock+2
hellip
hellip
Stack frame for procedure
Prock+1(a1hellipaN)
13
TAC generation for expressions
bull cgen(atomic_expr) directly generates TAC for atomic expressionsndash Constants identifiershellip
bull Cgen(compund_expr) recursively generates TAC for compound expressions ndash binary operators procedure calls hellipndash use temporary variables (registers) to store
values of intermediate expressions
14
cgen examplecgen(5 + x) = Choose a new temporary t Let t1 = cgen(5) Let t2 = cgen(x) Emit( t = t1 + t2 ) Return t
15
Naiumlve cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =
Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc
16
TAC Generation forControl Flow Statements
bull Label introduction_label_name
Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L
Goto Lbull Conditional jump test condition variable t
if 0 jump to label LIfZ t Goto L
bull Similarly test condition variable tif 1 jump to label L
IfNZ t Goto L
17
Control-flow example ndash conditionsint xint yint z
if (x lt y) z = xelse z = yz = z z
_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z
18
cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)
Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )
19
Naive cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =
Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc
bull Observation temporaries in cgen(e1) can be reused in cgen(e2)
20
Improving cgen for expressionsbull Observation ndash naiumlve translation needlessly generates temporaries for
leaf expressionsbull Observation ndash temporaries used exactly once
ndash Once a temporary has been read it can be reused for another sub-expressionbull cgen(e1 op e2) =
Let _t1 = cgen(e1) Let _t2 = cgen(e2) Emit( _t =_t1 op _t2 ) Return t
bull Temporaries cgen(e1) can be reused in cgen(e2)
21
Sethi-Ullman translation
bull Algorithm by Ravi Sethi and Jeffrey D Ullman to emit optimal TACndash Minimizes number of temporaries
bull Main data structure in algorithm is a stack of temporariesndash Stack corresponds to recursive invocations of _t = cgen(e)ndash All the temporaries on the stack are live
bull Live = contain a value that is needed later on
22
Weighted register allocationbull Can save registers by re-ordering subtree computationsbull Label each node with its weight
ndash Weight = number of registers neededndash Leaf weight knownndash Internal node weight
bull w(left) gt w(right) then w = leftbull w(right) gt w(left) then w = rightbull w(right) = w(left) then w = left + 1
bull Choose heavier child as first to be translatedbull WARNING have to check that no side-effects exist before
attempting to apply this optimization(pre-pass on the tree)
23
Weighted reg alloc example
b
5 c
array access
+
a w=0
w=0 w=0
w=1 w=0
w=1
w=1
Phase 1 - check absence of side-effects in expression tree - assign weight to each AST node
_t0 = cgen( a+b[5c] )
base index
24
Weighted reg alloc example
b
5 c
array access
+
abase index
w=0
w=0 w=0
w=1 w=0
w=1
w=1
_t0 = cgen( a+b[5c] )Phase 2 - use weights to decide on order of translation
_t0
_t0
_t0Heavier sub-tree
Heavier sub-tree
_t0 = _t1 _t0
_t0 = _t1[_t0]
_t0 = _t1 + _t0
_t0_t1
_t1
_t1_t0 = c_t1 = 5
_t1 = b
_t1 = a
25
TAC Generation forMemory Access Instructions
bull Copy instruction a = bbull Loadstore instructions
a = b a = bbull Address of instruction a=ampbbull Array accesses
a = b[i] a[i] = bbull Field accesses
a = b[f] a[f] = bbull Memory allocation instruction a = alloc(size)
ndash Sometimes left out (eg malloc is a procedure in C)
26
Array operations
t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]
t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer
x = y[i]
x[i] = y
27
TAC Generation forControl Flow Statements
bull Label introduction_label_name
Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L
Goto Lbull Conditional jump test condition variable t
if 0 jump to label LIfZ t Goto L
bull Similarly test condition variable tif 1 jump to label L
IfNZ t Goto L
28
Control-flow example ndash conditionsint xint yint z
if (x lt y) z = xelse z = yz = z z
_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z
29
cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)
Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )
30
Control-flow example ndash loopsint xint y
while (x lt y) x = x 2
y = x
_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1
y = x
31
cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label
Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )
32
Optimization points
sourcecode
Frontend IR Code
generatortargetcode
Userprofile program
change algorithm
Compilerintraprocedural IRInterprocedural IRIR optimizations
Compilerregister allocation
instruction selectionpeephole transformations
today
33
Interprocedural IR
bull Compile time generation of code for procedure invocations
bull Activation Records (aka Stack Frames)
34
Supporting Procedures
bull New computing environment ndash at least temporary memory for local variables
bull Passing information into the new environmentndash Parameters
bull Transfer of control tofrom procedurebull Handling return values
35
Abstract Register Machine(High Level View)
Code
Data
Highaddresses
Lowaddresses
CPU
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
36
Heap
Abstract Register Machine(High Level View)
Global Variables
Stack
Lowaddresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
Code
Highaddresses
37
Abstract Activation Record Stack
Stack frame for procedure
Prock+1(a1hellipaN)
Prock
Prock+2
hellip
hellip
Prock+1
38
Abstract Stack Frame
Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Prock
Prock+2
hellip
hellip
Stack frame for procedure
Prock+1(a1hellipaN)
39
Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to
stack and jumps to the function labelA statement x=f(a1hellipan) looks like
Push a1 hellip Push anCall fPop x copy returned value
bull Returning a value is done by pushing it to the stack (return x)
Push xbull Return control to caller (and roll up stack)
Return
40
Heap
Abstract Register Machine
Global Variables
Stack
Lowaddresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
Code
Highaddresses
41
Semi-Abstract Register Machine
Global Variables
Stack
Heap
High addresses
Low addresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
ebp
esphellipre
gist
ers
Stac
k
42
Intro Functions Exampleint SimpleFn(int z)
int x yx = x y zreturn x
void main() int ww = SimpleFunction(137)
_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn
main_t0 = 137Push _t0Call _SimpleFnPop w
43
What Can We Do with Procedures
bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as
parameters
44
Design Decisions
bull Scoping rulesndash Static scoping vs dynamic scoping
bull Callercallee conventionsndash Parametersndash Who saves register values
bull Allocating space for local variables
45
Static (lexical) Scopingmain ( )
int a = 0 int b = 0
int b = 1
int a = 2 printf (ldquod dnrdquo a b)
int b = 3 printf (ldquod dnrdquo a b)
printf (ldquod dnrdquo a b)
printf (ldquod dnrdquo a b)
B0B1
B3B3
B2
Declaration Scopes
a=0 B0B1B3
b=0 B0
b=1 B1B2
a=2 B2
b=3 B3
a name refers to its (closest)
enclosing scope
known at compile time
46
Dynamic Scoping
bull Each identifier is associated with a global stack of bindings
bull When entering scope where identifier is declaredndash push declaration on identifier stack
bull When exiting scope where identifier is declaredndash pop identifier stack
bull Evaluating the identifier in any context binds to the current top of stack
bull Determined at runtime
47
Example
bull What value is returned from mainbull Static scopingbull Dynamic scoping
int x = 42
int f() return x int g() int x = 1 return f() int main() return g()
48
Why do we care
bull We need to generate code to access variables
bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code
generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables
49
Variable addresses for static scoping first attempt
int x = 42
int f() return x int g() int x = 1 return f() int main() return g()
identifier address
x (global) 0x42
x (inside g) 0x73
50
Variable addresses for static scoping first attempt
int a [11]
void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)
main() quicksort (1 9)
what is the address of the variable ldquoirdquo in
the procedure quicksort
51
Compile-Time Information on Variablesbull Namebull Typebull Scope
ndash when is it recognizedbull Duration
ndash Until when does its value existbull Size
ndash How many bytes are required at runtime bull Address
ndash Fixedndash Relativendash Dynamic
52
Activation Record (Stack Frames)bull separate space for each procedure invocation
bull managed at runtimendash code for managing it generated by the compiler
bull desired properties ndash efficient allocation and deallocation
bull procedures are called frequentlyndash variable size
bull different procedures may require different memory sizes
53
Semi-Abstract Register Machine
Global Variables
Stack
Heap
High addresses
Low addresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
ebp
esphellipre
gist
ers
Stac
k
54
A Logical Stack Frame (Simplified)Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Stack frame for function f(a1hellipaN)
55
Runtime Stack
bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of
stackbull How do we handle recursion
56
Activation Record (frame)parameter k
parameter 1
lexical pointer
return information
dynamic link
registers amp misc
local variablestemporaries
next frame would be here
hellip
administrativepart
high addresses
lowaddresses
framepointer
stackpointer
incoming parameters
stack grows down
57
Runtime Stackbull SP ndash stack pointer
ndash top of current framebull FP ndash frame pointer
ndash base of current framendash Sometimes called BP
(base pointer)
Current frame
hellip hellip
Previous frame
SP
FPstack grows down
58
Code Blocks
bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1
adminstrativex1
y1
x2
x3
hellip
59
L-Values of Local Variables
bull The offset in the stack is known at compile time
bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3
Store R3 offset(x)(FP)
60
Pentium Runtime Stack
Pentium stack registers
Pentium stack and callret instructions
Register Usage
ESP Stack pointer
EBP Base pointer
Instruction Usage
push pushahellip push on runtime stack
poppopahellip Base pointer
call transfer control to called routine
return transfer control back to caller
61
Accessing Stack Variables
• Use offset from FP (ebp)
• Remember – stack grows downwards
• Above FP = parameters
• Below FP = locals
• Examples
 – ebp + 4 = return address
 – ebp + 8 = first parameter
 – ebp – 4 = first local
[Figure: two stacked frames. From high to low addresses within a frame: Param n, …, param 1 (FP+8), return address, previous fp (where FP points), Local 1 (FP-4), …, Local n; SP marks the top.]
62
Factorial – fact(int n)
fact:
    pushl %ebp             # save ebp
    movl  %esp,%ebp        # ebp = esp
    pushl %ebx             # save ebx
    movl  8(%ebp),%ebx     # ebx = n
    cmpl  $1,%ebx          # n <= 1 ?
    jle   .lresult         # then done
    leal  -1(%ebx),%eax    # eax = n-1
    pushl %eax
    call  fact             # fact(n-1)
    imull %ebx,%eax        # eax = retv * n
    jmp   .lreturn
.lresult:
    movl  $1,%eax          # retv
.lreturn:
    movl  -4(%ebp),%ebx    # restore ebx
    movl  %ebp,%esp        # restore esp
    popl  %ebp             # restore ebp
    ret                    # return to caller
[Figure: stack at an intermediate point: n at EBP+8, return address, old ebp (previous fp) at EBP, old ebx at EBP-4; ESP marks the top.]
(disclaimer: a real compiler can do better than that)
63
Call Sequences
• The processor does not save the content of registers on procedure calls
• So who will?
 – Caller saves and restores registers
 – Callee saves and restores registers
 – But can also have both save/restore some registers
64
Call Sequences
call (caller to callee)
• Caller push code
 – Push caller-save registers
 – Push actual parameters (in reverse order)
• Call
 – Push return address
 – Jump to call address
• Callee push code (prologue)
 – Push current base-pointer; bp = sp
 – Push local variables
 – Push callee-save registers
return (callee to caller)
• Callee pop code (epilogue)
 – Pop callee-save registers
 – Pop callee activation record
 – Pop old base-pointer
• Return
 – Pop return address
 – Jump to address
• Caller pop code
 – Pop parameters
 – Pop caller-save registers
65
Call Sequences – foo(42, 21)
call (caller to callee)
    push %ecx          # push caller-save registers
    push $21           # push actual parameters (in reverse order)
    push $42
    call _foo          # push return address, jump to call address
    push %ebp          # push current base-pointer
    mov  %esp, %ebp    # bp = sp
    sub  $8, %esp      # push local variables
    push %ebx          # push callee-save registers
return (callee to caller)
    pop  %ebx          # pop callee-save registers
    mov  %ebp, %esp    # pop callee activation record
    pop  %ebp          # pop old base-pointer
    ret                # pop return address, jump to address
    add  $8, %esp      # pop parameters
    pop  %ecx          # pop caller-save registers
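The sequence above can be replayed on a toy stack to check that every push is matched by a pop. This is only a sketch: the stack is a Python list whose end plays the role of the stack top (so indices grow while real addresses shrink), and the register values and the function name `simulate_call` are made up for illustration.

```python
def simulate_call():
    stack = []                                 # end of the list = top of stack
    regs = {"ecx": 7, "ebx": 9, "ebp": 100}    # arbitrary starting values

    # caller push code
    stack.append(regs["ecx"])      # push caller-save register
    stack.append(21)               # push actual parameters in reverse order
    stack.append(42)
    stack.append("ret-addr")       # call: push return address

    # callee prologue
    stack.append(regs["ebp"])      # push current base-pointer
    regs["ebp"] = len(stack) - 1   # bp = sp (slot holding the old base-pointer)
    stack.extend([0, 0])           # reserve two words for locals
    stack.append(regs["ebx"])      # push callee-save register

    # Two slots "above" bp (toward older entries) is the first parameter.
    first_param = stack[regs["ebp"] - 2]
    regs["ebx"] = regs["ecx"] = -1  # the callee body clobbers both registers

    # callee epilogue
    regs["ebx"] = stack.pop()      # pop callee-save register
    del stack[regs["ebp"] + 1:]    # sp = bp: drop the locals
    regs["ebp"] = stack.pop()      # pop old base-pointer
    stack.pop()                    # ret: pop return address

    # caller pop code
    del stack[-2:]                 # pop parameters
    regs["ecx"] = stack.pop()      # pop caller-save register
    return first_param, regs, stack
```

After the run the stack is empty and all three registers hold their original values, even though the callee overwrote `ebx` and `ecx`.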
66
“To Callee-save or to Caller-save?”
• Callee-saved registers need only be saved when the callee modifies their value
• Some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registers
• Callee-Save Registers
 – Saved by the callee before modification
 – Values are automatically preserved across calls
• Caller-Save Registers
 – Saved (if needed) by the caller before calls
 – Values are not automatically preserved across calls
• Usually, the architecture defines caller-save and callee-save registers
 – Separate compilation
 – Interoperability between code produced by different compilers/languages
• But compiler writers decide when to use caller/callee registers
68
Callee-Save Registers
• Saved by the callee before modification
• Usually at procedure prolog
• Restored at procedure epilog
• Hardware support may be available
• Values are automatically preserved across calls

int foo(int a) {
  int b = a + 1;
  f1();
  g1(b);
  return (b + 2);
}

.global _foo
  Add_Constant -K, SP        // allocate space for foo
  Store_Local R5, -14(FP)    // save R5
  Load_Reg R5, R0; Add_Constant R5, 1
  JSR f1; JSR g1
  Add_Constant R5, 2; Load_Reg R5, R0
  Load_Local -14(FP), R5     // restore R5
  Add_Constant K, SP; RTS    // deallocate
69
Caller-Save Registers
• Saved by the caller before calls when needed
• Values are not automatically preserved across calls

void bar(int y) {
  int x = y + 1;
  f2(x);
  g2(2);
  g2(8);
}

.global _bar
  Add_Constant -K, SP        // allocate space for bar
  Add_Constant R0, 1
  JSR f2
  Load_Constant 2, R0; JSR g2
  Load_Constant 8, R0; JSR g2
  Add_Constant K, SP         // deallocate space for bar
  RTS
70
Parameter Passing
• 1960s
 – In memory
  • No recursion is allowed
• 1970s
 – In stack
• 1980s
 – In registers
 – First k parameters are passed in registers (k=4 or k=6)
 – Where is time saved?
• Most procedures are leaf procedures
• Interprocedural register allocation
• Many of the registers may be dead before another invocation
• Register windows are allocated in some architectures per call (e.g., Sun SPARC)
71
Windows Exploit(s) Buffer Overflow
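void foo(char *x) {
  char buf[2];
  strcpy(buf, x);
}
int main(int argc, char *argv[]) {
  foo(argv[1]);
}

> ./a.out abracadabra
Segmentation fault

[Figure: the stack around foo's frame, with memory addresses increasing in one direction and the stack growing in the other: previous frame, return address, saved FP, char *x, buf[2], …. The copied string "abracadabr…" runs past buf[2] into the neighbouring slots.]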
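The effect can be modelled without touching real memory. The sketch below treats a frame as a `bytearray` and copies into it with no bounds check, the way `strcpy` does. The layout (a 2-byte buffer, then a 4-byte saved FP, a 4-byte return address and a 4-byte parameter, with no padding), the sample return address and the names `unchecked_copy` and `smash` are assumptions chosen for illustration; a real compiler may lay the frame out differently.

```python
# Assumed frame layout, lowest address first.
BUF, SAVED_FP, RET_ADDR, PARAM_X = 0, 2, 6, 10
FRAME_SIZE = 14

def unchecked_copy(memory, dest, data):
    """Copy data plus a NUL terminator with no bounds check, like strcpy."""
    for i, byte in enumerate(data + b"\x00"):
        memory[dest + i] = byte

def smash(argument):
    """Return the 4 bytes stored in the return-address slot after the copy."""
    frame = bytearray(FRAME_SIZE)
    frame[RET_ADDR:RET_ADDR + 4] = (0x08048530).to_bytes(4, "little")
    unchecked_copy(frame, BUF, argument)
    return bytes(frame[RET_ADDR:RET_ADDR + 4])
```

A one-character argument fits in the buffer and leaves the return address intact. With `b"abracadabra"` the slot ends up holding the characters `dabr`, so the eventual return would jump to an address made of input bytes, which is consistent with the segmentation fault shown above.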
72
Buffer overflow
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int check_authentication(char *password) {
  int auth_flag = 0;
  char password_buffer[16];

  strcpy(password_buffer, password);   /* no bounds check */
  if(strcmp(password_buffer, "brillig") == 0)
    auth_flag = 1;
  if(strcmp(password_buffer, "outgrabe") == 0)
    auth_flag = 1;
  return auth_flag;
}

int main(int argc, char *argv[]) {
  if(argc < 2) {
    printf("Usage: %s <password>\n", argv[0]);
    exit(0);
  }
  if(check_authentication(argv[1])) {
    printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    printf("      Access Granted.\n");
    printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
  } else {
    printf("\nAccess Denied.\n");
  }
}
(source: “Hacking – The Art of Exploitation, 2nd Ed”)
74
Buffer overflow
int check_authentication(char *password) {
  char password_buffer[16];            /* declaration order swapped */
  int auth_flag = 0;

  strcpy(password_buffer, password);
  if(strcmp(password_buffer, "brillig") == 0)
    auth_flag = 1;
  if(strcmp(password_buffer, "outgrabe") == 0)
    auth_flag = 1;
  return auth_flag;
}

int main(int argc, char *argv[]) {
  if(argc < 2) {
    printf("Usage: %s <password>\n", argv[0]);
    exit(0);
  }
  if(check_authentication(argv[1])) {
    printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    printf("      Access Granted.\n");
    printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
  } else {
    printf("\nAccess Denied.\n");
  }
}
(source: “Hacking – The Art of Exploitation, 2nd Ed”)
75
Buffer overflow
0x08048529 <+69>:   movl   $0x8048647,(%esp)
0x08048530 <+76>:   call   0x8048394 <puts@plt>
0x08048535 <+81>:   movl   $0x8048664,(%esp)
0x0804853c <+88>:   call   0x8048394 <puts@plt>
0x08048541 <+93>:   movl   $0x804867a,(%esp)
0x08048548 <+100>:  call   0x8048394 <puts@plt>
0x0804854d <+105>:  jmp    0x804855b <main+119>
0x0804854f <+107>:  movl   $0x8048696,(%esp)
0x08048556 <+114>:  call   0x8048394 <puts@plt>
76
Nested Procedures
• For example – Pascal
• Any routine can have sub-routines
• Any sub-routine can access anything that is defined in its containing scope or inside the sub-routine itself
 – “non-local” variables
77
Example: Nested Procedures
program p() {
  int x;
  procedure a() {
    int y;
    procedure b() { … c() … };
    procedure c() {
      int z;
      procedure d() {
        y = x + z
      };
      … b() … d() …
    }
    … a() … c() …
  }
  a()
}
Possible call sequence: p → a → a → c → b → c → d
What are the addresses of variables “x”, “y” and “z” in procedure d?
78
Nested Procedures
• can call a sibling or an ancestor
• when “c” uses (non-local) variables from “a”, which instance of “a” is it?
• how do you find the right activation record at runtime?
Possible call sequence: p → a → a → c → b → c → d
[Figure: the call sequence drawn as a chain of invocations P, a, a, c, b, c, d.]
79
Nested Procedures
• goal: find the closest routine in the stack from a given nesting level
• if we reached the same routine in a sequence of calls
 – if a routine of level k uses variables of the same nesting level, it uses its own variables
 – if it uses variables of nesting level j < k, then it must be the last routine called at level j
• If a procedure is last at level j on the stack, then it must be an ancestor of the current routine
Possible call sequence: p → a → a → c → b → c → d
[Figure: the call sequence drawn as a chain of invocations P, a, a, c, b, c, d.]
80
Nested Procedures
• problem: a routine may need to access variables of another routine that contains it statically
• solution: lexical pointer (a.k.a. access link) in the activation record
• lexical pointer points to the last activation record of the nesting level above it
 – in our example, the lexical pointer of d points to the activation record of c
• lexical pointers are created at runtime
• number of links to be traversed is known at compile time
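The following sketch models lexical pointers for the running example and the call sequence p → a → a → c → b → c → d. The `Frame` class, the `call` and `lookup` helpers, and the sample variable values are hypothetical; nesting levels are assumed to be p = 0, a = 1, b = c = 2, d = 3, and a variable's level is the level of the routine that declares it.

```python
class Frame:
    def __init__(self, name, level, access_link, variables):
        self.name = name
        self.level = level              # static nesting level of the routine
        self.access_link = access_link  # lexical pointer
        self.variables = variables

def call(caller, name, level, variables):
    """Create the callee's frame; the link is set up at runtime."""
    link = caller
    # Climb from the caller to the frame of the routine enclosing the callee.
    for _ in range(caller.level - level + 1):
        link = link.access_link
    return Frame(name, level, link, variables)

def lookup(frame, decl_level, var):
    """Follow a number of links that is known at compile time."""
    for _ in range(frame.level - decl_level):
        frame = frame.access_link
    return frame.variables[var]

p = Frame("p", 0, None, {"x": 1})
a1 = call(p, "a", 1, {"y": 10})
a2 = call(a1, "a", 1, {"y": 20})
c1 = call(a2, "c", 2, {"z": 100})
b = call(c1, "b", 2, {})
c2 = call(b, "c", 2, {"z": 200})
d = call(c2, "d", 3, {})
```

From `d`, variable `z` is one link away (the second instance of `c`), `y` is two links away (the second instance of `a`), and `x` is three links away (`p`). The hop counts depend only on nesting levels, while the frames they lead to depend on the actual calls.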
81
Lexical Pointers
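Possible call sequence: p → a → a → c → b → c → d
[Figure: the activation records on the stack for this call sequence (a, a, c, b, c, d); each record of a holds y and each record of c holds z; lexical pointers connect the records. Next to it, the chain of invocations P, a, a, c, b, c, d.]
program p() {
  int x;
  procedure a() {
    int y;
    procedure b() { c() };
    procedure c() {
      int z;
      procedure d() {
        y = x + z
      };
      … b() … d() …
    }
    … a() … c() …
  }
  a()
}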
82
What is a Compiler
13
TAC generation for expressions
• cgen(atomic_expr) directly generates TAC for atomic expressions
 – Constants, identifiers, …
• cgen(compound_expr) recursively generates TAC for compound expressions
 – binary operators, procedure calls, …
 – use temporary variables (registers) to store values of intermediate expressions
14
cgen example
cgen(5 + x) = {
  Choose a new temporary t
  Let t1 = cgen(5)
  Let t2 = cgen(x)
  Emit( t = t1 + t2 )
  Return t
}
15
Naïve cgen for expressions
• Maintain a counter for temporaries in c
• Initially c = 0
• cgen(e1 op e2) = {
    Let A = cgen(e1)
    c = c + 1
    Let B = cgen(e2)
    c = c + 1
    Emit( _tc = A op B )
    Return _tc
  }
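The counter-based scheme above can be tried out with a short Python sketch. The expression encoding (a leaf is a constant or an identifier, a compound expression is a tuple `(op, e1, e2)`) and the treatment of leaves, which copy their value into `_tc` without advancing the counter, are assumptions; the slide only spells out the binary case.

```python
def naive_cgen(expr):
    """Return (emitted TAC lines, name of the temporary holding the result)."""
    code = []
    c = 0  # counter for temporaries, initially 0

    def cgen(e):
        nonlocal c
        if not isinstance(e, tuple):
            # Assumed leaf rule: copy the constant/identifier into _tc.
            code.append(f"_t{c} = {e}")
            return f"_t{c}"
        op, e1, e2 = e
        a = cgen(e1)
        c += 1
        b = cgen(e2)
        c += 1
        code.append(f"_t{c} = {a} {op} {b}")
        return f"_t{c}"

    result = cgen(expr)
    return code, result
```

For `5 + x` this emits three instructions and uses three temporaries, two of which only hold leaf values. That waste is what the improved translation schemes try to avoid.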
16
TAC Generation for Control Flow Statements
• Label introduction
    _label_name:
  Indicates a point in the code that can be jumped to
• Unconditional jump: go to instruction following label L
    Goto L;
• Conditional jump: test condition variable t; if 0, jump to label L
    IfZ t Goto L;
• Similarly: test condition variable t; if 1, jump to label L
    IfNZ t Goto L;
17
Control-flow example – conditions
Source:
  int x;
  int y;
  int z;

  if (x < y)
    z = x;
  else
    z = y;
  z = z * z;
TAC:
  _t0 = x < y;
  IfZ _t0 Goto _L0;
  z = x;
  Goto _L1;
  _L0:
  z = y;
  _L1:
  z = z * z;
18
cgen for if-then-else
cgen(if (e) s1 else s2) = {
  Let _t = cgen(e)
  Let Lfalse be a new label
  Let Lafter be a new label
  Emit( IfZ _t Goto Lfalse; )
  cgen(s1)
  Emit( Goto Lafter; )
  Emit( Lfalse: )
  cgen(s2)
  Emit( Goto Lafter; )
  Emit( Lafter: )
}
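A possible rendering of this scheme in Python, assuming that the code for the condition and for the two branches has already been generated and is passed in as lists of TAC lines. The helper names `make_label_source` and `cgen_if` are hypothetical. Unlike the pseudocode above, the sketch leaves out the jump to Lafter at the end of the else branch, since control falls through to that label anyway.

```python
import itertools

def make_label_source():
    """Return a function that hands out fresh labels _L0, _L1, ..."""
    counter = itertools.count()
    return lambda: f"_L{next(counter)}"

def cgen_if(cond_code, cond_temp, then_code, else_code, new_label):
    l_false = new_label()
    l_after = new_label()
    return (cond_code
            + [f"IfZ {cond_temp} Goto {l_false}"]   # condition false: skip s1
            + then_code
            + [f"Goto {l_after}", f"{l_false}:"]
            + else_code
            + [f"{l_after}:"])
```

Applied to `if (x < y) z = x; else z = y;` with a fresh label source, it yields the same instruction sequence as the conditions example, up to punctuation.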
19
Naive cgen for expressions
• Maintain a counter for temporaries in c
• Initially c = 0
• cgen(e1 op e2) = {
    Let A = cgen(e1)
    c = c + 1
    Let B = cgen(e2)
    c = c + 1
    Emit( _tc = A op B )
    Return _tc
  }
• Observation: temporaries in cgen(e1) can be reused in cgen(e2)
20
Improving cgen for expressions
• Observation – naïve translation needlessly generates temporaries for leaf expressions
• Observation – temporaries are used exactly once
 – Once a temporary has been read, it can be reused for another sub-expression
• cgen(e1 op e2) = {
    Let _t1 = cgen(e1)
    Let _t2 = cgen(e2)
    Emit( _t = _t1 op _t2 )
    Return _t
  }
• Temporaries of cgen(e1) can be reused in cgen(e2)
21
Sethi-Ullman translation
• Algorithm by Ravi Sethi and Jeffrey D. Ullman to emit optimal TAC
 – Minimizes number of temporaries
• Main data structure in algorithm is a stack of temporaries
 – Stack corresponds to recursive invocations of _t = cgen(e)
 – All the temporaries on the stack are live
  • Live = contain a value that is needed later on
22
Weighted register allocation
• Can save registers by re-ordering subtree computations
• Label each node with its weight
 – Weight = number of registers needed
 – Leaf weight known
 – Internal node weight
  • w(left) > w(right) then w = w(left)
  • w(right) > w(left) then w = w(right)
  • w(right) = w(left) then w = w(left) + 1
• Choose heavier child as first to be translated
• WARNING: have to check that no side-effects exist before attempting to apply this optimization (pre-pass on the tree)
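The labelling rule and the "heavier child first" order can be sketched as follows. The tree encoding (a leaf is a constant or identifier, an internal node is a tuple `(op, left, right)`), the leaf weight of 0 and the tie-breaking rule (right child first when both weights are equal) are assumptions; the last two were chosen so that the sketch agrees with the a+b[5*c] example used in this lecture.

```python
def weight(node):
    """Number of registers needed for the subtree, per the rules above."""
    if not isinstance(node, tuple):
        return 0                      # assumed leaf weight
    _, left, right = node
    wl, wr = weight(left), weight(right)
    return max(wl, wr) if wl != wr else wl + 1

def translation_order(node):
    """Nodes in the order they are translated, heavier child first."""
    if not isinstance(node, tuple):
        return [node]
    op, left, right = node
    if weight(left) > weight(right):
        first, second = left, right
    else:                             # ties: right child first (assumption)
        first, second = right, left
    return translation_order(first) + translation_order(second) + [op]

# a + b[5*c], with "[]" standing for the array access node
tree = ("+", "a", ("[]", "b", ("*", 5, "c")))
```

For this tree every internal node gets weight 1, and the index expression is translated before the base `b`, which in turn comes before `a`.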
23
Weighted reg alloc example
_t0 = cgen( a+b[5*c] )
Phase 1:
 – check absence of side-effects in expression tree
 – assign weight to each AST node
AST with weights:
  + (w=1)
    a (w=0)
    array access (w=1)
      base: b (w=0)
      index: * (w=1)
        5 (w=0)
        c (w=0)
24
Weighted reg alloc example
_t0 = cgen( a+b[5*c] )
Phase 2: use weights to decide on order of translation (heavier sub-tree first)
AST with weights:
  + (w=1)
    a (w=0)
    array access (w=1)
      base: b (w=0)
      index: * (w=1)
        5 (w=0)
        c (w=0)
Generated code, in order:
  _t0 = c
  _t1 = 5
  _t0 = _t1 * _t0
  _t1 = b
  _t0 = _t1[_t0]
  _t1 = a
  _t0 = _t1 + _t0
25
TAC Generation for Memory Access Instructions
• Copy instruction: a = b
• Load/store instructions: a = *b, *a = b
• Address of instruction: a = &b
• Array accesses: a = b[i], a[i] = b
• Field accesses: a = b[f], a[f] = b
• Memory allocation instruction: a = alloc(size)
 – Sometimes left out (e.g., malloc is a procedure in C)
26
Array operations
x = y[i]
  t1 = &y        ; t1 = address-of y
  t2 = t1 + i    ; t2 = address of y[i]
  x = *t2        ; value stored at y[i]

x[i] = y
  t1 = &x        ; t1 = address-of x
  t2 = t1 + i    ; t2 = address of x[i]
  *t2 = y        ; store through pointer
30
Control-flow example – loops
Source:
  int x;
  int y;

  while (x < y) {
    x = x * 2;
  }
  y = x;
TAC:
  _L0:
  _t0 = x < y;
  IfZ _t0 Goto _L1;
  x = x * 2;
  Goto _L0;
  _L1:
  y = x;
31
cgen for while loops
cgen(while (expr) stmt) = {
  Let Lbefore be a new label
  Let Lafter be a new label
  Emit( Lbefore: )
  Let t = cgen(expr)
  Emit( IfZ t Goto Lafter; )
  cgen(stmt)
  Emit( Goto Lbefore; )
  Emit( Lafter: )
}
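The same scheme as a small Python sketch, again assuming that the TAC for the condition and for the body is supplied as lists of lines. `make_label_source` and `cgen_while` are illustrative names, not part of the lecture's code.

```python
import itertools

def make_label_source():
    """Return a function that hands out fresh labels _L0, _L1, ..."""
    counter = itertools.count()
    return lambda: f"_L{next(counter)}"

def cgen_while(cond_code, cond_temp, body_code, new_label):
    l_before = new_label()
    l_after = new_label()
    return ([f"{l_before}:"]
            + cond_code                               # re-evaluated every iteration
            + [f"IfZ {cond_temp} Goto {l_after}"]     # condition false: leave the loop
            + body_code
            + [f"Goto {l_before}", f"{l_after}:"])
```

The condition code sits after `Lbefore`, so the backward jump at the end of the body re-tests it on every iteration.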
32
Optimization points
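Pipeline: source code → Front end → IR → Code generator → target code
• User (source code): profile program, change algorithm
• Compiler (IR): intraprocedural IR, interprocedural IR, IR optimizations
• Compiler (target code): register allocation, instruction selection, peephole transformations
[Figure: a “today” marker highlights part of this pipeline.]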
33
Interprocedural IR
• Compile time generation of code for procedure invocations
• Activation Records (a.k.a. Stack Frames)
34
Supporting Procedures
• New computing environment
 – at least temporary memory for local variables
• Passing information into the new environment
 – Parameters
• Transfer of control to/from procedure
• Handling return values
35
Abstract Register Machine (High Level View)
[Figure: a CPU with general-purpose (data) registers Register 00, Register 01, …, Register xx and control registers, including Register PC, next to a memory holding code and data between low and high addresses.]
36
Abstract Register Machine (High Level View)
[Figure: a CPU with general-purpose (data) registers Register 00, Register 01, …, Register xx and control registers, including Register PC, next to main memory holding code, global variables, stack and heap between low and high addresses.]
37
Abstract Activation Record Stack
[Figure: a stack of frames …, Proc k, Proc k+1, Proc k+2, …; the highlighted frame is the stack frame for procedure Proc k+1(a1, …, aN).]
38
Abstract Stack Frame
Stack frame for procedure Proc k+1(a1, …, aN), located between the frames of Proc k and Proc k+2:
• Parameters (actual arguments): Param N, Param N-1, …, Param 1
• Locals and temporaries: _t0, …, _tk, x, …, y
39
Handling Procedures
• Store local variables/temporaries in a stack
• A function call instruction pushes arguments to stack and jumps to the function label
  A statement x = f(a1, …, an); looks like
    Push a1; … Push an;
    Call f;
    Pop x;   // copy returned value
• Returning a value is done by pushing it to the stack (return x;)
    Push x;
• Return control to caller (and roll up stack)
    Return;
40
Abstract Register Machine
[Figure: a CPU with general-purpose (data) registers Register 00, Register 01, …, Register xx and control registers, including Register PC, next to main memory holding code, global variables, stack and heap between low and high addresses.]
41
Semi-Abstract Register Machine
[Figure: a CPU next to main memory. CPU: general-purpose (data) registers Register 00, Register 01, …, Register xx; control registers, including Register PC; stack registers ebp, esp, …. Main memory, between low and high addresses: global variables, stack, heap.]
42
Intro: Functions Example
Source:
  int SimpleFn(int z) {
    int x, y;
    x = x * y * z;
    return x;
  }

  void main() {
    int w;
    w = SimpleFn(137);
  }
TAC:
  _SimpleFn:
  _t0 = x * y;
  _t1 = _t0 * z;
  x = _t1;
  Push x;
  Return;

  main:
  _t0 = 137;
  Push _t0;
  Call _SimpleFn;
  Pop w;
43
What Can We Do with Procedures
• Declarations & Definitions
• Call & Return
• Jumping out of procedures
• Passing & Returning procedures as parameters
44
Design Decisions
• Scoping rules
 – Static scoping vs. dynamic scoping
• Caller/callee conventions
 – Parameters
 – Who saves register values?
• Allocating space for local variables
45
Static (lexical) Scoping
main ( ) {                                /* block B0 */
  int a = 0;
  int b = 0;
  {                                       /* block B1 */
    int b = 1;
    {                                     /* block B2 */
      int a = 2;
      printf ("%d %d\n", a, b);
    }
    {                                     /* block B3 */
      int b = 3;
      printf ("%d %d\n", a, b);
    }
    printf ("%d %d\n", a, b);
  }
  printf ("%d %d\n", a, b);
}

  Declaration   Scopes
  a=0           B0, B1, B3
  b=0           B0
  b=1           B1, B2
  a=2           B2
  b=3           B3

A name refers to its (closest) enclosing scope; this is known at compile time.
46
Dynamic Scoping
• Each identifier is associated with a global stack of bindings
• When entering scope where identifier is declared
 – push declaration on identifier stack
• When exiting scope where identifier is declared
 – pop identifier stack
• Evaluating the identifier in any context binds to the current top of stack
• Determined at runtime
47
Example
• What value is returned from main?
• Static scoping?
• Dynamic scoping?

int x = 42;

int f() { return x; }
int g() { int x = 1; return f(); }
int main() { return g(); }
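Both answers can be checked with a small Python model of the example. The dynamic version keeps one stack of bindings per identifier and always reads the top, as in the description of dynamic scoping; the static version relies on Python's own lexical scoping. All names here (`bindings`, `f_dynamic`, `g_dynamic`, `f_static`, `g_static`, `GLOBAL_X`) are hypothetical.

```python
# Dynamic scoping: a global stack of bindings per identifier.
bindings = {"x": [42]}            # the global declaration of x

def f_dynamic():
    return bindings["x"][-1]      # evaluate x: current top of its stack

def g_dynamic():
    bindings["x"].append(1)       # entering g: push g's declaration of x
    try:
        return f_dynamic()
    finally:
        bindings["x"].pop()       # exiting g: pop it again

# Static scoping: f's x is resolved from the program text.
GLOBAL_X = 42

def f_static():
    return GLOBAL_X               # always the global x

def g_static():
    x = 1                         # local to g, not visible inside f
    return f_static()
```

Under static scoping `main` would return 42, because `f` always refers to the global `x`. Under dynamic scoping it would return 1, because `g`'s binding is on top of the stack when `f` runs.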
48
Why do we care
• We need to generate code to access variables
• Static scoping
 – Identifier binding is known at compile time
 – Address of the variable is known at compile time
 – Assigning addresses to variables is part of code generation
 – No runtime errors of “access to undefined variable”
 – Can check types of variables
49
Variable addresses for static scoping: first attempt
int x = 42;

int f() { return x; }
int g() { int x = 1; return f(); }
int main() { return g(); }

  identifier      address
  x (global)      0x42
  x (inside g)    0x73
50
Variable addresses for static scoping: first attempt
int a[11];

void quicksort(int m, int n) {
  int i;
  if (n > m) {
    i = partition(m, n);
    quicksort(m, i-1);
    quicksort(i+1, n);
  }
}

main() {
  quicksort(1, 9);
}

What is the address of the variable “i” in the procedure quicksort?
51
Compile-Time Information on Variables
• Name
• Type
• Scope
 – when is it recognized
• Duration
 – Until when does its value exist
• Size
 – How many bytes are required at runtime
• Address
 – Fixed
 – Relative
 – Dynamic
52
Activation Record (Stack Frames)
• separate space for each procedure invocation
• managed at runtime
 – code for managing it generated by the compiler
• desired properties
 – efficient allocation and deallocation
  • procedures are called frequently
 – variable size
  • different procedures may require different memory sizes
14
cgen examplecgen(5 + x) = Choose a new temporary t Let t1 = cgen(5) Let t2 = cgen(x) Emit( t = t1 + t2 ) Return t
15
Naiumlve cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =
Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc
16
TAC Generation forControl Flow Statements
bull Label introduction_label_name
Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L
Goto Lbull Conditional jump test condition variable t
if 0 jump to label LIfZ t Goto L
bull Similarly test condition variable tif 1 jump to label L
IfNZ t Goto L
17
Control-flow example ndash conditionsint xint yint z
if (x lt y) z = xelse z = yz = z z
_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z
18
cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)
Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )
19
Naive cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =
Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc
bull Observation temporaries in cgen(e1) can be reused in cgen(e2)
20
Improving cgen for expressionsbull Observation ndash naiumlve translation needlessly generates temporaries for
leaf expressionsbull Observation ndash temporaries used exactly once
ndash Once a temporary has been read it can be reused for another sub-expressionbull cgen(e1 op e2) =
Let _t1 = cgen(e1) Let _t2 = cgen(e2) Emit( _t =_t1 op _t2 ) Return t
bull Temporaries cgen(e1) can be reused in cgen(e2)
21
Sethi-Ullman translation
bull Algorithm by Ravi Sethi and Jeffrey D Ullman to emit optimal TACndash Minimizes number of temporaries
bull Main data structure in algorithm is a stack of temporariesndash Stack corresponds to recursive invocations of _t = cgen(e)ndash All the temporaries on the stack are live
bull Live = contain a value that is needed later on
22
Weighted register allocationbull Can save registers by re-ordering subtree computationsbull Label each node with its weight
ndash Weight = number of registers neededndash Leaf weight knownndash Internal node weight
bull w(left) gt w(right) then w = leftbull w(right) gt w(left) then w = rightbull w(right) = w(left) then w = left + 1
bull Choose heavier child as first to be translatedbull WARNING have to check that no side-effects exist before
attempting to apply this optimization(pre-pass on the tree)
23
Weighted reg alloc example
b
5 c
array access
+
a w=0
w=0 w=0
w=1 w=0
w=1
w=1
Phase 1 - check absence of side-effects in expression tree - assign weight to each AST node
_t0 = cgen( a+b[5c] )
base index
24
Weighted reg alloc example
_t0 = cgen( a+b[5*c] )
Phase 2: - use weights to decide on order of translation
[Figure: the same weighted AST; at + and at array access the heavier sub-tree is translated first, and each node is annotated with the temporary that holds its value.]
Emitted code:
  _t0 = c
  _t1 = 5
  _t0 = _t1 * _t0
  _t1 = b
  _t0 = _t1[_t0]
  _t1 = a
  _t0 = _t1 + _t0
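A possible implementation of phase 2 is sketched below. It is an assumption-laden illustration: cgen(e, k) leaves the value of e in _tk and may use only _tk and higher-numbered temporaries, leaves weigh 0, and ties are broken by translating the right child first (chosen here because it reproduces the order in the example; the slides do not state a tie-breaking rule).

```python
def weight(node):
    # Assumed encoding: leaf = str or int (weight 0), inner node = (op, left, right).
    if not isinstance(node, tuple):
        return 0
    _, left, right = node
    wl, wr = weight(left), weight(right)
    return wl + 1 if wl == wr else max(wl, wr)


def weighted_cgen(expr):
    """Return the TAC lines for expr, translating the heavier child first."""
    code = []

    def cgen(e, k):
        # Leaves the value of e in _tk, using only _tk, _t(k+1), ...
        if not isinstance(e, tuple):
            code.append(f"_t{k} = {e}")
            return
        op, left, right = e
        if weight(left) > weight(right):
            cgen(left, k)            # heavier left child first
            cgen(right, k + 1)
            lhs, rhs = f"_t{k}", f"_t{k + 1}"
        else:
            cgen(right, k)           # right child first (ties go right: assumption)
            cgen(left, k + 1)
            lhs, rhs = f"_t{k + 1}", f"_t{k}"
        if op == "[]":               # array access: left is the base, right the index
            code.append(f"_t{k} = {lhs}[{rhs}]")
        else:
            code.append(f"_t{k} = {lhs} {op} {rhs}")

    cgen(expr, 0)
    return code


tac = weighted_cgen(("+", "a", ("[]", "b", ("*", 5, "c"))))
print("\n".join(tac))
```

Only two temporaries are used, because the second child always starts from a temporary index one above the one that holds the first child's result.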
25
TAC Generation for Memory Access Instructions
• Copy instruction: a = b
• Load/store instructions: a = *b    *a = b
• Address-of instruction: a = &b
• Array accesses: a = b[i]    a[i] = b
• Field accesses: a = b[f]    a[f] = b
• Memory allocation instruction: a = alloc(size)
  - Sometimes left out (e.g., malloc is a procedure in C)
26
Array operations

x = y[i]
  t1 = &y        // t1 = address-of y
  t2 = t1 + i    // t2 = address of y[i]
  x = *t2        // value stored at y[i]

x[i] = y
  t1 = &x        // t1 = address-of x
  t2 = t1 + i    // t2 = address of x[i]
  *t2 = y        // store through pointer
27
TAC Generation for Control Flow Statements
• Label introduction
    _label_name:
  Indicates a point in the code that can be jumped to
• Unconditional jump: go to instruction following label L
    Goto L
• Conditional jump: test condition variable t; if 0, jump to label L
    IfZ t Goto L
• Similarly: test condition variable t; if 1, jump to label L
    IfNZ t Goto L
28
Control-flow example - conditions

int x;
int y;
int z;

if (x < y)
    z = x;
else
    z = y;
z = z * z;

Generated TAC:
  _t0 = x < y
  IfZ _t0 Goto _L0
  z = x
  Goto _L1
_L0:
  z = y
_L1:
  z = z * z
29
cgen for if-then-else
cgen(if (e) s1 else s2) = {
  Let _t = cgen(e)
  Let Lfalse be a new label
  Let Lafter be a new label
  Emit( IfZ _t Goto Lfalse )
  cgen(s1)
  Emit( Goto Lafter )
  Emit( Lfalse: )
  cgen(s2)
  Emit( Goto Lafter )
  Emit( Lafter: )
}
30
Control-flow example - loops

int x;
int y;

while (x < y) {
    x = x * 2;
}
y = x;

Generated TAC:
_L0:
  _t0 = x < y
  IfZ _t0 Goto _L1
  x = x * 2
  Goto _L0
_L1:
  y = x
31
cgen for while loops
cgen(while (expr) stmt) = {
  Let Lbefore be a new label
  Let Lafter be a new label
  Emit( Lbefore: )
  Let t = cgen(expr)
  Emit( IfZ t Goto Lafter )
  cgen(stmt)
  Emit( Goto Lbefore )
  Emit( Lafter: )
}
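The cgen schemes for if-then-else and while can be sketched together in Python. Everything named here (Emitter, cgen_stmt, the tuple encoding of statements, conditions given as flat strings) is a hypothetical illustration. One deliberate difference: the sketch omits the jump to Lafter after the else branch, since control falls through to the label anyway.

```python
from itertools import count


class Emitter:
    """Collects TAC lines and hands out fresh labels and temporaries."""

    def __init__(self):
        self.code = []
        self._labels = count()
        self._temps = count()

    def new_label(self):
        return f"_L{next(self._labels)}"

    def new_temp(self):
        return f"_t{next(self._temps)}"

    def emit(self, line):
        self.code.append(line)


def cgen_expr(em, e):
    # Assumption: conditions are flat strings such as "x < y".
    t = em.new_temp()
    em.emit(f"{t} = {e}")
    return t


def cgen_stmt(em, s):
    kind = s[0]
    if kind == "assign":                      # ("assign", target, source)
        em.emit(f"{s[1]} = {s[2]}")
    elif kind == "if":                        # ("if", cond, then_stmts, else_stmts)
        _, cond, s1, s2 = s
        t = cgen_expr(em, cond)
        l_false, l_after = em.new_label(), em.new_label()
        em.emit(f"IfZ {t} Goto {l_false}")
        for x in s1:
            cgen_stmt(em, x)
        em.emit(f"Goto {l_after}")
        em.emit(f"{l_false}:")
        for x in s2:
            cgen_stmt(em, x)
        em.emit(f"{l_after}:")
    elif kind == "while":                     # ("while", cond, body_stmts)
        _, cond, body = s
        l_before, l_after = em.new_label(), em.new_label()
        em.emit(f"{l_before}:")
        t = cgen_expr(em, cond)
        em.emit(f"IfZ {t} Goto {l_after}")
        for x in body:
            cgen_stmt(em, x)
        em.emit(f"Goto {l_before}")
        em.emit(f"{l_after}:")
    else:
        raise ValueError(f"unknown statement kind: {kind}")


loop = Emitter()
cgen_stmt(loop, ("while", "x < y", [("assign", "x", "x * 2")]))
cgen_stmt(loop, ("assign", "y", "x"))

cond = Emitter()
cgen_stmt(cond, ("if", "x < y", [("assign", "z", "x")], [("assign", "z", "y")]))
cgen_stmt(cond, ("assign", "z", "z * z"))
print("\n".join(loop.code))
```

Both emitters reproduce the TAC of the two control-flow examples, up to indentation.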
32
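Optimization points
[Figure: pipeline source code → Front end → IR → Code generator → target code, annotated with the points where optimization can happen:
 - User: profile program, change algorithm
 - Compiler (on the IR): intraprocedural IR, interprocedural IR, IR optimizations
 - Compiler (on the target code): register allocation, instruction selection, peephole transformations
 A "today" marker highlights the topic of this lecture.]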
33
Interprocedural IR
• Compile-time generation of code for procedure invocations
• Activation Records (aka Stack Frames)
34
Supporting Procedures
• New computing environment
  - at least temporary memory for local variables
• Passing information into the new environment
  - Parameters
• Transfer of control to/from procedure
• Handling return values
35
Abstract Register Machine (High Level View)
[Figure: a CPU with general-purpose (data) registers Register 00, Register 01, …, Register xx and control registers including Register PC; memory holds Code and Data, drawn between low and high addresses.]
36
Abstract Register Machine (High Level View)
[Figure: CPU and main memory. The CPU has general-purpose (data) registers Register 00, Register 01, …, Register xx and control registers including Register PC. Main memory, drawn between low and high addresses, holds Code, Global Variables, Heap and Stack.]
37
Abstract Activation Record Stack
[Figure: a stack of activation records …, Proc k, Proc k+1, Proc k+2, …; the entry for Proc k+1 is labeled "Stack frame for procedure Proc k+1(a1,…,aN)".]
38
Abstract Stack Frame
[Figure: the stack frame for procedure Proc k+1(a1,…,aN), placed between the frames of Proc k and Proc k+2. It contains:
 - Parameters (actual arguments): Param N, Param N-1, …, Param 1
 - Locals and temporaries: _t0, …, _tk, x, …, y]
39
Handling Procedures
• Store local variables/temporaries in a stack
• A function call instruction pushes arguments to stack and jumps to the function label
  A statement x=f(a1,…,an) looks like
    Push a1 … Push an
    Call f
    Pop x      (copy returned value)
• Returning a value is done by pushing it to the stack (return x)
    Push x
• Return control to caller (and roll up stack)
    Return
40
Abstract Register Machine
[Figure: CPU and main memory. The CPU has general-purpose (data) registers Register 00, Register 01, …, Register xx and control registers including Register PC. Main memory, drawn between low and high addresses, holds Code, Global Variables, Heap and Stack.]
41
Semi-Abstract Register Machine
[Figure: CPU and main memory. The CPU has general-purpose (data) registers Register 00, Register 01, …, Register xx, control registers including Register PC, and stack registers ebp, esp, …. Main memory, drawn between low and high addresses, holds Global Variables, Heap and Stack.]
42
Intro: Functions Example

int SimpleFn(int z) {
    int x, y;
    x = x * y * z;
    return x;
}

void main() {
    int w;
    w = SimpleFn(137);
}

Generated TAC:
_SimpleFn:
  _t0 = x * y
  _t1 = _t0 * z
  x = _t1
  Push x
  Return

main:
  _t0 = 137
  Push _t0
  Call _SimpleFn
  Pop w
43
What Can We Do with Procedures?

• Declarations & Definitions
• Call & Return
• Jumping out of procedures
• Passing & Returning procedures as parameters
44
Design Decisions

• Scoping rules
  - Static scoping vs. dynamic scoping
• Caller/callee conventions
  - Parameters
  - Who saves register values?
• Allocating space for local variables
45
Static (lexical) Scoping

main ( )
{                                     /* B0 */
    int a = 0 ;
    int b = 0 ;
    {                                 /* B1 */
        int b = 1 ;
        {                             /* B2 */
            int a = 2 ;
            printf ("%d %d\n", a, b) ;
        }
        {                             /* B3 */
            int b = 3 ;
            printf ("%d %d\n", a, b) ;
        }
        printf ("%d %d\n", a, b) ;
    }
    printf ("%d %d\n", a, b) ;
}

Declaration    Scopes
a=0            B0, B1, B3
b=0            B0
b=1            B1, B2
a=2            B2
b=3            B3

A name refers to its (closest) enclosing scope, which is known at compile time.
46
Dynamic Scoping
• Each identifier is associated with a global stack of bindings
• When entering scope where identifier is declared
  - push declaration on identifier stack
• When exiting scope where identifier is declared
  - pop identifier stack
• Evaluating the identifier in any context binds to the current top of stack
• Determined at runtime
47
Example
• What value is returned from main?
• Static scoping?
• Dynamic scoping?

int x = 42;

int f() { return x; }
int g() { int x = 1; return f(); }
int main() { return g(); }
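To make the two answers concrete, here is a small Python simulation of the example. It is a sketch, not how a compiler implements scoping: the function run and the dictionaries are hypothetical, and the dynamic case follows the per-identifier binding stack described on the previous slide.

```python
def run(scoping):
    """Simulate main() from the example under "static" or "dynamic" scoping."""
    global_env = {"x": 42}            # int x = 42;
    binding_stack = {"x": [42]}       # dynamic scoping: stack of bindings per identifier

    def f():
        if scoping == "static":
            # f is declared at top level, so x is the global x.
            return global_env["x"]
        # Dynamic scoping: x binds to the current top of its stack.
        return binding_stack["x"][-1]

    def g():
        binding_stack["x"].append(1)  # entering the scope that declares int x = 1
        try:
            return f()
        finally:
            binding_stack["x"].pop()  # leaving g's scope

    return g()                        # main() returns g()


print(run("static"), run("dynamic"))
```

Under static scoping main returns 42, and under dynamic scoping it returns 1, because the binding pushed by g is on top of the stack when f runs.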
48
Why do we care?

• We need to generate code to access variables
• Static scoping
  - Identifier binding is known at compile time
  - Address of the variable is known at compile time
  - Assigning addresses to variables is part of code generation
  - No runtime errors of "access to undefined variable"
  - Can check types of variables
49
Variable addresses for static scoping: first attempt

int x = 42;

int f() { return x; }
int g() { int x = 1; return f(); }
int main() { return g(); }

identifier      address
x (global)      0x42
x (inside g)    0x73
50
Variable addresses for static scoping: first attempt

int a [11] ;

void quicksort(int m, int n) {
    int i ;
    if (n > m) {
        i = partition(m, n) ;
        quicksort (m, i-1) ;
        quicksort (i+1, n) ;
    }
}

main() {
    quicksort (1, 9) ;
}

What is the address of the variable "i" in the procedure quicksort?
51
Compile-Time Information on Variables
• Name
• Type
• Scope
  - when is it recognized
• Duration
  - Until when does its value exist
• Size
  - How many bytes are required at runtime
• Address
  - Fixed
  - Relative
  - Dynamic
52
Activation Record (Stack Frames)
• separate space for each procedure invocation
• managed at runtime
  - code for managing it generated by the compiler
• desired properties
  - efficient allocation and deallocation
    • procedures are called frequently
  - variable size
    • different procedures may require different memory sizes
53
Semi-Abstract Register Machine
[Figure: CPU and main memory. The CPU has general-purpose (data) registers Register 00, Register 01, …, Register xx, control registers including Register PC, and stack registers ebp, esp, …. Main memory, drawn between low and high addresses, holds Global Variables, Heap and Stack.]
54
A Logical Stack Frame (Simplified)
[Figure: stack frame for function f(a1,…,aN):
 - Parameters (actual arguments): Param N, Param N-1, …, Param 1
 - Locals and temporaries: _t0, …, _tk, x, …, y]
55
Runtime Stack
• Stack of activation records
• Call = push new activation record
• Return = pop activation record
• Only one "active" activation record: top of stack
• How do we handle recursion?
56
Activation Record (frame)
[Figure: layout of one frame, listed from high addresses to low addresses (the stack grows down):
 - incoming parameters: parameter k, …, parameter 1
 - administrative part: lexical pointer, return information, dynamic link, registers & misc
 - local variables
 - temporaries
 - next frame would be here
 The frame pointer and the stack pointer are marked on the frame.]
57
Runtime Stack
• SP: stack pointer
  - top of current frame
• FP: frame pointer
  - base of current frame
  - Sometimes called BP (base pointer)
[Figure: the previous frame lies above the current frame; FP marks the base of the current frame and SP its top; the stack grows down.]
58
Code Blocks
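• Programming languages provide code blocks

void foo() {
    int x = 8, y = 9;        // 1
    { int x = y * y; }       // 2
    { int x = y * 7; }       // 3
    x = y + 1;
}

[Figure: frame layout: administrative part, x1, y1, x2, x3, …, where the index is the number of the block that declares the variable.]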
59
L-Values of Local Variables
• The offset in the stack is known at compile time
• L-val(x) = FP+offset(x)
• x = 5 is translated to
    Load_Constant 5, R3
    Store R3, offset(x)(FP)
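The following Python sketch shows one way offsets could be assigned at compile time and then used when emitting a store. The layout is an assumption made only for illustration (one 4-byte word per local, placed below FP starting at FP-4), and the helper names are hypothetical.

```python
def layout_locals(names, word=4):
    """Compile time: map each local to an offset from FP.

    Assumed layout: locals sit below FP, one word each, the first at FP - word.
    """
    return {name: -(i + 1) * word for i, name in enumerate(names)}


def cgen_store_const(offsets, var, value, reg="R3"):
    """Emit the two instructions for `var = value` (R3 as on the slide)."""
    return [f"Load_Constant {value}, {reg}", f"Store {reg}, {offsets[var]}(FP)"]


def lvalue(fp, offsets, var):
    """Run time: L-val(x) = FP + offset(x)."""
    return fp + offsets[var]


offsets = layout_locals(["x", "y"])
instructions = cgen_store_const(offsets, "x", 5)
print(instructions)
```

The offset is fixed per procedure, while FP differs per invocation, so two recursive activations of the same procedure get different addresses for x from the same emitted code.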
60
Pentium Runtime Stack

Pentium stack registers

Register    Usage
ESP         Stack pointer
EBP         Base pointer

Pentium stack and call/ret instructions

Instruction        Usage
push, pusha, …     push on runtime stack
pop, popa, …       pop from runtime stack
call               transfer control to called routine
ret                transfer control back to caller
61
Accessing Stack Variables
• Use offset from FP (%ebp)
• Remember: stack grows downwards
• Above FP = parameters
• Below FP = locals
• Examples
  - %ebp + 4 = return address
  - %ebp + 8 = first parameter
  - %ebp - 4 = first local
[Figure: stack contents from high to low addresses: the previous frame (Param n, …, param1, Return address), then the current frame: Param n, …, param1 (at FP+8), Return address, Previous fp (FP points here), Local 1 (at FP-4), …, Local n, with SP at the bottom.]
62
Factorial - fact(int n)

fact:
    pushl %ebp              # save ebp
    movl %esp,%ebp          # ebp=esp
    pushl %ebx              # save ebx
    movl 8(%ebp),%ebx       # ebx = n
    cmpl $1,%ebx            # n <= 1 ?
    jle .lresult            # then done
    leal -1(%ebx),%eax      # eax = n-1
    pushl %eax              #
    call fact               # fact(n-1)
    imull %ebx,%eax         # eax=retv*n
    jmp .lreturn            #
.lresult:
    movl $1,%eax            # retv
.lreturn:
    movl -4(%ebp),%ebx      # restore ebx
    movl %ebp,%esp          # restore esp
    popl %ebp               # restore ebp
    ret                     # return to caller (missing from the extracted text)

[Figure: stack at an intermediate point: n at EBP+8, Return address, old ebp (previous fp) at EBP, old ebx at EBP-4, with ESP below; the frame of the calling activation above it has the same shape.]

(disclaimer: a real compiler can do better than that)
63
Call Sequences
• The processor does not save the content of registers on procedure calls
• So who will?
  - Caller saves and restores registers
  - Callee saves and restores registers
  - But can also have both save/restore some registers
64
Call Sequences

caller: Caller push code
    Push caller-save registers
    Push actual parameters (in reverse order)
call
    push return address
    Jump to call address
callee: Callee push code (prologue)
    Push current base-pointer
    bp = sp
    Push local variables
    Push callee-save registers
callee: Callee pop code (epilogue)
    Pop callee-save registers
    Pop callee activation record
    Pop old base-pointer
return
    pop return address
    Jump to address
caller: Caller pop code
    Pop parameters
    Pop caller-save registers
65
Call Sequences - Foo(42,21)

caller
    push %ecx           Push caller-save registers
    push $21            Push actual parameters (in reverse order)
    push $42
call
    call _foo           push return address, Jump to call address
callee (prologue)
    push %ebp           Push current base-pointer
    mov %esp, %ebp      bp = sp
    sub $8, %esp        Push local variables
    push %ebx           Push callee-save registers
callee (epilogue)
    pop %ebx            Pop callee-save registers
    mov %ebp, %esp      Pop callee activation record
    pop %ebp            Pop old base-pointer
return
    ret                 pop return address, Jump to address
caller
    add $8, %esp        Pop parameters
    pop %ecx            Pop caller-save registers
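The sequence for Foo(42,21) can be replayed on a toy stack to check that it is balanced and that the parameters end up at the expected offsets from the base pointer. The Machine class, the initial register values, and the 4-byte word size are assumptions of this sketch; it models memory as a Python dictionary and does not execute real instructions.

```python
class Machine:
    """Toy 32-bit stack machine: word-addressed dict as memory, stack grows down."""

    def __init__(self, stack_top=0x1000):
        self.mem = {}
        # Initial register values are arbitrary placeholders.
        self.reg = {"esp": stack_top, "ebp": 0, "ebx": 7, "ecx": 9}

    def push(self, value):
        self.reg["esp"] -= 4
        self.mem[self.reg["esp"]] = value

    def pop(self):
        value = self.mem[self.reg["esp"]]
        self.reg["esp"] += 4
        return value


def call_foo(m):
    """Replay the call sequence; returns what the callee sees relative to ebp."""
    # Caller push code
    m.push(m.reg["ecx"])             # push %ecx   (caller-save register)
    m.push(21)                       # push $21    (parameters in reverse order)
    m.push(42)                       # push $42
    m.push("return address")         # call _foo
    # Callee prologue
    m.push(m.reg["ebp"])             # push %ebp
    m.reg["ebp"] = m.reg["esp"]      # mov %esp, %ebp
    m.reg["esp"] -= 8                # sub $8, %esp (space for locals)
    m.push(m.reg["ebx"])             # push %ebx   (callee-save register)
    # Callee body: look at the frame through ebp
    seen = {
        "ret": m.mem[m.reg["ebp"] + 4],
        "param1": m.mem[m.reg["ebp"] + 8],
        "param2": m.mem[m.reg["ebp"] + 12],
    }
    # Callee epilogue
    m.reg["ebx"] = m.pop()           # pop %ebx
    m.reg["esp"] = m.reg["ebp"]      # mov %ebp, %esp
    m.reg["ebp"] = m.pop()           # pop %ebp
    m.pop()                          # ret (pops the return address)
    # Caller pop code
    m.reg["esp"] += 8                # add $8, %esp (pop parameters)
    m.reg["ecx"] = m.pop()           # pop %ecx
    return seen


machine = Machine()
seen = call_foo(machine)
print(seen, hex(machine.reg["esp"]))
```

After the sequence, esp, ebp, ebx and ecx all hold the values they had before the call, which is what the push/pop pairing is meant to guarantee.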
66
"To Callee-save or to Caller-save?"

• Callee-saved registers need only be saved when callee modifies their value
• Some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registers
• Callee-Save Registers
  - Saved by the callee before modification
  - Values are automatically preserved across calls
• Caller-Save Registers
  - Saved (if needed) by the caller before calls
  - Values are not automatically preserved across calls
• Usually the architecture defines caller-save and callee-save registers
• Separate compilation
• Interoperability between code produced by different compilers/languages
• But compiler writers decide when to use caller/callee registers
68
Callee-Save Registers
• Saved by the callee before modification
• Usually at procedure prolog
• Restored at procedure epilog
• Hardware support may be available
• Values are automatically preserved across calls

int foo(int a) {
    int b=a+1;
    f1();
    g1(b);
    return(b+2);
}

.global _foo
    Add_Constant -K, SP          # allocate space for foo
    Store_Local R5, -14(FP)      # save R5
    Load_Reg R5, R0
    Add_Constant R5, 1
    JSR f1
    JSR g1
    Add_Constant R5, 2
    Load_Reg R5, R0
    Load_Local -14(FP), R5       # restore R5
    Add_Constant K, SP           # deallocate
    RTS
69
Caller-Save Registers
• Saved by the caller before calls when needed
• Values are not automatically preserved across calls

void bar (int y) {
    int x=y+1;
    f2(x);
    g2(2);
    g2(8);
}

.global _bar
    Add_Constant -K, SP          # allocate space for bar
    Add_Constant R0, 1
    JSR f2
    Load_Constant 2, R0
    JSR g2
    Load_Constant 8, R0
    JSR g2
    Add_Constant K, SP           # deallocate space for bar
    RTS
70
Parameter Passing
• 1960s
  - In memory
    • No recursion is allowed
• 1970s
  - In stack
• 1980s
  - In registers
  - First k parameters are passed in registers (k=4 or k=6)
  - Where is time saved?
• Most procedures are leaf procedures
• Interprocedural register allocation
• Many of the registers may be dead before another invocation
• Register windows are allocated in some architectures per call (e.g., Sun SPARC)
71
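Windows Exploit(s): Buffer Overflow

void foo (char *x) {
    char buf[2];
    strcpy(buf, x);
}
int main (int argc, char *argv[]) {
    foo(argv[1]);
}

a.out abracadabra
Segmentation fault

[Figure: the frame of foo, listed from high to low memory addresses (the direction in which the stack grows): previous frame, return address, saved FP, char* x, buf[2]. The string "abracadabra" is copied starting at buf and runs upward over the neighboring slots.]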
72
Buffer overflow

int check_authentication(char *password) {
    int auth_flag = 0;
    char password_buffer[16];

    strcpy(password_buffer, password);
    if(strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
    if(strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;
    return auth_flag;
}

int main(int argc, char *argv[]) {
    if(argc < 2) {
        printf("Usage: %s <password>\n", argv[0]);
        exit(0);
    }
    if(check_authentication(argv[1])) {
        printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        printf(" Access Granted.\n");
        printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    } else {
        printf("\nAccess Denied.\n");
    }
}

(source: "Hacking: The Art of Exploitation, 2nd Ed.")
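Why a long password can grant access is easier to see on a simulated frame. The Python sketch below is an illustration under explicit assumptions: password_buffer occupies bytes 0..15 of the frame, auth_flag is a 4-byte little-endian int placed directly above it at bytes 16..19, and nothing else in the frame is modeled. Real layouts depend on the compiler and its options.

```python
import struct

BUF_SIZE = 16          # char password_buffer[16]
FLAG_OFFSET = 16       # assumed: auth_flag sits right above the buffer


def check_authentication_sim(password: bytes) -> int:
    frame = bytearray(FLAG_OFFSET + 4)
    struct.pack_into("<i", frame, FLAG_OFFSET, 0)       # int auth_flag = 0

    # strcpy: copy bytes up to and including the terminator, with no bounds check.
    for i, byte in enumerate(password + b"\0"):
        if i >= len(frame):
            break      # past the simulated region (saved FP, return address, ...)
        frame[i] = byte

    # strcmp reads the buffer up to the first NUL byte.
    stored = bytes(frame).split(b"\0")[0]
    if stored in (b"brillig", b"outgrabe"):
        struct.pack_into("<i", frame, FLAG_OFFSET, 1)   # auth_flag = 1

    return struct.unpack_from("<i", frame, FLAG_OFFSET)[0]


granted_by_overflow = check_authentication_sim(b"A" * 17)
print(check_authentication_sim(b"wrong"), granted_by_overflow)
```

With 17 bytes of input, the 17th byte lands in the low byte of auth_flag, so the function returns a non-zero value and the caller treats the password as accepted.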
74
Buffer overflow

int check_authentication(char *password) {
    char password_buffer[16];
    int auth_flag = 0;

    strcpy(password_buffer, password);
    if(strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
    if(strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;
    return auth_flag;
}

int main(int argc, char *argv[]) {
    if(argc < 2) {
        printf("Usage: %s <password>\n", argv[0]);
        exit(0);
    }
    if(check_authentication(argv[1])) {
        printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        printf(" Access Granted.\n");
        printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    } else {
        printf("\nAccess Denied.\n");
    }
}

(source: "Hacking: The Art of Exploitation, 2nd Ed.")
75
Buffer overflow
0x08048529 <+69>:   movl $0x8048647,(%esp)
0x08048530 <+76>:   call 0x8048394 <puts@plt>
0x08048535 <+81>:   movl $0x8048664,(%esp)
0x0804853c <+88>:   call 0x8048394 <puts@plt>
0x08048541 <+93>:   movl $0x804867a,(%esp)
0x08048548 <+100>:  call 0x8048394 <puts@plt>
0x0804854d <+105>:  jmp 0x804855b <main+119>
0x0804854f <+107>:  movl $0x8048696,(%esp)
0x08048556 <+114>:  call 0x8048394 <puts@plt>
76
Nested Procedures
• For example: Pascal
• Any routine can have sub-routines
• Any sub-routine can access anything that is defined in its containing scope or inside the sub-routine itself
  - "non-local" variables
77
Example: Nested Procedures

program p() {
  int x;
  procedure a() {
    int y;
    procedure b() { … c() … };
    procedure c() {
      int z;
      procedure d() {
        y := x + z
      };
      … b() … d() …
    }
    … a() … c() …
  }
  a()
}

Possible call sequence: p → a → a → c → b → c → d

What are the addresses of variables "x", "y" and "z" in procedure d?
78
Nested Procedures
• can call a sibling, ancestor
• when "c" uses (non-local) variables from "a", which instance of "a" is it?
• how do you find the right activation record at runtime?

Possible call sequence: p → a → a → c → b → c → d
[Figure: the call sequence drawn as a chain of activations P, a, a, c, b, c, d.]
79
Nested Procedures
• goal: find the closest routine in the stack from a given nesting level
• if we reached the same routine in a sequence of calls
  - when a routine of level k uses variables of the same nesting level, it uses its own variables
  - if it uses variables of nesting level j < k, then it must be the last routine called at level j
• If a procedure is last at level j on the stack, then it must be an ancestor of the current routine

Possible call sequence: p → a → a → c → b → c → d
[Figure: the call sequence drawn as a chain of activations P, a, a, c, b, c, d.]
80
Nested Procedures
• problem: a routine may need to access variables of another routine that contains it statically
• solution: lexical pointer (aka access link) in the activation record
• lexical pointer points to the last activation record of the nesting level above it
  - in our example, lexical pointer of d points to activation records of c
• lexical pointers created at runtime
• number of links to be traversed is known at compile time
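The mechanism can be simulated for the call sequence p → a → a → c → b → c → d. This is a sketch: the Frame class, the call and lookup helpers, and the variable values are hypothetical. It assumes the usual rule for setting the link at a call: a caller at nesting depth m that calls a routine at depth n follows m - n + 1 of its own links to reach the callee's static parent.

```python
# Static nesting depth of each procedure in the example program.
DEPTH = {"p": 0, "a": 1, "b": 2, "c": 2, "d": 3}


class Frame:
    def __init__(self, proc, access_link, local_vars):
        self.proc = proc
        self.access_link = access_link   # lexical pointer
        self.vars = dict(local_vars)


def call(caller, callee, local_vars=()):
    """Create the callee's frame and set its lexical pointer at run time."""
    if caller is None:                   # the main program has no static parent
        return Frame(callee, None, local_vars)
    hops = DEPTH[caller.proc] - DEPTH[callee] + 1
    link = caller
    for _ in range(hops):
        link = link.access_link
    return Frame(callee, link, local_vars)


def lookup(frame, declaring_depth, name):
    """Follow a number of links that is known at compile time."""
    hops = DEPTH[frame.proc] - declaring_depth
    for _ in range(hops):
        frame = frame.access_link
    return frame.vars[name]


# Replay p -> a -> a -> c -> b -> c -> d with made-up variable values.
p = call(None, "p", {"x": 10})
a1 = call(p, "a", {"y": 1})
a2 = call(a1, "a", {"y": 2})
c1 = call(a2, "c", {"z": 100})
b = call(c1, "b")
c2 = call(b, "c", {"z": 200})
d = call(c2, "d")

# Inside d: y = x + z uses x from p (3 hops), z from c (1 hop), y from a (2 hops).
values = (lookup(d, 0, "x"), lookup(d, 2, "z"), lookup(d, 1, "y"))
print(values)
```

The y reached from d belongs to the second activation of a, the one that called c, even though b and another c sit between them on the stack.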
81
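Lexical Pointers

[Figure: the runtime stack for the call sequence below, holding activation records a, a, c, b, c, d; each record of a contains y and each record of c contains z, and every record carries a lexical pointer to the record of its statically enclosing routine.]

Possible call sequence: p → a → a → c → b → c → d

program p() {
  int x;
  procedure a() {
    int y;
    procedure b() { c() };
    procedure c() {
      int z;
      procedure d() {
        y := x + z
      };
      … b() … d() …
    }
    … a() … c() …
  }
  a()
}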
82
What is a Compiler
15
Naiumlve cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =
Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc
16
TAC Generation forControl Flow Statements
bull Label introduction_label_name
Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L
Goto Lbull Conditional jump test condition variable t
if 0 jump to label LIfZ t Goto L
bull Similarly test condition variable tif 1 jump to label L
IfNZ t Goto L
17
Control-flow example ndash conditionsint xint yint z
if (x lt y) z = xelse z = yz = z z
_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z
18
cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)
Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )
19
Naive cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =
Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc
bull Observation temporaries in cgen(e1) can be reused in cgen(e2)
20
Improving cgen for expressionsbull Observation ndash naiumlve translation needlessly generates temporaries for
leaf expressionsbull Observation ndash temporaries used exactly once
ndash Once a temporary has been read it can be reused for another sub-expressionbull cgen(e1 op e2) =
Let _t1 = cgen(e1) Let _t2 = cgen(e2) Emit( _t =_t1 op _t2 ) Return t
bull Temporaries cgen(e1) can be reused in cgen(e2)
21
Sethi-Ullman translation
bull Algorithm by Ravi Sethi and Jeffrey D Ullman to emit optimal TACndash Minimizes number of temporaries
bull Main data structure in algorithm is a stack of temporariesndash Stack corresponds to recursive invocations of _t = cgen(e)ndash All the temporaries on the stack are live
bull Live = contain a value that is needed later on
22
Weighted register allocationbull Can save registers by re-ordering subtree computationsbull Label each node with its weight
ndash Weight = number of registers neededndash Leaf weight knownndash Internal node weight
bull w(left) gt w(right) then w = leftbull w(right) gt w(left) then w = rightbull w(right) = w(left) then w = left + 1
bull Choose heavier child as first to be translatedbull WARNING have to check that no side-effects exist before
attempting to apply this optimization(pre-pass on the tree)
23
Weighted reg alloc example
b
5 c
array access
+
a w=0
w=0 w=0
w=1 w=0
w=1
w=1
Phase 1 - check absence of side-effects in expression tree - assign weight to each AST node
_t0 = cgen( a+b[5c] )
base index
24
Weighted reg alloc example
b
5 c
array access
+
abase index
w=0
w=0 w=0
w=1 w=0
w=1
w=1
_t0 = cgen( a+b[5c] )Phase 2 - use weights to decide on order of translation
_t0
_t0
_t0Heavier sub-tree
Heavier sub-tree
_t0 = _t1 _t0
_t0 = _t1[_t0]
_t0 = _t1 + _t0
_t0_t1
_t1
_t1_t0 = c_t1 = 5
_t1 = b
_t1 = a
25
TAC Generation forMemory Access Instructions
bull Copy instruction a = bbull Loadstore instructions
a = b a = bbull Address of instruction a=ampbbull Array accesses
a = b[i] a[i] = bbull Field accesses
a = b[f] a[f] = bbull Memory allocation instruction a = alloc(size)
ndash Sometimes left out (eg malloc is a procedure in C)
26
Array operations
t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]
t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer
x = y[i]
x[i] = y
27
TAC Generation forControl Flow Statements
bull Label introduction_label_name
Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L
Goto Lbull Conditional jump test condition variable t
if 0 jump to label LIfZ t Goto L
bull Similarly test condition variable tif 1 jump to label L
IfNZ t Goto L
28
Control-flow example ndash conditionsint xint yint z
if (x lt y) z = xelse z = yz = z z
_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z
29
cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)
Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )
30
Control-flow example ndash loopsint xint y
while (x lt y) x = x 2
y = x
_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1
y = x
31
cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label
Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )
32
Optimization points
sourcecode
Frontend IR Code
generatortargetcode
Userprofile program
change algorithm
Compilerintraprocedural IRInterprocedural IRIR optimizations
Compilerregister allocation
instruction selectionpeephole transformations
today
33
Interprocedural IR
bull Compile time generation of code for procedure invocations
bull Activation Records (aka Stack Frames)
34
Supporting Procedures
bull New computing environment ndash at least temporary memory for local variables
bull Passing information into the new environmentndash Parameters
bull Transfer of control tofrom procedurebull Handling return values
35
Abstract Register Machine(High Level View)
Code
Data
Highaddresses
Lowaddresses
CPU
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
36
Heap
Abstract Register Machine(High Level View)
Global Variables
Stack
Lowaddresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
Code
Highaddresses
37
Abstract Activation Record Stack
Stack frame for procedure
Prock+1(a1hellipaN)
Prock
Prock+2
hellip
hellip
Prock+1
38
Abstract Stack Frame
Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Prock
Prock+2
hellip
hellip
Stack frame for procedure
Prock+1(a1hellipaN)
39
Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to
stack and jumps to the function labelA statement x=f(a1hellipan) looks like
Push a1 hellip Push anCall fPop x copy returned value
bull Returning a value is done by pushing it to the stack (return x)
Push xbull Return control to caller (and roll up stack)
Return
40
Heap
Abstract Register Machine
Global Variables
Stack
Lowaddresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
Code
Highaddresses
41
Semi-Abstract Register Machine
Global Variables
Stack
Heap
High addresses
Low addresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
ebp
esphellipre
gist
ers
Stac
k
42
Intro Functions Exampleint SimpleFn(int z)
int x yx = x y zreturn x
void main() int ww = SimpleFunction(137)
_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn
main_t0 = 137Push _t0Call _SimpleFnPop w
43
What Can We Do with Procedures
bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as
parameters
44
Design Decisions
bull Scoping rulesndash Static scoping vs dynamic scoping
bull Callercallee conventionsndash Parametersndash Who saves register values
bull Allocating space for local variables
45
Static (lexical) Scopingmain ( )
int a = 0 int b = 0
int b = 1
int a = 2 printf (ldquod dnrdquo a b)
int b = 3 printf (ldquod dnrdquo a b)
printf (ldquod dnrdquo a b)
printf (ldquod dnrdquo a b)
B0B1
B3B3
B2
Declaration Scopes
a=0 B0B1B3
b=0 B0
b=1 B1B2
a=2 B2
b=3 B3
a name refers to its (closest)
enclosing scope
known at compile time
46
Dynamic Scoping
bull Each identifier is associated with a global stack of bindings
bull When entering scope where identifier is declaredndash push declaration on identifier stack
bull When exiting scope where identifier is declaredndash pop identifier stack
bull Evaluating the identifier in any context binds to the current top of stack
bull Determined at runtime
47
Example
bull What value is returned from mainbull Static scopingbull Dynamic scoping
int x = 42
int f() return x int g() int x = 1 return f() int main() return g()
48
Why do we care
bull We need to generate code to access variables
bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code
generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables
49
Variable addresses for static scoping first attempt
int x = 42
int f() return x int g() int x = 1 return f() int main() return g()
identifier address
x (global) 0x42
x (inside g) 0x73
50
Variable addresses for static scoping first attempt
int a [11]
void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)
main() quicksort (1 9)
what is the address of the variable ldquoirdquo in
the procedure quicksort
51
Compile-Time Information on Variablesbull Namebull Typebull Scope
ndash when is it recognizedbull Duration
ndash Until when does its value existbull Size
ndash How many bytes are required at runtime bull Address
ndash Fixedndash Relativendash Dynamic
52
Activation Record (Stack Frames)bull separate space for each procedure invocation
bull managed at runtimendash code for managing it generated by the compiler
bull desired properties ndash efficient allocation and deallocation
bull procedures are called frequentlyndash variable size
bull different procedures may require different memory sizes
53
Semi-Abstract Register Machine
Global Variables
Stack
Heap
High addresses
Low addresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
ebp
esphellipre
gist
ers
Stac
k
54
A Logical Stack Frame (Simplified)Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Stack frame for function f(a1hellipaN)
55
Runtime Stack
bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of
stackbull How do we handle recursion
56
Activation Record (frame)parameter k
parameter 1
lexical pointer
return information
dynamic link
registers amp misc
local variablestemporaries
next frame would be here
hellip
administrativepart
high addresses
lowaddresses
framepointer
stackpointer
incoming parameters
stack grows down
57
Runtime Stackbull SP ndash stack pointer
ndash top of current framebull FP ndash frame pointer
ndash base of current framendash Sometimes called BP
(base pointer)
Current frame
hellip hellip
Previous frame
SP
FPstack grows down
58
Code Blocks
bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1
adminstrativex1
y1
x2
x3
hellip
59
L-Values of Local Variables
bull The offset in the stack is known at compile time
bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3
Store R3 offset(x)(FP)
60
Pentium Runtime Stack
Pentium stack registers
Pentium stack and callret instructions
Register Usage
ESP Stack pointer
EBP Base pointer
Instruction Usage
push pushahellip push on runtime stack
poppopahellip Base pointer
call transfer control to called routine
return transfer control back to caller
61
Accessing Stack Variables
bull Use offset from FP (ebp)bull Remember ndash
stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples
ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local
hellip hellip
SP
FPReturn address
Return address
Param nhellip
param1
Local 1hellip
Local n
Previous fp
Param nhellip
param1FP+8
FP-4
62
Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp
ESP
EBPReturn address
Return address
old ebx
Previous fp
nEBP+8
EBP-4 old ebp
old ebx
(stack in intermediate point)
(disclaimer real compiler can do better than that)
63
Call Sequences
bull The processor does not save the content of registers on procedure calls
bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some
registers
64
Call Sequences
call
caller
callee
return
caller
Caller push code
Callee push code
(prologue)
Callee pop code
(epilogue)
Caller pop code
Push caller-save registersPush actual parameters (in reverse order)
push return addressJump to call address
Push current base-pointerbp = sp
Push local variablesPush callee-save registers
Pop callee-save registersPop callee activation record
Pop old base-pointer
pop return addressJump to address
Pop parametersPop caller-save registers
65
Call Sequences ndash Foo(4221)
call
caller
callee
return
caller
push ecx
push $21
push $42
call _foo
push ebp
mov esp ebp
sub 8 esp
push ebx
pop ebx
mov ebp esp
pop ebp
ret
add $8 esp
pop ecx
Push caller-save registersPush actual parameters (in reverse order)
push return addressJump to call address
Push current base-pointerbp = sp
Push local variablesPush callee-save registers
Pop callee-save registersPop callee activation record
Pop old base-pointer
pop return addressJump to address
Pop caller-save registersPop parameters
66
ldquoTo Callee-save or to Caller-saverdquo
bull Callee-saved registers need only be saved when callee modifies their value
bull some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registersbull Callee-Save Registers
ndash Saved by the callee before modificationndash Values are automatically preserved across calls
bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls
bull Usually the architecture defines caller-save and callee-save registers
bull Separate compilationbull Interoperability between code produced by different
compilerslanguages bull But compiler writers decide when to use calllercallee
registers
68
Callee-Save Registers
• Saved by the callee before modification
• Usually at procedure prolog
• Restored at procedure epilog
• Hardware support may be available
• Values are automatically preserved across calls

int foo(int a) {
    int b = a + 1;
    f1();
    g1(b);
    return (b + 2);
}

.global _foo
    Add_Constant -K, SP        // allocate space for foo
    Store_Local R5, -14(FP)    // save R5
    Load_Reg R5, R0
    Add_Constant R5, 1
    JSR f1
    JSR g1
    Add_Constant R5, 2
    Load_Reg R5, R0
    Load_Local -14(FP), R5     // restore R5
    Add_Constant K, SP         // deallocate
    RTS
69
Caller-Save Registers
• Saved by the caller before calls when needed
• Values are not automatically preserved across calls

void bar(int y) {
    int x = y + 1;
    f2(x);
    g2(2);
    g2(8);
}

.global _bar
    Add_Constant -K, SP        // allocate space for bar
    Add_Constant R0, 1
    JSR f2
    Load_Constant 2, R0
    JSR g2
    Load_Constant 8, R0
    JSR g2
    Add_Constant K, SP         // deallocate space for bar
    RTS
70
Parameter Passing
• 1960s
  - In memory
    • No recursion is allowed
• 1970s
  - In stack
• 1980s
  - In registers
  - First k parameters are passed in registers (k=4 or k=6)
  - Where is time saved?
• Most procedures are leaf procedures
• Interprocedural register allocation
• Many of the registers may be dead before another invocation
• Register windows are allocated in some architectures per call (e.g., Sun SPARC)
71
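Windows Exploit(s): Buffer Overflow
void foo (char *x) {
    char buf[2];
    strcpy(buf, x);
}
int main (int argc, char *argv[]) {
    foo(argv[1]);
}

./a.out abracadabra
Segmentation fault

[Figure: stack layout. The stack grows toward low memory addresses and holds, from high to low: previous frame, return address, saved FP, char* x, buf[2]. Copying "abracadabra" into buf[2] runs past the buffer and overwrites the entries above it.]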
72
Buffer overflow
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int check_authentication(char *password) {
    int auth_flag = 0;
    char password_buffer[16];

    strcpy(password_buffer, password);
    if (strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
    if (strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;
    return auth_flag;
}

int main(int argc, char *argv[]) {
    if (argc < 2) {
        printf("Usage: %s <password>\n", argv[0]);
        exit(0);
    }
    if (check_authentication(argv[1])) {
        printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        printf(" Access Granted.\n");
        printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    } else {
        printf("\nAccess Denied.\n");
    }
}
(source: "Hacking: The Art of Exploitation, 2nd Ed")
74
Buffer overflow (declarations of password_buffer and auth_flag swapped)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int check_authentication(char *password) {
    char password_buffer[16];
    int auth_flag = 0;

    strcpy(password_buffer, password);
    if (strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
    if (strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;
    return auth_flag;
}

int main(int argc, char *argv[]) {
    if (argc < 2) {
        printf("Usage: %s <password>\n", argv[0]);
        exit(0);
    }
    if (check_authentication(argv[1])) {
        printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        printf(" Access Granted.\n");
        printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    } else {
        printf("\nAccess Denied.\n");
    }
}
(source: "Hacking: The Art of Exploitation, 2nd Ed")
75
Buffer overflow
0x08048529 <+69>:   movl $0x8048647,(%esp)
0x08048530 <+76>:   call 0x8048394 <puts@plt>
0x08048535 <+81>:   movl $0x8048664,(%esp)
0x0804853c <+88>:   call 0x8048394 <puts@plt>
0x08048541 <+93>:   movl $0x804867a,(%esp)
0x08048548 <+100>:  call 0x8048394 <puts@plt>
0x0804854d <+105>:  jmp 0x804855b <main+119>
0x0804854f <+107>:  movl $0x8048696,(%esp)
0x08048556 <+114>:  call 0x8048394 <puts@plt>
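Why the first version of check_authentication can be fooled is easier to see on a toy frame. The sketch below is a simulation, not the real program: it assumes that auth_flag sits immediately after the 16-byte password_buffer in memory and that int is 4 bytes, little-endian. Real compilers may lay out, pad or reorder locals differently.

```python
import struct

BUF_SIZE = 16   # size of password_buffer on the slide
FLAG_SIZE = 4   # assumed size of int auth_flag

def check_authentication(password: bytes) -> int:
    # Toy frame: password_buffer at offsets 0..15, auth_flag right after it.
    frame = bytearray(BUF_SIZE + FLAG_SIZE)          # auth_flag starts at 0
    data = password + b"\x00"                        # strcpy copies the terminator too
    n = min(len(data), len(frame))                   # clip so the toy frame keeps its size
    frame[:n] = data[:n]                             # no check against BUF_SIZE
    typed = bytes(frame[:BUF_SIZE]).split(b"\x00")[0]
    if typed in (b"brillig", b"outgrabe"):
        struct.pack_into("<i", frame, BUF_SIZE, 1)   # auth_flag = 1
    return struct.unpack_from("<i", frame, BUF_SIZE)[0]

granted_by_password = check_authentication(b"brillig")
denied = check_authentication(b"wrong")
granted_by_overflow = check_authentication(b"A" * 17)
```

Under these assumptions a 17-character input spills one byte into auth_flag, so the function returns a non-zero value without a correct password. When the two declarations are swapped, a layout that places the buffer after the flag no longer lets the copy reach the flag, although it can still reach the saved frame pointer and return address.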
76
Nested Procedures
• For example - Pascal
• Any routine can have sub-routines
• Any sub-routine can access anything that is defined in its containing scope or inside the sub-routine itself
  - "non-local" variables
77
Example: Nested Procedures
program p() {
    int x;
    procedure a() {
        int y;
        procedure b() { ... c() ... }
        procedure c() {
            int z;
            procedure d() {
                y = x + z
            }
            ... b() ... d() ...
        }
        ... a() ... c() ...
    }
    a()
}
Possible call sequence: p → a → a → c → b → c → d
What are the addresses of variables "x", "y" and "z" in procedure d?
78
Nested Procedures
• can call a sibling, ancestor
• when "c" uses (non-local) variables from "a", which instance of "a" is it?
• how do you find the right activation record at runtime?
Possible call sequence: p → a → a → c → b → c → d
[Figure: the activations P, a, a, c, b, c, d of the call sequence]
79
Nested Procedures
• goal: find the closest routine in the stack from a given nesting level
• if we reached the same routine in a sequence of calls
  - if a routine of level k uses variables of the same nesting level, it uses its own variables
  - if it uses variables of nesting level j < k, then it must be the last routine called at level j
• If a procedure is last at level j on the stack, then it must be an ancestor of the current routine
Possible call sequence: p → a → a → c → b → c → d
[Figure: the activations P, a, a, c, b, c, d of the call sequence]
80
Nested Procedures
• problem: a routine may need to access variables of another routine that contains it statically
• solution: lexical pointer (a.k.a. access link) in the activation record
• lexical pointer points to the last activation record of the nesting level above it
  - in our example, the lexical pointer of d points to the activation record of c
• lexical pointers are created at runtime
• the number of links to be traversed is known at compile time
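A minimal sketch of how lexical pointers could be set up and followed for the call sequence p → a → a → c → b → c → d. The Frame class, the LEVEL table and the variable values are illustrative assumptions, not part of the slides. The hop counts passed to lookup are what a compiler would derive from nesting depths.

```python
class Frame:
    """Toy activation record holding only locals and the lexical pointer."""
    def __init__(self, name, level, access_link):
        self.name, self.level, self.access_link = name, level, access_link
        self.vars = {}

LEVEL = {"p": 0, "a": 1, "b": 2, "c": 2, "d": 3}   # static nesting depth

def call(caller, callee_name):
    """Create the callee's frame and set its lexical pointer at run time."""
    level = LEVEL[callee_name]
    if caller is None:                     # the main program has no enclosing scope
        return Frame(callee_name, level, None)
    link = caller
    for _ in range(caller.level - level + 1):   # hop count is known at compile time
        link = link.access_link
    return Frame(callee_name, level, link)

def lookup(frame, hops, var):
    """Follow `hops` lexical pointers, then read `var` from that frame."""
    for _ in range(hops):
        frame = frame.access_link
    return frame.vars[var]

# Call sequence p -> a -> a -> c -> b -> c -> d with made-up values.
p = call(None, "p"); p.vars["x"] = 10
a1 = call(p, "a");   a1.vars["y"] = 1
a2 = call(a1, "a");  a2.vars["y"] = 2
c1 = call(a2, "c");  c1.vars["z"] = 100
b = call(c1, "b")
c2 = call(b, "c");   c2.vars["z"] = 200
d = call(c2, "d")

# Body of d: y = x + z, where y is 2 hops away, x is 3 hops, z is 1 hop.
d.access_link.access_link.vars["y"] = lookup(d, 3, "x") + lookup(d, 1, "z")
```

In this run d reads z from the second activation of c and writes y in the second activation of a, even though b's frame lies between them on the stack.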
81
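Lexical Pointers
[Figure: stack of activation records a, a, c, b, c, d for the call sequence below; each record of a holds y, each record of c holds z, and lexical pointers link the records]
Possible call sequence: p → a → a → c → b → c → d
program p() {
    int x;
    procedure a() {
        int y;
        procedure b() { c() }
        procedure c() {
            int z;
            procedure d() {
                y = x + z
            }
            ... b() ... d() ...
        }
        ... a() ... c() ...
    }
    a()
}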
82
What is a Compiler
16
TAC Generation for Control Flow Statements
• Label introduction
      _label_name:
  Indicates a point in the code that can be jumped to
• Unconditional jump: go to instruction following label L
      Goto L;
• Conditional jump: test condition variable t; if 0, jump to label L
      IfZ t Goto L;
• Similarly: test condition variable t; if 1, jump to label L
      IfNZ t Goto L;
17
Control-flow example - conditions
int x;
int y;
int z;

if (x < y)
    z = x;
else
    z = y;
z = z * z;

    _t0 = x < y;
    IfZ _t0 Goto _L0;
    z = x;
    Goto _L1;
_L0:
    z = y;
_L1:
    z = z * z;
18
cgen for if-then-else
cgen(if (e) s1 else s2):
    Let _t = cgen(e)
    Let Lfalse be a new label
    Let Lafter be a new label
    Emit( IfZ _t Goto Lfalse; )
    cgen(s1)
    Emit( Goto Lafter; )
    Emit( Lfalse: )
    cgen(s2)
    Emit( Goto Lafter; )
    Emit( Lafter: )
19
Naive cgen for expressions
• Maintain a counter for temporaries in c
• Initially: c = 0
• cgen(e1 op e2) = {
      Let A = cgen(e1)
      c = c + 1
      Let B = cgen(e2)
      c = c + 1
      Emit( _tc = A op B; )
      Return _tc
  }
• Observation: temporaries in cgen(e1) can be reused in cgen(e2)
20
Improving cgen for expressions
• Observation - naive translation needlessly generates temporaries for leaf expressions
• Observation - temporaries are used exactly once
  - Once a temporary has been read, it can be reused for another sub-expression
• cgen(e1 op e2) = {
      Let _t1 = cgen(e1)
      Let _t2 = cgen(e2)
      Emit( _t = _t1 op _t2; )
      Return _t
  }
• Temporaries of cgen(e1) can be reused in cgen(e2)
21
Sethi-Ullman translation
• Algorithm by Ravi Sethi and Jeffrey D. Ullman to emit optimal TAC
  - Minimizes number of temporaries
• Main data structure in algorithm is a stack of temporaries
  - Stack corresponds to recursive invocations of _t = cgen(e)
  - All the temporaries on the stack are live
    • Live = contain a value that is needed later on
22
Weighted register allocation
• Can save registers by re-ordering subtree computations
• Label each node with its weight
  - Weight = number of registers needed
  - Leaf weight known
  - Internal node weight
    • w(left) > w(right) then w = w(left)
    • w(right) > w(left) then w = w(right)
    • w(right) = w(left) then w = w(left) + 1
• Choose heavier child as first to be translated
• WARNING: have to check that no side-effects exist before attempting to apply this optimization (pre-pass on the tree)
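The weight rule and the "heavier child first" ordering can be sketched as follows. The tuple encoding of the tree, the "[]" operator name for array access, and the default leaf weight of 0 are assumptions made for this illustration. When both children weigh the same, the sketch arbitrarily translates the left one first.

```python
def weight(node, leaf_weight=0):
    """Weight of an expression tree: a leaf is a string, an inner node is (op, left, right)."""
    if isinstance(node, str):
        return leaf_weight
    _, left, right = node
    wl, wr = weight(left, leaf_weight), weight(right, leaf_weight)
    return max(wl, wr) if wl != wr else wl + 1

def translation_order(node):
    """Post-order listing of the tree that visits the heavier child first."""
    if isinstance(node, str):
        return [node]
    op, left, right = node
    if weight(left) >= weight(right):
        first, second = left, right
    else:
        first, second = right, left
    return translation_order(first) + translation_order(second) + [op]

# a + b[5*c], with "[]" standing for the array access node
tree = ("+", "a", ("[]", "b", ("*", "5", "c")))
root_weight = weight(tree)
order = translation_order(tree)
```

For this tree the index expression 5*c is translated before the base b, and the whole array access before a. Reordering like this is only safe after the side-effect check mentioned in the warning above.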
23
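Weighted reg alloc example
_t0 = cgen( a+b[5*c] )
Phase 1: - check absence of side-effects in expression tree
         - assign weight to each AST node
[Figure: expression tree with weights. Root + (w=1) has children a (w=0) and array access (w=1); array access has children base b (w=0) and index * (w=1); * has children 5 (w=0) and c (w=0)]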
24
Weighted reg alloc example
_t0 = cgen( a+b[5*c] )
Phase 2: - use weights to decide on order of translation
[Figure: the weighted expression tree for a+b[5*c]; at each node the heavier sub-tree is translated first, and results are held in _t0 and _t1]
Generated code:
    _t0 = c
    _t1 = 5
    _t0 = _t1 * _t0
    _t1 = b
    _t0 = _t1[_t0]
    _t1 = a
    _t0 = _t1 + _t0
25
TAC Generation for Memory Access Instructions
• Copy instruction: a = b
• Load/store instructions: a = *b, *a = b
• Address-of instruction: a = &b
• Array accesses: a = b[i], a[i] = b
• Field accesses: a = b[f], a[f] = b
• Memory allocation instruction: a = alloc(size)
  - Sometimes left out (e.g., malloc is a procedure in C)
26
Array operations
x = y[i]
    t1 = &y        ; t1 = address-of y
    t2 = t1 + i    ; t2 = address of y[i]
    x = *t2        ; value stored at y[i]
x[i] = y
    t1 = &x        ; t1 = address-of x
    t2 = t1 + i    ; t2 = address of x[i]
    *t2 = y        ; store through pointer
30
Control-flow example - loops
int x;
int y;

while (x < y) {
    x = x * 2;
}
y = x;

_L0:
    _t0 = x < y;
    IfZ _t0 Goto _L1;
    x = x * 2;
    Goto _L0;
_L1:
    y = x;
31
cgen for while loops
cgen(while (expr) stmt):
    Let Lbefore be a new label
    Let Lafter be a new label
    Emit( Lbefore: )
    Let t = cgen(expr)
    Emit( IfZ t Goto Lafter; )
    cgen(stmt)
    Emit( Goto Lbefore; )
    Emit( Lafter: )
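The cgen schemes for if-then-else and while can be tried out with a small generator. This is a sketch under simplifying assumptions: statements are tuples, a condition is emitted as a single TAC assignment, and the class name TacGen is invented here. Unlike the if-then-else scheme on the slide, it does not emit the redundant Goto Lafter after the else branch.

```python
class TacGen:
    def __init__(self):
        self.code, self.labels, self.temps = [], 0, 0

    def new_label(self):
        self.labels += 1
        return f"_L{self.labels - 1}"

    def cgen_expr(self, expr):
        # Simplification: the whole condition becomes one TAC assignment.
        temp = f"_t{self.temps}"
        self.temps += 1
        self.code.append(f"{temp} = {expr};")
        return temp

    def cgen(self, stmt):
        kind = stmt[0]
        if kind == "if":                       # ("if", cond, s1, s2)
            _, cond, s1, s2 = stmt
            t = self.cgen_expr(cond)
            l_false, l_after = self.new_label(), self.new_label()
            self.code.append(f"IfZ {t} Goto {l_false};")
            self.cgen(s1)
            self.code.append(f"Goto {l_after};")
            self.code.append(f"{l_false}:")
            self.cgen(s2)
            self.code.append(f"{l_after}:")
        elif kind == "while":                  # ("while", cond, body)
            _, cond, body = stmt
            l_before, l_after = self.new_label(), self.new_label()
            self.code.append(f"{l_before}:")
            t = self.cgen_expr(cond)
            self.code.append(f"IfZ {t} Goto {l_after};")
            self.cgen(body)
            self.code.append(f"Goto {l_before};")
            self.code.append(f"{l_after}:")
        else:                                  # ("assign", "x = x * 2")
            self.code.append(stmt[1] + ";")

loop = TacGen()
loop.cgen(("while", "x < y", ("assign", "x = x * 2")))
loop.cgen(("assign", "y = x"))

cond = TacGen()
cond.cgen(("if", "x < y", ("assign", "z = x"), ("assign", "z = y")))
cond.cgen(("assign", "z = z * z"))
```

The lists loop.code and cond.code then hold the same instruction sequences as the loop and condition examples in the slides.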
32
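Optimization points
[Figure: pipeline source code → Front end → IR → Code generator → target code, annotated with where optimizations apply]
• User: profile program, change algorithm
• Compiler: intraprocedural IR, interprocedural IR, IR optimizations
• Compiler: register allocation, instruction selection, peephole transformations
• One of these points is marked "today"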
33
Interprocedural IR
• Compile-time generation of code for procedure invocations
• Activation Records (a.k.a. Stack Frames)
34
Supporting Procedures
• New computing environment
  - at least temporary memory for local variables
• Passing information into the new environment
  - Parameters
• Transfer of control to/from procedure
• Handling return values
35
Abstract Register Machine (High Level View)
[Figure: a CPU with general purpose (data) registers (Register 00, Register 01, ..., Register xx) and control registers (Register PC, ...), next to a memory that holds Code and Data between low and high addresses]
36
Abstract Register Machine (High Level View)
[Figure: a CPU with general purpose (data) registers (Register 00, Register 01, ..., Register xx) and control registers (Register PC, ...), next to Main Memory, which holds Code, Global Variables, Heap and Stack between low and high addresses]
37
Abstract Activation Record Stack
[Figure: stack of frames ..., Proc k, Proc k+1, Proc k+2, ...; the highlighted entry is the stack frame for procedure Proc k+1(a1,...,aN)]
38
Abstract Stack Frame
[Figure: the stack frame for procedure Proc k+1(a1,...,aN), between the frames of Proc k and Proc k+2. It holds the parameters (actual arguments) Param N, Param N-1, ..., Param 1, followed by the locals and temporaries _t0, ..., _tk, x, ..., y]
39
Handling Procedures
• Store local variables/temporaries in a stack
• A function call instruction pushes arguments to stack and jumps to the function label
  A statement x=f(a1,...,an); looks like
      Push a1; ... Push an;
      Call f;
      Pop x; // copy returned value
• Returning a value is done by pushing it to the stack (return x;)
      Push x;
• Return control to caller (and roll up stack)
      Return;
40
Abstract Register Machine
[Figure: a CPU with general purpose (data) registers (Register 00, Register 01, ..., Register xx) and control registers (Register PC, ...), next to Main Memory, which holds Code, Global Variables, Heap and Stack between low and high addresses]
41
Semi-Abstract Register Machine
[Figure: a CPU with general purpose (data) registers (Register 00, Register 01, ..., Register xx), control registers (Register PC, ...) and stack registers (ebp, esp, ...), next to Main Memory, which holds Global Variables, Stack and Heap between high and low addresses]
42
Intro: Functions Example
int SimpleFn(int z) {
    int x, y;
    x = x * y * z;
    return x;
}

void main() {
    int w;
    w = SimpleFn(137);
}

_SimpleFn:
    _t0 = x * y;
    _t1 = _t0 * z;
    x = _t1;
    Push x;
    Return;

main:
    _t0 = 137;
    Push _t0;
    Call _SimpleFn;
    Pop w;
43
What Can We Do with Procedures?
• Declarations & Definitions
• Call & Return
• Jumping out of procedures
• Passing & Returning procedures as parameters
44
Design Decisions
• Scoping rules
  - Static scoping vs. dynamic scoping
• Caller/callee conventions
  - Parameters
  - Who saves register values?
• Allocating space for local variables
45
Static (lexical) Scoping
main()                                   /* block B0 */
{
    int a = 0;
    int b = 0;
    {                                    /* block B1 */
        int b = 1;
        {                                /* block B2 */
            int a = 2;
            printf("%d %d\n", a, b);
        }
        {                                /* block B3 */
            int b = 3;
            printf("%d %d\n", a, b);
        }
        printf("%d %d\n", a, b);
    }
    printf("%d %d\n", a, b);
}

Declaration    Scopes
a=0            B0, B1, B3
b=0            B0
b=1            B1, B2
a=2            B2
b=3            B3

A name refers to its (closest) enclosing scope, which is known at compile time.
46
Dynamic Scoping
• Each identifier is associated with a global stack of bindings
• When entering scope where identifier is declared
  - push declaration on identifier stack
• When exiting scope where identifier is declared
  - pop identifier stack
• Evaluating the identifier in any context binds to the current top of stack
• Determined at runtime
47
Example
• What value is returned from main?
• Static scoping?
• Dynamic scoping?

int x = 42;

int f() { return x; }
int g() { int x = 1; return f(); }
int main() { return g(); }
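The two answers can be checked with a small simulation of the example. This is a sketch: the dictionaries below stand in for the compiler's and runtime's binding structures, and the names run_static and run_dynamic are invented for the illustration.

```python
def run_static():
    """Static scoping: f refers to the x visible where f is declared."""
    global_env = {"x": 42}

    def f():
        return global_env["x"]       # resolved from the program text

    def g():
        local_env = {"x": 1}         # g's own x; f never sees it
        return f()

    return g()

def run_dynamic():
    """Dynamic scoping: one global stack of bindings per identifier."""
    bindings = {"x": [42]}

    def f():
        return bindings["x"][-1]     # current top of the identifier's stack

    def g():
        bindings["x"].append(1)      # entering the scope that declares x
        try:
            return f()
        finally:
            bindings["x"].pop()      # exiting that scope

    return g()

static_result = run_static()
dynamic_result = run_dynamic()
```

Under static scoping main returns 42, because f sees only the global x. Under dynamic scoping it returns 1, because g's binding is on top of the stack when f runs.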
48
Why do we care?
• We need to generate code to access variables
• Static scoping
  - Identifier binding is known at compile time
  - Address of the variable is known at compile time
  - Assigning addresses to variables is part of code generation
  - No runtime errors of "access to undefined variable"
  - Can check types of variables
49
Variable addresses for static scoping: first attempt
int x = 42;

int f() { return x; }
int g() { int x = 1; return f(); }
int main() { return g(); }

identifier      address
x (global)      0x42
x (inside g)    0x73
50
Variable addresses for static scoping: first attempt
int a[11];

void quicksort(int m, int n) {
    int i;
    if (n > m) {
        i = partition(m, n);
        quicksort(m, i-1);
        quicksort(i+1, n);
    }
}

main() {
    quicksort(1, 9);
}

What is the address of the variable "i" in the procedure quicksort?
51
Compile-Time Information on Variables
• Name
• Type
• Scope
  - when is it recognized
• Duration
  - Until when does its value exist
• Size
  - How many bytes are required at runtime
• Address
  - Fixed
  - Relative
  - Dynamic
52
Activation Record (Stack Frames)
• separate space for each procedure invocation
• managed at runtime
  - code for managing it generated by the compiler
• desired properties
  - efficient allocation and deallocation
    • procedures are called frequently
  - variable size
    • different procedures may require different memory sizes
53
Semi-Abstract Register Machine
[Figure: a CPU with general purpose (data) registers (Register 00, Register 01, ..., Register xx), control registers (Register PC, ...) and stack registers (ebp, esp, ...), next to Main Memory, which holds Global Variables, Stack and Heap between high and low addresses]
54
A Logical Stack Frame (Simplified)
[Figure: stack frame for function f(a1,...,aN). It holds the parameters (actual arguments) Param N, Param N-1, ..., Param 1, followed by the locals and temporaries _t0, ..., _tk, x, ..., y]
55
Runtime Stack
• Stack of activation records
• Call = push new activation record
• Return = pop activation record
• Only one "active" activation record - top of stack
• How do we handle recursion?
56
Activation Record (frame)
high addresses
    parameter k
    ...
    parameter 1
    lexical pointer
    return information
    dynamic link
    registers & misc
    local variables
    temporaries
    next frame would be here
low addresses
(Labels in the figure: incoming parameters, administrative part, frame pointer, stack pointer; the stack grows down.)
57
Runtime Stack
• SP - stack pointer
  - top of current frame
• FP - frame pointer
  - base of current frame
  - Sometimes called BP (base pointer)
[Figure: the previous frame sits above the current frame; FP marks the base of the current frame and SP its top; the stack grows down]
58
Code Blocks
• Programming languages provide code blocks
void foo() {
    int x = 8; y = 9;       // 1
    { int x = y * y; }      // 2
    { int x = y * 7; }      // 3
    x = y + 1;
}
[Figure: frame layout with the administrative part followed by x1, y1, x2, x3, ...]
59
L-Values of Local Variables
• The offset in the stack is known at compile time
• L-val(x) = FP+offset(x)
• x = 5 ⇒
      Load_Constant 5, R3
      Store R3, offset(x)(FP)
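A minimal sketch of how a compiler might assign frame offsets and emit the store for x = 5. It assumes locals are laid out below FP in declaration order with nothing in between, and the helper names assign_offsets and store_constant are made up for the illustration.

```python
def assign_offsets(local_vars):
    """Map each (name, size_in_bytes) to an offset from FP, growing downwards."""
    offsets, offset = {}, 0
    for name, size in local_vars:
        offset -= size               # the first local of size 4 lands at FP-4
        offsets[name] = offset
    return offsets

def store_constant(offsets, var, value, reg="R3"):
    """Emit the two instructions from the slide for `var = value`."""
    return [f"Load_Constant {value}, {reg}",
            f"Store {reg}, {offsets[var]}(FP)"]

offsets = assign_offsets([("x", 4), ("y", 4)])
code = store_constant(offsets, "x", 5)
```

Because the offsets depend only on the declarations, they are fixed at compile time even though FP itself changes with every call.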
60
Pentium Runtime Stack
Pentium stack registers
    Register    Usage
    ESP         Stack pointer
    EBP         Base pointer

Pentium stack and call/ret instructions
    Instruction         Usage
    push, pusha, ...    push on runtime stack
    pop, popa, ...      pop from runtime stack
    call                transfer control to called routine
    ret                 transfer control back to caller
61
Accessing Stack Variables
• Use offset from FP (%ebp)
• Remember: stack grows downwards
• Above FP = parameters
• Below FP = locals
• Examples
  - %ebp + 4 = return address
  - %ebp + 8 = first parameter
  - %ebp - 4 = first local
[Figure: stack from high to low addresses: Param n, ..., param 1 (at FP+8), return address, previous fp (at FP), Local 1 (at FP-4), ..., Local n (at SP)]
62
Factorial - fact(int n)
fact:
    pushl %ebp              # save ebp
    movl %esp,%ebp          # ebp=esp
    pushl %ebx              # save ebx
    movl 8(%ebp),%ebx       # ebx = n
    cmpl $1,%ebx            # n <= 1 ?
    jle .lresult            # then done
    leal -1(%ebx),%eax      # eax = n-1
    pushl %eax
    call fact               # fact(n-1)
    imull %ebx,%eax         # eax=retv*n
    jmp .lreturn
.lresult:
    movl $1,%eax            # retv
.lreturn:
    movl -4(%ebp),%ebx      # restore ebx
    movl %ebp,%esp          # restore esp
    popl %ebp               # restore ebp
    ret                     # return to caller
[Figure: stack at an intermediate point: n at EBP+8, return address, old ebp (previous fp) at EBP, old ebx at EBP-4, ESP at the top of the stack]
(disclaimer: a real compiler can do better than that)
63
Call Sequences
• The processor does not save the content of registers on procedure calls
• So who will?
  - Caller saves and restores registers
  - Callee saves and restores registers
  - But can also have both save/restore some registers
What is a Compiler
17
Control-flow example ndash conditionsint xint yint z
if (x lt y) z = xelse z = yz = z z
_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z
18
cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)
Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )
19
Naive cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =
Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc
bull Observation temporaries in cgen(e1) can be reused in cgen(e2)
20
Improving cgen for expressionsbull Observation ndash naiumlve translation needlessly generates temporaries for
leaf expressionsbull Observation ndash temporaries used exactly once
ndash Once a temporary has been read it can be reused for another sub-expressionbull cgen(e1 op e2) =
Let _t1 = cgen(e1) Let _t2 = cgen(e2) Emit( _t =_t1 op _t2 ) Return t
bull Temporaries cgen(e1) can be reused in cgen(e2)
21
Sethi-Ullman translation
bull Algorithm by Ravi Sethi and Jeffrey D Ullman to emit optimal TACndash Minimizes number of temporaries
bull Main data structure in algorithm is a stack of temporariesndash Stack corresponds to recursive invocations of _t = cgen(e)ndash All the temporaries on the stack are live
bull Live = contain a value that is needed later on
22
Weighted register allocationbull Can save registers by re-ordering subtree computationsbull Label each node with its weight
ndash Weight = number of registers neededndash Leaf weight knownndash Internal node weight
bull w(left) gt w(right) then w = leftbull w(right) gt w(left) then w = rightbull w(right) = w(left) then w = left + 1
bull Choose heavier child as first to be translatedbull WARNING have to check that no side-effects exist before
attempting to apply this optimization(pre-pass on the tree)
23
Weighted reg alloc example
b
5 c
array access
+
a w=0
w=0 w=0
w=1 w=0
w=1
w=1
Phase 1 - check absence of side-effects in expression tree - assign weight to each AST node
_t0 = cgen( a+b[5c] )
base index
24
Weighted reg alloc example
b
5 c
array access
+
abase index
w=0
w=0 w=0
w=1 w=0
w=1
w=1
_t0 = cgen( a+b[5c] )Phase 2 - use weights to decide on order of translation
_t0
_t0
_t0Heavier sub-tree
Heavier sub-tree
_t0 = _t1 _t0
_t0 = _t1[_t0]
_t0 = _t1 + _t0
_t0_t1
_t1
_t1_t0 = c_t1 = 5
_t1 = b
_t1 = a
25
TAC Generation forMemory Access Instructions
bull Copy instruction a = bbull Loadstore instructions
a = b a = bbull Address of instruction a=ampbbull Array accesses
a = b[i] a[i] = bbull Field accesses
a = b[f] a[f] = bbull Memory allocation instruction a = alloc(size)
ndash Sometimes left out (eg malloc is a procedure in C)
26
Array operations
t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]
t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer
x = y[i]
x[i] = y
27
TAC Generation forControl Flow Statements
bull Label introduction_label_name
Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L
Goto Lbull Conditional jump test condition variable t
if 0 jump to label LIfZ t Goto L
bull Similarly test condition variable tif 1 jump to label L
IfNZ t Goto L
28
Control-flow example ndash conditionsint xint yint z
if (x lt y) z = xelse z = yz = z z
_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z
29
cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)
Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )
30
Control-flow example ndash loopsint xint y
while (x lt y) x = x 2
y = x
_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1
y = x
31
cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label
Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )
32
Optimization points
sourcecode
Frontend IR Code
generatortargetcode
Userprofile program
change algorithm
Compilerintraprocedural IRInterprocedural IRIR optimizations
Compilerregister allocation
instruction selectionpeephole transformations
today
33
Interprocedural IR
bull Compile time generation of code for procedure invocations
bull Activation Records (aka Stack Frames)
34
Supporting Procedures
bull New computing environment ndash at least temporary memory for local variables
bull Passing information into the new environmentndash Parameters
bull Transfer of control tofrom procedurebull Handling return values
35
Abstract Register Machine(High Level View)
Code
Data
Highaddresses
Lowaddresses
CPU
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
36
Heap
Abstract Register Machine(High Level View)
Global Variables
Stack
Lowaddresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
Code
Highaddresses
37
Abstract Activation Record Stack
Stack frame for procedure
Prock+1(a1hellipaN)
Prock
Prock+2
hellip
hellip
Prock+1
38
Abstract Stack Frame
Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Prock
Prock+2
hellip
hellip
Stack frame for procedure
Prock+1(a1hellipaN)
39
Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to
stack and jumps to the function labelA statement x=f(a1hellipan) looks like
Push a1 hellip Push anCall fPop x copy returned value
bull Returning a value is done by pushing it to the stack (return x)
Push xbull Return control to caller (and roll up stack)
Return
40
Heap
Abstract Register Machine
Global Variables
Stack
Lowaddresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
Code
Highaddresses
41
Semi-Abstract Register Machine
Global Variables
Stack
Heap
High addresses
Low addresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
ebp
esphellipre
gist
ers
Stac
k
42
Intro Functions Exampleint SimpleFn(int z)
int x yx = x y zreturn x
void main() int ww = SimpleFunction(137)
_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn
main_t0 = 137Push _t0Call _SimpleFnPop w
43
What Can We Do with Procedures?
• Declarations & Definitions
• Call & Return
• Jumping out of procedures
• Passing & Returning procedures as parameters
44
Design Decisions
• Scoping rules
  – Static scoping vs. dynamic scoping
• Caller/callee conventions
  – Parameters
  – Who saves register values?
• Allocating space for local variables
45
Static (lexical) Scoping
main ( )
{                                       /* block B0 */
    int a = 0;
    int b = 0;
    {                                   /* block B1 */
        int b = 1;
        {                               /* block B2 */
            int a = 2;
            printf ("%d %d\n", a, b);
        }
        {                               /* block B3 */
            int b = 3;
            printf ("%d %d\n", a, b);
        }
        printf ("%d %d\n", a, b);
    }
    printf ("%d %d\n", a, b);
}

Declaration    Scopes
a=0            B0, B1, B3
b=0            B0
b=1            B1, B2
a=2            B2
b=3            B3

A name refers to its (closest) enclosing scope, which is known at compile time.
46
Dynamic Scoping
• Each identifier is associated with a global stack of bindings
• When entering scope where identifier is declared
  – push declaration on identifier stack
• When exiting scope where identifier is declared
  – pop identifier stack
• Evaluating the identifier in any context binds to the current top of stack
• Determined at runtime
47
Example
• What value is returned from main?
  • Static scoping?
  • Dynamic scoping?

int x = 42;

int f() { return x; }
int g() { int x = 1; return f(); }
int main() { return g(); }
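To make the two answers concrete, here is a hedged C sketch. C itself is statically scoped, so the dynamic case is emulated with the per-identifier binding stack described on the previous slide; `x_bindings`, `enter_x`, `exit_x`, and the `_static`/`_dynamic` suffixes are hypothetical names introduced for illustration.

```c
#include <assert.h>

/* Static scoping: the x in f_static is bound to the global at compile time. */
static int x = 42;
int f_static(void) { return x; }
int g_static(void) { int x = 1; (void)x; return f_static(); }

/* Dynamic scoping, emulated: identifier x owns a stack of bindings. */
enum { MAX_BINDINGS = 16 };
static int x_bindings[MAX_BINDINGS] = { 42 };  /* global binding at the bottom */
static int x_top = 0;                          /* index of the current binding */

static void enter_x(int value) {               /* entering a scope declaring x */
    assert(x_top + 1 < MAX_BINDINGS);
    x_bindings[++x_top] = value;
}
static void exit_x(void) {                     /* leaving that scope */
    assert(x_top > 0);
    --x_top;
}

int f_dynamic(void) { return x_bindings[x_top]; }  /* binds to top of stack */
int g_dynamic(void) {
    enter_x(1);                                /* int x = 1 inside g */
    int result = f_dynamic();
    exit_x();
    return result;
}
```

Under these assumptions `g_static()` yields the global's value, while `g_dynamic()` sees the binding pushed by `g`.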
48
Why do we care?
• We need to generate code to access variables
• Static scoping
  – Identifier binding is known at compile time
  – Address of the variable is known at compile time
  – Assigning addresses to variables is part of code generation
  – No runtime errors of "access to undefined variable"
  – Can check types of variables
49
Variable addresses for static scoping: first attempt
int x = 42;

int f() { return x; }
int g() { int x = 1; return f(); }
int main() { return g(); }

identifier      address
x (global)      0x42
x (inside g)    0x73
50
Variable addresses for static scoping: first attempt
int a [11];

void quicksort(int m, int n) {
    int i;
    if (n > m) {
        i = partition(m, n);
        quicksort (m, i-1);
        quicksort (i+1, n);
    }
}

main() {
    quicksort (1, 9);
}

What is the address of the variable "i" in the procedure quicksort?
51
Compile-Time Information on Variables
• Name
• Type
• Scope
  – when is it recognized
• Duration
  – Until when does its value exist
• Size
  – How many bytes are required at runtime
• Address
  – Fixed
  – Relative
  – Dynamic
52
Activation Record (Stack Frames)
• separate space for each procedure invocation
• managed at runtime
  – code for managing it generated by the compiler
• desired properties
  – efficient allocation and deallocation
    • procedures are called frequently
  – variable size
    • different procedures may require different memory sizes
53
Semi-Abstract Register Machine
[Diagram: main memory divided into Global Variables, Stack, and Heap, drawn between high addresses and low addresses. The CPU has general-purpose (data) registers Register 00, Register 01, …, Register xx, control registers (Register PC, …), and stack registers (ebp, esp, …).]
54
A Logical Stack Frame (Simplified)
[Diagram: the stack frame for function f(a1, …, aN). It holds the parameters (actual arguments) Param N, Param N-1, …, Param 1, followed by the locals and temporaries _t0, …, _tk, x, …, y.]
55
Runtime Stack
• Stack of activation records
• Call = push new activation record
• Return = pop activation record
• Only one "active" activation record – top of stack
• How do we handle recursion?
56
Activation Record (frame)
[Diagram: layout of one frame from high addresses to low addresses (the stack grows down):
    incoming parameters:    parameter k, …, parameter 1
    administrative part:    lexical pointer, return information, dynamic link, registers & misc
    local variables, temporaries
    (next frame would be here)
 Arrows mark the frame pointer and the stack pointer.]
57
Runtime Stack
• SP – stack pointer
  – top of current frame
• FP – frame pointer
  – base of current frame
  – Sometimes called BP (base pointer)
[Diagram: the previous frame followed by the current frame; FP marks the base of the current frame and SP its top; the stack grows down.]
58
Code Blocks
• Programming languages provide code blocks
void foo() {
    int x = 8, y = 9;       // 1
    { int x = y * y; }      // 2
    { int x = y * 7; }      // 3
    x = y + 1;
}
[Diagram: the frame of foo holds the administrative part followed by x1, y1, x2, x3, …]
59
L-Values of Local Variables
• The offset in the stack is known at compile time
• L-val(x) = FP + offset(x)
• x = 5  ⇒  Load_Constant 5, R3
            Store R3, offset(x)(FP)
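The address computation L-val(x) = FP + offset(x) can be tried out on a simulated memory. The sketch below is illustrative only: the byte array `memory`, the helper names `lval`, `store_local`, and `load_local`, and the 4-byte slot size are assumptions, not part of the lecture's machine model.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { MEM_SIZE = 256 };
static uint8_t memory[MEM_SIZE];    /* hypothetical byte-addressed stack memory */

/* L-val(x) = FP + offset(x); the offset is a compile-time constant. */
static size_t lval(size_t fp, int offset) {
    size_t addr = (size_t)((long)fp + offset);
    assert(addr < MEM_SIZE && MEM_SIZE - addr >= sizeof(int32_t));
    return addr;
}

/* x = value:  Load_Constant value, R3 ; Store R3, offset(x)(FP) */
void store_local(size_t fp, int offset, int32_t value) {
    int32_t r3 = value;                               /* Load_Constant */
    memcpy(&memory[lval(fp, offset)], &r3, sizeof r3); /* Store */
}

/* Reads the local back from the same FP-relative slot. */
int32_t load_local(size_t fp, int offset) {
    int32_t r;
    memcpy(&r, &memory[lval(fp, offset)], sizeof r);
    return r;
}
```

Two activations with different FP values use the same offset but touch different memory, which is why a per-frame relative address works for recursive procedures.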
60
Pentium Runtime Stack
Pentium stack registers

Register    Usage
ESP         Stack pointer
EBP         Base pointer

Pentium stack and call/ret instructions

Instruction       Usage
push, pusha, …    push on runtime stack
pop, popa, …      pop from runtime stack
call              transfer control to called routine
ret               transfer control back to caller
61
Accessing Stack Variables
• Use offset from FP (ebp)
• Remember – stack grows downwards
• Above FP = parameters
• Below FP = locals
• Examples
  – ebp + 4 = return address
  – ebp + 8 = first parameter
  – ebp – 4 = first local
[Diagram: two frames on the stack. Each holds Param n, …, param1 and then the return address; the current frame continues with Previous fp (where FP points) and Local 1, …, Local n (where SP points). param1 is at FP+8 and Local 1 is at FP-4.]
62
Factorial – fact(int n)
fact:
    pushl %ebp              # save ebp
    movl  %esp,%ebp         # ebp=esp
    pushl %ebx              # save ebx
    movl  8(%ebp),%ebx      # ebx = n
    cmpl  $1,%ebx           # n <= 1 ?
    jle   .lresult          # then done
    leal  -1(%ebx),%eax     # eax = n-1
    pushl %eax
    call  fact              # fact(n-1)
    imull %ebx,%eax         # eax=retv*n
    jmp   .lreturn
.lresult:
    movl  $1,%eax           # retv
.lreturn:
    movl  -4(%ebp),%ebx     # restore ebx
    movl  %ebp,%esp         # restore esp
    popl  %ebp              # restore ebp
    ret                     # return to caller
[Diagram (stack in intermediate point): n at EBP+8, then the return address, old ebp (where EBP points), and old ebx at EBP-4 (where ESP points), below the previous frame.]
(disclaimer: a real compiler can do better than that)
63
Call Sequences
• The processor does not save the content of registers on procedure calls
• So who will?
  – Caller saves and restores registers
  – Callee saves and restores registers
  – But can also have both save/restore some registers
64
Call Sequences
[Timeline: caller → call → callee → return → caller]

Caller push code:
    Push caller-save registers
    Push actual parameters (in reverse order)
call:
    push return address
    Jump to call address
Callee push code (prologue):
    Push current base-pointer
    bp = sp
    Push local variables
    Push callee-save registers
Callee pop code (epilogue):
    Pop callee-save registers
    Pop callee activation record
    Pop old base-pointer
return:
    pop return address
    Jump to address
Caller pop code:
    Pop parameters
    Pop caller-save registers
65
Call Sequences – Foo(42,21)
[Timeline: caller → call → callee → return → caller]

Caller push code:
    push %ecx            Push caller-save registers
    push $21             Push actual parameters (in reverse order)
    push $42
call:
    call _foo            push return address; Jump to call address
Callee push code (prologue):
    push %ebp            Push current base-pointer
    mov  %esp, %ebp      bp = sp
    sub  $8, %esp        Push local variables
    push %ebx            Push callee-save registers
Callee pop code (epilogue):
    pop  %ebx            Pop callee-save registers
    mov  %ebp, %esp      Pop callee activation record
    pop  %ebp            Pop old base-pointer
return:
    ret                  pop return address; Jump to address
Caller pop code:
    add  $8, %esp        Pop parameters
    pop  %ecx            Pop caller-save registers
66
"To Callee-save or to Caller-save?"
• Callee-saved registers need only be saved when callee modifies their value
• Some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registers
• Callee-Save Registers
  – Saved by the callee before modification
  – Values are automatically preserved across calls
• Caller-Save Registers
  – Saved (if needed) by the caller before calls
  – Values are not automatically preserved across calls
• Usually the architecture defines caller-save and callee-save registers
  • Separate compilation
  • Interoperability between code produced by different compilers/languages
• But compiler writers decide when to use caller/callee registers
68
Callee-Save Registers
• Saved by the callee before modification
• Usually at procedure prolog
• Restored at procedure epilog
• Hardware support may be available
• Values are automatically preserved across calls

int foo(int a) {
    int b = a + 1;
    f1();
    g1(b);
    return (b + 2);
}

global _foo
    Add_Constant -K, SP         // allocate space for foo
    Store_Local  R5, -14(FP)    // save R5
    Load_Reg     R5, R0
    Add_Constant R5, 1
    JSR f1
    JSR g1
    Add_Constant R5, 2
    Load_Reg     R5, R0
    Load_Local   -14(FP), R5    // restore R5
    Add_Constant K, SP          // deallocate
    RTS
69
Caller-Save Registers
• Saved by the caller before calls when needed
• Values are not automatically preserved across calls

void bar (int y) {
    int x = y + 1;
    f2(x);
    g2(2);
    g2(8);
}

global _bar
    Add_Constant -K, SP     // allocate space for bar
    Add_Constant R0, 1
    JSR f2
    Load_Constant 2, R0
    JSR g2
    Load_Constant 8, R0
    JSR g2
    Add_Constant K, SP      // deallocate space for bar
    RTS
70
Parameter Passing
• 1960s
  – In memory
    • No recursion is allowed
• 1970s
  – In stack
• 1980s
  – In registers
  – First k parameters are passed in registers (k=4 or k=6)
  – Where is time saved?
• Most procedures are leaf procedures
• Interprocedural register allocation
• Many of the registers may be dead before another invocation
• Register windows are allocated in some architectures per call (e.g., Sun SPARC)
71
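Windows Exploit(s): Buffer Overflow

void foo (char *x) {
    char buf[2];
    strcpy(buf, x);
}
int main (int argc, char *argv[]) {
    foo(argv[1]);
}

./a.out abracadabra
Segmentation fault

[Diagram: the stack, with arrows for the direction of stack growth and of memory addresses. The entries shown are Previous frame, Return address, Saved FP, char* x, buf[2], …; the characters of "abracadabra" copied into buf[2] spill over the neighbouring entries.]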
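One common way to avoid this particular overflow is to check the length before copying. The sketch below assumes a hypothetical helper `copy_checked` (not part of the slides or of the C library) and a variant `foo_checked` of the function above that reports failure instead of writing past `buf`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy src into dst only if it fits, including the
   terminating '\0'. Returns 0 on success and -1 if src is too long. */
int copy_checked(char *dst, size_t dst_size, const char *src) {
    size_t len = strlen(src);
    if (len >= dst_size) {
        return -1;                /* would overflow: refuse to copy */
    }
    memcpy(dst, src, len + 1);    /* len characters plus the '\0' */
    return 0;
}

/* Bounds-checked variant of foo: buf still has room for only 1 character. */
int foo_checked(const char *x) {
    char buf[2];
    return copy_checked(buf, sizeof buf, x);
}
```

With this check, the input abracadabra is rejected rather than overwriting the saved frame pointer and return address.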
72
Buffer overflow

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int check_authentication(char *password) {
    int auth_flag = 0;
    char password_buffer[16];

    strcpy(password_buffer, password);
    if(strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
    if(strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;
    return auth_flag;
}

int main(int argc, char *argv[]) {
    if(argc < 2) {
        printf("Usage: %s <password>\n", argv[0]);
        exit(0);
    }
    if(check_authentication(argv[1])) {
        printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        printf(" Access Granted.\n");
        printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    } else {
        printf("\nAccess Denied.\n");
    }
}

(source: "Hacking – The Art of Exploitation, 2nd Ed.")
74
Buffer overflow

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int check_authentication(char *password) {
    char password_buffer[16];
    int auth_flag = 0;

    strcpy(password_buffer, password);
    if(strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
    if(strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;
    return auth_flag;
}

int main(int argc, char *argv[]) {
    if(argc < 2) {
        printf("Usage: %s <password>\n", argv[0]);
        exit(0);
    }
    if(check_authentication(argv[1])) {
        printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        printf(" Access Granted.\n");
        printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    } else {
        printf("\nAccess Denied.\n");
    }
}

(source: "Hacking – The Art of Exploitation, 2nd Ed.")
75
Buffer overflow
0x08048529 <+69>:   movl  $0x8048647,(%esp)
0x08048530 <+76>:   call  0x8048394 <puts@plt>
0x08048535 <+81>:   movl  $0x8048664,(%esp)
0x0804853c <+88>:   call  0x8048394 <puts@plt>
0x08048541 <+93>:   movl  $0x804867a,(%esp)
0x08048548 <+100>:  call  0x8048394 <puts@plt>
0x0804854d <+105>:  jmp   0x804855b <main+119>
0x0804854f <+107>:  movl  $0x8048696,(%esp)
0x08048556 <+114>:  call  0x8048394 <puts@plt>
76
Nested Procedures
• For example – Pascal
• Any routine can have sub-routines
• Any sub-routine can access anything that is defined in its containing scope or inside the sub-routine itself
  – "non-local" variables
77
Example: Nested Procedures

program p() {
    int x;
    procedure a() {
        int y;
        procedure b() { … c() … }
        procedure c() {
            int z;
            procedure d() {
                y = x + z
            }
            … b() … d() …
        }
        … a() … c() …
    }
    a()
}

Possible call sequence: p → a → a → c → b → c → d
What are the addresses of variables "x", "y" and "z" in procedure d?
78
Nested Procedures
• can call a sibling, ancestor
• when "c" uses (non-local) variables from "a", which instance of "a" is it?
• how do you find the right activation record at runtime?
[Diagram: the activation records on the stack for the call sequence: P, a, a, c, b, c, d.]
Possible call sequence: p → a → a → c → b → c → d
79
Nested Procedures
• goal: find the closest routine in the stack from a given nesting level
• if we reached the same routine in a sequence of calls
  – if a routine of level k uses variables of the same nesting level, it uses its own variables
  – if it uses variables of nesting level j < k, then it must be the last routine called at level j
• If a procedure is last at level j on the stack, then it must be an ancestor of the current routine
Possible call sequence: p → a → a → c → b → c → d
[Diagram: the activation records on the stack for the call sequence: P, a, a, c, b, c, d.]
80
Nested Procedures
• problem: a routine may need to access variables of another routine that contains it statically
• solution: lexical pointer (a.k.a. access link) in the activation record
• lexical pointer points to the last activation record of the nesting level above it
  – in our example, the lexical pointer of d points to the activation record of c
• lexical pointers are created at runtime
• number of links to be traversed is known at compile time
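The lookup through lexical pointers can be sketched in C for the call sequence p → a → a → c → b → c → d. Everything here is a simplified assumption: each hypothetical `struct frame` carries a single `local` slot standing in for x, y, or z, the frames are wired by hand rather than by real call code, and the initial values (x = 10, z = 5) are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical activation record: one local plus the lexical pointer. */
struct frame {
    struct frame *lexical;   /* access link: record of the enclosing routine */
    int level;               /* nesting level of the routine (p is level 0)  */
    int local;               /* stands in for x, y or z                      */
};

/* Follow (from->level - target_level) links; the count is a compile-time fact. */
struct frame *enclosing(struct frame *from, int target_level) {
    struct frame *f = from;
    for (int hops = from->level - target_level; hops > 0; --hops) {
        assert(f->lexical != NULL);
        f = f->lexical;
    }
    return f;
}

/* Builds the stack for p, a, a, c, b, c, d and executes "y = x + z" inside d. */
void run_example(int *y_first_a, int *y_second_a) {
    struct frame p  = { NULL, 0, 10 };  /* x = 10 */
    struct frame a1 = { &p,   1, 0 };   /* y of the first a  */
    struct frame a2 = { &p,   1, 0 };   /* y of the second a */
    struct frame c1 = { &a2,  2, 3 };   /* z of the first c  */
    struct frame b  = { &a2,  2, 0 };   /* b is declared in a, like c */
    struct frame c2 = { &a2,  2, 5 };   /* z of the second c */
    struct frame d  = { &c2,  3, 0 };
    (void)c1; (void)b;                  /* on the stack, but not on d's chain */

    /* y is at level 1, x at level 0, z at level 2. */
    enclosing(&d, 1)->local = enclosing(&d, 0)->local + enclosing(&d, 2)->local;

    *y_first_a  = a1.local;
    *y_second_a = a2.local;
}
```

The assignment in d updates the y of the second activation of a, the one reached through c's lexical pointer, and leaves the first activation untouched.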
81
Lexical Pointers
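[Diagram: the activation records on the stack for the call sequence, in call order: a (holding y), a (holding y), c (holding z), b, c (holding z), d, with lexical pointers drawn between records.]
Possible call sequence: p → a → a → c → b → c → d

program p() {
    int x;
    procedure a() {
        int y;
        procedure b() { c() }
        procedure c() {
            int z;
            procedure d() {
                y = x + z
            }
            … b() … d() …
        }
        … a() … c() …
    }
    a()
}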
82
What is a Compiler
18
cgen for if-then-else
cgen(if (e) s1 else s2)
    Let _t = cgen(e)
    Let Lfalse be a new label
    Let Lafter be a new label
    Emit( IfZ _t Goto Lfalse; )
    cgen(s1)
    Emit( Goto Lafter; )
    Emit( Lfalse: )
    cgen(s2)
    Emit( Goto Lafter; )
    Emit( Lafter: )
19
Naive cgen for expressions
• Maintain a counter for temporaries in c
• Initially: c = 0
• cgen(e1 op e2) = {
      Let A = cgen(e1)
      c = c + 1
      Let B = cgen(e2)
      c = c + 1
      Emit( _tc = A op B; )
      Return _tc
  }
• Observation: temporaries in cgen(e1) can be reused in cgen(e2)
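A counter-based cgen can be written in C roughly as follows. This is a simplified variant rather than the slide's exact scheme: it allocates one fresh temporary per operator node and uses leaf names directly. The `struct expr` layout, the `tac` output buffer, and the 16-character limit on names are assumptions made for the sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical expression tree: a leaf has op == 0 and a name. */
struct expr {
    char op;                      /* 0 for a leaf, otherwise '+', '*', ... */
    const char *name;             /* leaf: variable or constant text       */
    const struct expr *e1, *e2;   /* operands of an operator node          */
};

static int c = 0;                 /* counter for temporaries, initially 0  */
static char tac[512];             /* emitted three-address code            */

/* Generates code for e and writes the name holding its value into result. */
void cgen(const struct expr *e, char *result, size_t size) {
    if (e->op == 0) {             /* leaf: its own name is the result */
        snprintf(result, size, "%s", e->name);
        return;
    }
    char A[16], B[16];
    cgen(e->e1, A, sizeof A);     /* Let A = cgen(e1) */
    cgen(e->e2, B, sizeof B);     /* Let B = cgen(e2) */
    snprintf(result, size, "_t%d", c++);          /* fresh temporary _tc */
    size_t used = strlen(tac);
    snprintf(tac + used, sizeof tac - used,       /* Emit( _tc = A op B; ) */
             "%s = %s %c %s;\n", result, A, e->op, B);
}
```

Because `c` only grows, no temporary is ever reused, which is exactly the waste the observation on the slide points at.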
20
Improving cgen for expressions
• Observation – naïve translation needlessly generates temporaries for leaf expressions
• Observation – temporaries are used exactly once
  – Once a temporary has been read it can be reused for another sub-expression
• cgen(e1 op e2) = {
      Let _t1 = cgen(e1)
      Let _t2 = cgen(e2)
      Emit( _t = _t1 op _t2; )
      Return _t
  }
• Temporaries of cgen(e1) can be reused in cgen(e2)
21
Sethi-Ullman translation
• Algorithm by Ravi Sethi and Jeffrey D. Ullman to emit optimal TAC
  – Minimizes number of temporaries
• Main data structure in algorithm is a stack of temporaries
  – Stack corresponds to recursive invocations of _t = cgen(e)
  – All the temporaries on the stack are live
    • Live = contain a value that is needed later on
22
Weighted register allocation
• Can save registers by re-ordering subtree computations
• Label each node with its weight
  – Weight = number of registers needed
  – Leaf weight known
  – Internal node weight
    • if w(left) > w(right) then w = w(left)
    • if w(right) > w(left) then w = w(right)
    • if w(right) = w(left) then w = w(left) + 1
• Choose heavier child as first to be translated
• WARNING: have to check that no side-effects exist before attempting to apply this optimization (pre-pass on the tree)
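The labelling rule can be sketched in C as below. The `struct node` layout and the function names `label_weights` and `first_child` are hypothetical; leaf weights are supplied by the caller since the slide only says they are known, and translating the left child first on a tie is an assumption. The side-effect pre-pass is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical binary expression node; a leaf has two NULL children. */
struct node {
    struct node *left, *right;
    int weight;                   /* leaf: given; internal: computed below */
};

/* Labels every internal node bottom-up and returns the weight of n. */
int label_weights(struct node *n) {
    if (n->left == NULL && n->right == NULL) {
        return n->weight;         /* leaf weight is known */
    }
    assert(n->left != NULL && n->right != NULL);
    int wl = label_weights(n->left);
    int wr = label_weights(n->right);
    if (wl > wr)      n->weight = wl;
    else if (wr > wl) n->weight = wr;
    else              n->weight = wl + 1;
    return n->weight;
}

/* The heavier child is translated first (left child on a tie). */
struct node *first_child(const struct node *n) {
    return n->right->weight > n->left->weight ? n->right : n->left;
}
```

For a tree shaped like a + b[5*c] with all leaf weights 0, every internal node gets weight 1 and the array-access subtree is translated before the leaf a.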
23
Weighted reg alloc example
_t0 = cgen( a+b[5*c] )
Phase 1:
  - check absence of side-effects in expression tree
  - assign weight to each AST node

[Tree with weights:
    +  (w=1)
        a  (w=0)
        array access  (w=1)
            base:  b  (w=0)
            index: *  (w=1)
                5  (w=0)
                c  (w=0)]
24
Weighted reg alloc example
_t0 = cgen( a+b[5*c] )
Phase 2: use weights to decide on order of translation

[Tree with weights:
    +  (w=1)
        a  (w=0)
        array access  (w=1)    (heavier sub-tree)
            base:  b  (w=0)
            index: *  (w=1)    (heavier sub-tree)
                5  (w=0)
                c  (w=0)]

Generated code:
    _t0 = c
    _t1 = 5
    _t0 = _t1 * _t0
    _t1 = b
    _t0 = _t1[_t0]
    _t1 = a
    _t0 = _t1 + _t0
25
TAC Generation for Memory Access Instructions
• Copy instruction: a = b
• Load/store instructions: a = *b    *a = b
• Address-of instruction: a = &b
• Array accesses: a = b[i]    a[i] = b
• Field accesses: a = b[f]    a[f] = b
• Memory allocation instruction: a = alloc(size)
  – Sometimes left out (e.g., malloc is a procedure in C)
26
Array operations
x = y[i]
    t1 = &y         ; t1 = address-of y
    t2 = t1 + i     ; t2 = address of y[i]
    x = *t2         ; value stored at y[i]

x[i] = y
    t1 = &x         ; t1 = address-of x
    t2 = t1 + i     ; t2 = address of x[i]
    *t2 = y         ; store through pointer
27
TAC Generation for Control Flow Statements
• Label introduction
      _label_name:
  Indicates a point in the code that can be jumped to
• Unconditional jump: go to instruction following label L
      Goto L;
• Conditional jump: test condition variable t; if 0, jump to label L
      IfZ t Goto L;
• Similarly: test condition variable t; if 1, jump to label L
      IfNZ t Goto L;
28
Control-flow example – conditions
int x;
int y;
int z;

if (x < y)
    z = x;
else
    z = y;
z = z * z;

    _t0 = x < y;
    IfZ _t0 Goto _L0;
    z = x;
    Goto _L1;
_L0:
    z = y;
_L1:
    z = z * z;
29
cgen for if-then-else
cgen(if (e) s1 else s2)
    Let _t = cgen(e)
    Let Lfalse be a new label
    Let Lafter be a new label
    Emit( IfZ _t Goto Lfalse; )
    cgen(s1)
    Emit( Goto Lafter; )
    Emit( Lfalse: )
    cgen(s2)
    Emit( Goto Lafter; )
    Emit( Lafter: )
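The emission order above can be reproduced with a small C routine. To keep the sketch short, the condition temporary and the TAC for both branches are passed in as ready-made strings instead of being generated from an AST; `cgen_if` and `next_label` are hypothetical names, and the redundant second `Goto Lafter` is left out, as in the conditions example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static int next_label = 0;        /* hypothetical source of fresh labels */

/* Emits TAC for "if (cond) s1 else s2" into out.
   cond_temp names the temporary holding the condition; then_tac and
   else_tac are the already generated bodies. Returns snprintf's result. */
int cgen_if(char *out, size_t size, const char *cond_temp,
            const char *then_tac, const char *else_tac) {
    int lfalse = next_label++;    /* Let Lfalse be a new label */
    int lafter = next_label++;    /* Let Lafter be a new label */
    return snprintf(out, size,
                    "IfZ %s Goto _L%d;\n"   /* skip s1 when the test is 0 */
                    "%s"                    /* cgen(s1) */
                    "Goto _L%d;\n"          /* jump over s2 */
                    "_L%d:\n"               /* Lfalse */
                    "%s"                    /* cgen(s2) */
                    "_L%d:\n",              /* Lafter */
                    cond_temp, lfalse, then_tac, lafter,
                    lfalse, else_tac, lafter);
}
```

Called with `_t0`, `z = x;` and `z = y;`, it produces the label structure shown in the conditions example.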
30
Control-flow example – loops
int x;
int y;

while (x < y) {
    x = x * 2;
}

y = x;

_L0:
    _t0 = x < y;
    IfZ _t0 Goto _L1;
    x = x * 2;
    Goto _L0;
_L1:
    y = x;
31
cgen for while loops
cgen(while (expr) stmt)
    Let Lbefore be a new label
    Let Lafter be a new label
    Emit( Lbefore: )
    Let t = cgen(expr)
    Emit( IfZ t Goto Lafter; )
    cgen(stmt)
    Emit( Goto Lbefore; )
    Emit( Lafter: )
32
Optimization points
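[Diagram: source code → Front end → IR → Code generator → target code, annotated with who can optimize at each point:
    User: profile program, change algorithm
    Compiler: intraprocedural IR, interprocedural IR, IR optimizations
    Compiler: register allocation, instruction selection, peephole transformations
 A marker labelled "today" highlights the part covered in this lecture.]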
33
Interprocedural IR
• Compile-time generation of code for procedure invocations
• Activation Records (a.k.a. Stack Frames)
34
Supporting Procedures
• New computing environment
  – at least temporary memory for local variables
• Passing information into the new environment
  – Parameters
• Transfer of control to/from procedure
• Handling return values
35
Abstract Register Machine(High Level View)
Code
Data
Highaddresses
Lowaddresses
CPU
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
36
Heap
Abstract Register Machine(High Level View)
Global Variables
Stack
Lowaddresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
Code
Highaddresses
37
Abstract Activation Record Stack
Stack frame for procedure
Prock+1(a1hellipaN)
Prock
Prock+2
hellip
hellip
Prock+1
38
Abstract Stack Frame
Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Prock
Prock+2
hellip
hellip
Stack frame for procedure
Prock+1(a1hellipaN)
39
Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to
stack and jumps to the function labelA statement x=f(a1hellipan) looks like
Push a1 hellip Push anCall fPop x copy returned value
bull Returning a value is done by pushing it to the stack (return x)
Push xbull Return control to caller (and roll up stack)
Return
40
Heap
Abstract Register Machine
Global Variables
Stack
Lowaddresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
Code
Highaddresses
41
Semi-Abstract Register Machine
Global Variables
Stack
Heap
High addresses
Low addresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
ebp
esphellipre
gist
ers
Stac
k
42
Intro Functions Exampleint SimpleFn(int z)
int x yx = x y zreturn x
void main() int ww = SimpleFunction(137)
_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn
main_t0 = 137Push _t0Call _SimpleFnPop w
43
What Can We Do with Procedures
bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as
parameters
44
Design Decisions
bull Scoping rulesndash Static scoping vs dynamic scoping
bull Callercallee conventionsndash Parametersndash Who saves register values
bull Allocating space for local variables
45
Static (lexical) Scopingmain ( )
int a = 0 int b = 0
int b = 1
int a = 2 printf (ldquod dnrdquo a b)
int b = 3 printf (ldquod dnrdquo a b)
printf (ldquod dnrdquo a b)
printf (ldquod dnrdquo a b)
B0B1
B3B3
B2
Declaration Scopes
a=0 B0B1B3
b=0 B0
b=1 B1B2
a=2 B2
b=3 B3
a name refers to its (closest)
enclosing scope
known at compile time
46
Dynamic Scoping
bull Each identifier is associated with a global stack of bindings
bull When entering scope where identifier is declaredndash push declaration on identifier stack
bull When exiting scope where identifier is declaredndash pop identifier stack
bull Evaluating the identifier in any context binds to the current top of stack
bull Determined at runtime
47
Example
bull What value is returned from mainbull Static scopingbull Dynamic scoping
int x = 42
int f() return x int g() int x = 1 return f() int main() return g()
48
Why do we care
bull We need to generate code to access variables
bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code
generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables
49
Variable addresses for static scoping first attempt
int x = 42
int f() return x int g() int x = 1 return f() int main() return g()
identifier address
x (global) 0x42
x (inside g) 0x73
50
Variable addresses for static scoping first attempt
int a [11]
void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)
main() quicksort (1 9)
what is the address of the variable ldquoirdquo in
the procedure quicksort
51
Compile-Time Information on Variablesbull Namebull Typebull Scope
ndash when is it recognizedbull Duration
ndash Until when does its value existbull Size
ndash How many bytes are required at runtime bull Address
ndash Fixedndash Relativendash Dynamic
52
Activation Record (Stack Frames)bull separate space for each procedure invocation
bull managed at runtimendash code for managing it generated by the compiler
bull desired properties ndash efficient allocation and deallocation
bull procedures are called frequentlyndash variable size
bull different procedures may require different memory sizes
53
Semi-Abstract Register Machine
Global Variables
Stack
Heap
High addresses
Low addresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
ebp
esphellipre
gist
ers
Stac
k
54
A Logical Stack Frame (Simplified)Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Stack frame for function f(a1hellipaN)
55
Runtime Stack
bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of
stackbull How do we handle recursion
56
Activation Record (frame)parameter k
parameter 1
lexical pointer
return information
dynamic link
registers amp misc
local variablestemporaries
next frame would be here
hellip
administrativepart
high addresses
lowaddresses
framepointer
stackpointer
incoming parameters
stack grows down
57
Runtime Stackbull SP ndash stack pointer
ndash top of current framebull FP ndash frame pointer
ndash base of current framendash Sometimes called BP
(base pointer)
Current frame
hellip hellip
Previous frame
SP
FPstack grows down
58
Code Blocks
bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1
adminstrativex1
y1
x2
x3
hellip
59
L-Values of Local Variables
bull The offset in the stack is known at compile time
bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3
Store R3 offset(x)(FP)
60
Pentium Runtime Stack
Pentium stack registers
Pentium stack and callret instructions
Register Usage
ESP Stack pointer
EBP Base pointer
Instruction Usage
push pushahellip push on runtime stack
poppopahellip Base pointer
call transfer control to called routine
return transfer control back to caller
61
Accessing Stack Variables
bull Use offset from FP (ebp)bull Remember ndash
stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples
ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local
hellip hellip
SP
FPReturn address
Return address
Param nhellip
param1
Local 1hellip
Local n
Previous fp
Param nhellip
param1FP+8
FP-4
62
Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp
ESP
EBPReturn address
Return address
old ebx
Previous fp
nEBP+8
EBP-4 old ebp
old ebx
(stack in intermediate point)
(disclaimer real compiler can do better than that)
63
Call Sequences
bull The processor does not save the content of registers on procedure calls
bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some
registers
64
Call Sequences
call
caller
callee
return
caller
Caller push code
Callee push code
(prologue)
Callee pop code
(epilogue)
Caller pop code
Push caller-save registersPush actual parameters (in reverse order)
push return addressJump to call address
Push current base-pointerbp = sp
Push local variablesPush callee-save registers
Pop callee-save registersPop callee activation record
Pop old base-pointer
pop return addressJump to address
Pop parametersPop caller-save registers
65
Call Sequences ndash Foo(4221)
call
caller
callee
return
caller
push ecx
push $21
push $42
call _foo
push ebp
mov esp ebp
sub 8 esp
push ebx
pop ebx
mov ebp esp
pop ebp
ret
add $8 esp
pop ecx
Push caller-save registersPush actual parameters (in reverse order)
push return addressJump to call address
Push current base-pointerbp = sp
Push local variablesPush callee-save registers
Pop callee-save registersPop callee activation record
Pop old base-pointer
pop return addressJump to address
Pop caller-save registersPop parameters
66
ldquoTo Callee-save or to Caller-saverdquo
bull Callee-saved registers need only be saved when callee modifies their value
bull some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registersbull Callee-Save Registers
ndash Saved by the callee before modificationndash Values are automatically preserved across calls
bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls
bull Usually the architecture defines caller-save and callee-save registers
bull Separate compilationbull Interoperability between code produced by different
compilerslanguages bull But compiler writers decide when to use calllercallee
registers
68
Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls
global _foo
Add_Constant -K SP allocate space for foo
Store_Local R5 -14(FP) save R5
Load_Reg R5 R0 Add_Constant R5 1
JSR f1 JSR g1
Add_Constant R5 2 Load_Reg R5 R0
Load_Local -14(FP) R5 restore R5
Add_Constant K SP RTS deallocate
int foo(int a) int b=a+1 f1()g1(b)
return(b+2)
69
Caller-Save Registersbull Saved by the caller before calls when
neededbull Values are not automatically preserved
across calls
global _bar
Add_Constant -K SP allocate space for bar
Add_Constant R0 1
JSR f2
Load_Constant 2 R0 JSR g2
Load_Constant 8 R0 JSR g2
Add_Constant K SP deallocate space for bar
RTS
void bar (int y) int x=y+1f2(x)
g2(2) g2(8)
70
Parameter Passingbull 1960s
ndash In memory bull No recursion is allowed
bull 1970sndash In stack
bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved
bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)
71
Windows Exploit(s) Buffer Overflow
void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])
aout abracadabraSegmentation fault
Stack grows this way
Memory addresses
Previous frameReturn
addressSaved FPchar xbuf[2]
hellip
abracadabr
72
Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]
strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)
auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)
auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)
(source ldquohacking ndash the art of exploitation 2nd Edrdquo)
73
Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]
strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)
auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)
auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)
(source ldquohacking ndash the art of exploitation 2nd Edrdquo)
74
Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)
auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)
auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)
(source ldquohacking ndash the art of exploitation 2nd Edrdquo)
75
Buffer overflow
19
Naive cgen for expressions
• Maintain a counter c for temporaries
• Initially c = 0
• cgen(e1 op e2) =
    Let A = cgen(e1)
    c = c + 1
    Let B = cgen(e2)
    c = c + 1
    Emit( _tc = A op B )
    Return _tc
• Observation: temporaries in cgen(e1) can be reused in cgen(e2)
20
Improving cgen for expressions
• Observation: naïve translation needlessly generates temporaries for leaf expressions
• Observation: temporaries are used exactly once
  – Once a temporary has been read, it can be reused for another sub-expression
• cgen(e1 op e2) =
    Let _t1 = cgen(e1)
    Let _t2 = cgen(e2)
    Emit( _t = _t1 op _t2 )
    Return _t
• Temporaries of cgen(e1) can be reused in cgen(e2)
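The two observations above can be sketched in a few lines of Python. This is only an illustration, not the course's implementation: the tuple encoding of expressions, the class name ExprCodegen and the free list are assumptions made for the sketch. Leaves return their own name, and a temporary goes back to the free list as soon as it has been read.

```python
class ExprCodegen:
    """Emits three-address code (TAC) for binary expression trees.

    An expression is a string (variable or constant) or a tuple
    (op, left, right). This encoding is an assumption of the sketch.
    """

    def __init__(self):
        self.code = []   # emitted TAC instructions
        self.free = []   # temporaries whose value has already been read
        self.count = 0   # number of distinct temporaries created

    def new_temp(self):
        if self.free:
            return self.free.pop()
        name = f"_t{self.count}"
        self.count += 1
        return name

    def cgen(self, e):
        if isinstance(e, str):
            return e     # leaf: no temporary is needed
        op, left, right = e
        a = self.cgen(left)
        b = self.cgen(right)
        # Each temporary is read exactly once, so a and b can be recycled.
        for name in (a, b):
            if name.startswith("_t"):
                self.free.append(name)
        t = self.new_temp()
        self.code.append(f"{t} = {a} {op} {b}")
        return t


gen = ExprCodegen()
result = gen.cgen(("*", ("+", "a", "b"), ("-", "c", "d")))
print("\n".join(gen.code))
```

For (a + b) * (c - d) the sketch needs two temporaries, where the naive scheme would create one per tree node.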
21
Sethi-Ullman translation
• Algorithm by Ravi Sethi and Jeffrey D. Ullman to emit optimal TAC
  – Minimizes the number of temporaries
• The main data structure in the algorithm is a stack of temporaries
  – The stack corresponds to recursive invocations of _t = cgen(e)
  – All the temporaries on the stack are live
    • Live = contains a value that is needed later on
22
Weighted register allocation
• Can save registers by re-ordering subtree computations
• Label each node with its weight
  – Weight = number of registers needed
  – Leaf weight is known
  – Internal node weight:
    • if w(left) > w(right) then w = w(left)
    • if w(right) > w(left) then w = w(right)
    • if w(right) = w(left) then w = w(left) + 1
• Choose the heavier child as the first to be translated
• WARNING: have to check that no side effects exist before attempting to apply this optimization (pre-pass on the tree)
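The labelling rule can be sketched as a short recursive function. The tuple encoding of the tree and the names weight and first_child are assumptions of this sketch. The leaf weight of 0 follows the worked example on the next slides, and translating the left child first on a tie is an arbitrary choice, since the slide does not say.

```python
def weight(node):
    """Number of registers needed for node, per the labelling rule."""
    if not isinstance(node, tuple):
        return 0                      # leaf (weight taken from the example)
    _, left, right = node
    wl, wr = weight(left), weight(right)
    if wl == wr:
        return wl + 1                 # equal weights: one extra register
    return max(wl, wr)                # otherwise: weight of the heavier child


def first_child(node):
    """Which child to translate first: the heavier one (left on a tie)."""
    _, left, right = node
    return "right" if weight(right) > weight(left) else "left"


# a + b[5*c], with "[]" standing for the array access node
tree = ("+", "a", ("[]", "b", ("*", "5", "c")))
print(weight(tree), first_child(tree))
```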
23
Weighted reg alloc example
_t0 = cgen( a+b[5*c] )
Phase 1: check absence of side effects in the expression tree, then assign a weight to each AST node
[Figure: expression tree with weights]
    + (w=1)
      a (w=0)
      array access (w=1)
        base: b (w=0)
        index: * (w=1)
          5 (w=0)
          c (w=0)
24
Weighted reg alloc example
_t0 = cgen( a+b[5*c] )
Phase 2: use weights to decide on the order of translation; the heavier sub-tree is translated first
[Figure: the weighted expression tree, each node annotated with the temporary (_t0 or _t1) that holds its value]
Emitted code:
    _t0 = c
    _t1 = 5
    _t0 = _t1 * _t0
    _t1 = b
    _t0 = _t1[_t0]
    _t1 = a
    _t0 = _t1 + _t0
25
TAC Generation for Memory Access Instructions
• Copy instruction: a = b
• Load/store instructions: a = *b, *a = b
• Address-of instruction: a = &b
• Array accesses: a = b[i], a[i] = b
• Field accesses: a = b[f], a[f] = b
• Memory allocation instruction: a = alloc(size)
  – Sometimes left out (e.g., malloc is a procedure in C)
26
Array operations
x = y[i]
    t1 = &y        // t1 = address of y
    t2 = t1 + i    // t2 = address of y[i]
    x = *t2        // value stored at y[i]
x[i] = y
    t1 = &x        // t1 = address of x
    t2 = t1 + i    // t2 = address of x[i]
    *t2 = y        // store through pointer
27
TAC Generation for Control Flow Statements
• Label introduction
    _label_name:
  Indicates a point in the code that can be jumped to
• Unconditional jump: go to the instruction following label L
    Goto L
• Conditional jump: test condition variable t; if 0, jump to label L
    IfZ t Goto L
• Similarly: test condition variable t; if 1 (non-zero), jump to label L
    IfNZ t Goto L
28
Control-flow example – conditions
Source:
    int x;
    int y;
    int z;
    if (x < y)
        z = x;
    else
        z = y;
    z = z * z;
TAC:
    _t0 = x < y
    IfZ _t0 Goto _L0
    z = x
    Goto _L1
    _L0:
    z = y
    _L1:
    z = z * z
29
cgen for if-then-else
cgen(if (e) s1 else s2) =
    Let _t = cgen(e)
    Let Lfalse be a new label
    Let Lafter be a new label
    Emit( IfZ _t Goto Lfalse )
    cgen(s1)
    Emit( Goto Lafter )
    Emit( Lfalse: )
    cgen(s2)
    Emit( Goto Lafter )
    Emit( Lafter: )
30
Control-flow example – loops
Source:
    int x;
    int y;
    while (x < y)
        x = x * 2;
    y = x;
TAC:
    _L0:
    _t0 = x < y
    IfZ _t0 Goto _L1
    x = x * 2
    Goto _L0
    _L1:
    y = x
31
cgen for while loops
cgen(while (expr) stmt) =
    Let Lbefore be a new label
    Let Lafter be a new label
    Emit( Lbefore: )
    Let t = cgen(expr)
    Emit( IfZ t Goto Lafter )
    cgen(stmt)
    Emit( Goto Lbefore )
    Emit( Lafter: )
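The two cgen schemes for control flow can be sketched together in Python. The statement encoding (tuples tagged "assign", "if" and "while"), the class name StmtCodegen, and the simplification that a condition is a ready-made string such as "x < y" are all assumptions of this sketch. It also omits the second Goto Lafter after s2, since control falls through to the label anyway.

```python
class StmtCodegen:
    """Emits TAC for assignments, if-then-else and while statements."""

    def __init__(self):
        self.code = []
        self.temps = 0
        self.labels = 0

    def new_label(self):
        name = f"_L{self.labels}"
        self.labels += 1
        return name

    def cgen_expr(self, expr):
        # Assumption: expr is already a simple TAC right-hand side.
        t = f"_t{self.temps}"
        self.temps += 1
        self.code.append(f"{t} = {expr}")
        return t

    def cgen(self, stmt):
        kind = stmt[0]
        if kind == "assign":
            _, target, expr = stmt
            self.code.append(f"{target} = {expr}")
        elif kind == "if":
            _, cond, s1, s2 = stmt
            t = self.cgen_expr(cond)
            l_false, l_after = self.new_label(), self.new_label()
            self.code.append(f"IfZ {t} Goto {l_false}")
            self.cgen(s1)
            self.code.append(f"Goto {l_after}")
            self.code.append(f"{l_false}:")
            self.cgen(s2)          # falls through to l_after
            self.code.append(f"{l_after}:")
        elif kind == "while":
            _, cond, body = stmt
            l_before, l_after = self.new_label(), self.new_label()
            self.code.append(f"{l_before}:")
            t = self.cgen_expr(cond)
            self.code.append(f"IfZ {t} Goto {l_after}")
            self.cgen(body)
            self.code.append(f"Goto {l_before}")
            self.code.append(f"{l_after}:")
        else:
            raise ValueError(f"unknown statement kind: {kind}")


loop = StmtCodegen()
loop.cgen(("while", "x < y", ("assign", "x", "x * 2")))
print("\n".join(loop.code))
```

On the loop example the output matches the TAC listed on the loops slide, up to the trailing y = x statement.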
32
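Optimization points
[Figure: pipeline source code → Frontend → IR → Code generator → target code, annotated with who can optimize at each point; one stage is marked "today"]
• User: profile program, change algorithm
• Compiler (IR): intraprocedural IR, interprocedural IR, IR optimizations
• Compiler (code generation): register allocation, instruction selection, peephole transformations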
33
Interprocedural IR
• Compile-time generation of code for procedure invocations
• Activation Records (aka Stack Frames)
34
Supporting Procedures
• New computing environment
  – at least temporary memory for local variables
• Passing information into the new environment
  – Parameters
• Transfer of control to/from procedure
• Handling return values
35
Abstract Register Machine (High Level View)
[Figure: a CPU with general-purpose (data) registers Register 00, Register 01, …, Register xx and control registers (…, Register PC), next to a memory holding Code and Data between low and high addresses]
36
Abstract Register Machine (High Level View)
[Figure: a CPU with general-purpose (data) registers and control registers (including PC), next to Main Memory, which holds Code, Global Variables, Heap and Stack between low and high addresses]
37
Abstract Activation Record Stack
[Figure: stack of frames …, Proc_k, Proc_k+1, Proc_k+2, …; the highlighted entry is the stack frame for procedure Proc_k+1(a1,…,aN)]
38
Abstract Stack Frame
[Figure: the stack frame for procedure Proc_k+1(a1,…,aN), located between the frames of Proc_k and Proc_k+2]
• Parameters (actual arguments): Param N, Param N-1, …, Param 1
• Locals and temporaries: _t0, …, _tk, x, …, y
39
Handling Procedures
• Store local variables/temporaries in a stack
• A function call instruction pushes arguments to the stack and jumps to the function label
  A statement x=f(a1,…,an) looks like
    Push a1
    …
    Push an
    Call f
    Pop x      // copy returned value
• Returning a value is done by pushing it to the stack (return x)
    Push x
• Return control to caller (and roll up stack)
    Return
40
Abstract Register Machine
[Figure: a CPU with general-purpose (data) registers and control registers (including PC), next to Main Memory, which holds Code, Global Variables, Heap and Stack between low and high addresses]
41
Semi-Abstract Register Machine
[Figure: a CPU with general-purpose (data) registers, control registers (including PC) and stack registers (ebp, esp, …), next to Main Memory, which holds Global Variables, Stack and Heap between low and high addresses]
42
Intro: Functions Example
Source:
    int SimpleFn(int z) {
        int x, y;
        x = x * y * z;
        return x;
    }
    void main() {
        int w;
        w = SimpleFn(137);
    }
TAC:
    _SimpleFn:
        _t0 = x * y
        _t1 = _t0 * z
        x = _t1
        Push x
        Return
    main:
        _t0 = 137
        Push _t0
        Call _SimpleFn
        Pop w
43
What Can We Do with Procedures
• Declarations & Definitions
• Call & Return
• Jumping out of procedures
• Passing & Returning procedures as parameters
44
Design Decisions
• Scoping rules
  – Static scoping vs. dynamic scoping
• Caller/callee conventions
  – Parameters
  – Who saves register values?
• Allocating space for local variables
45
Static (lexical) Scoping
    main ( ) {                                    // block B0
        int a = 0; int b = 0;
        {                                         // block B1
            int b = 1;
            {                                     // block B2
                int a = 2; printf ("%d %d\n", a, b);
            }
            {                                     // block B3
                int b = 3; printf ("%d %d\n", a, b);
            }
            printf ("%d %d\n", a, b);
        }
        printf ("%d %d\n", a, b);
    }

    Declaration   Scopes
    a=0           B0, B1, B3
    b=0           B0
    b=1           B1, B2
    a=2           B2
    b=3           B3

A name refers to its (closest) enclosing scope, which is known at compile time.
46
Dynamic Scoping
• Each identifier is associated with a global stack of bindings
• When entering a scope where the identifier is declared
  – push the declaration on the identifier stack
• When exiting a scope where the identifier is declared
  – pop the identifier stack
• Evaluating the identifier in any context binds to the current top of stack
• Determined at runtime
47
Example
• What value is returned from main?
  – Static scoping?
  – Dynamic scoping?
    int x = 42;
    int f() { return x; }
    int g() { int x = 1; return f(); }
    int main() { return g(); }
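The two answers can be checked with a small Python sketch. It is an illustration only: Python itself is statically scoped, so dynamic scoping is simulated with the per-identifier binding stack described on the Dynamic Scoping slide. The names run_static, run_dynamic, bindings and lookup are hypothetical.

```python
def run_static():
    """Static scoping: f's x is the one visible where f is defined."""
    x = 42

    def f():
        return x          # refers to the enclosing x = 42

    def g():
        x = 1             # a different x, local to g; f never sees it
        return f()

    return g()


# Dynamic scoping, simulated: one stack of bindings per identifier.
bindings = {"x": [42]}


def lookup(name):
    return bindings[name][-1]        # current top of the identifier stack


def f_dyn():
    return lookup("x")


def g_dyn():
    bindings["x"].append(1)          # entering g: push its declaration of x
    try:
        return f_dyn()
    finally:
        bindings["x"].pop()          # exiting g: pop it again


def run_dynamic():
    return g_dyn()


print(run_static(), run_dynamic())
```

Under static scoping main returns 42; under dynamic scoping it returns 1, because g's binding of x is on top of the stack when f runs.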
48
Why do we care
• We need to generate code to access variables
• Static scoping
  – Identifier binding is known at compile time
  – Address of the variable is known at compile time
  – Assigning addresses to variables is part of code generation
  – No runtime errors of "access to undefined variable"
  – Can check types of variables
49
Variable addresses for static scoping: first attempt
    int x = 42;
    int f() { return x; }
    int g() { int x = 1; return f(); }
    int main() { return g(); }

    identifier      address
    x (global)      0x42
    x (inside g)    0x73
50
Variable addresses for static scoping: first attempt
    int a[11];
    void quicksort(int m, int n) {
        int i;
        if (n > m) {
            i = partition(m, n);
            quicksort(m, i-1);
            quicksort(i+1, n);
        }
    }
    main() {
        quicksort(1, 9);
    }
What is the address of the variable "i" in the procedure quicksort?
51
Compile-Time Information on Variables
• Name
• Type
• Scope
  – when is it recognized?
• Duration
  – Until when does its value exist?
• Size
  – How many bytes are required at runtime?
• Address
  – Fixed
  – Relative
  – Dynamic
52
Activation Record (Stack Frames)
• separate space for each procedure invocation
• managed at runtime
  – code for managing it is generated by the compiler
• desired properties
  – efficient allocation and deallocation
    • procedures are called frequently
  – variable size
    • different procedures may require different memory sizes
53
Semi-Abstract Register Machine
[Figure: a CPU with general-purpose (data) registers, control registers (including PC) and stack registers (ebp, esp, …), next to Main Memory, which holds Global Variables, Stack and Heap between low and high addresses]
54
A Logical Stack Frame (Simplified)
[Figure: stack frame for function f(a1,…,aN)]
• Parameters (actual arguments): Param N, Param N-1, …, Param 1
• Locals and temporaries: _t0, …, _tk, x, …, y
55
Runtime Stack
• Stack of activation records
• Call = push new activation record
• Return = pop activation record
• Only one "active" activation record: the top of the stack
• How do we handle recursion?
56
Activation Record (frame)
[Figure: frame layout, listed from high addresses to low addresses; the stack grows down, and the frame pointer and stack pointer are marked on the frame]
    Incoming parameters:
        parameter k
        …
        parameter 1
    Administrative part:
        lexical pointer
        return information
        dynamic link
        registers & misc
    local variables
    temporaries
    (next frame would be here)
57
Runtime Stack
• SP – stack pointer
  – top of current frame
• FP – frame pointer
  – base of current frame
  – Sometimes called BP (base pointer)
[Figure: the previous frame followed by the current frame; FP marks the base of the current frame and SP its top; the stack grows down]
58
Code Blocks
• Programming languages provide code blocks
    void foo() {
        int x = 8, y = 9;      // 1
        { int x = y * y; }     // 2
        { int x = y * 7; }     // 3
        x = y + 1;
    }
[Figure: frame layout: administrative part, x1, y1, x2, x3, …]
59
L-Values of Local Variables
• The offset in the stack is known at compile time
• L-val(x) = FP + offset(x)
• x = 5 is translated to
    Load_Constant 5, R3
    Store R3, offset(x)(FP)
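A minimal sketch of how a compiler might fix these offsets, assuming every local occupies one 4-byte word and locals are laid out below FP in declaration order. The word size, the layout order and the helper names assign_offsets and store_constant are assumptions; the instruction names are the ones used on the slide.

```python
WORD = 4  # assumed size of every local variable, in bytes


def assign_offsets(local_names):
    """Map each local to its offset from FP (negative: the stack grows down)."""
    return {name: -WORD * (i + 1) for i, name in enumerate(local_names)}


def store_constant(name, value, offsets):
    """Code for `name = value`, using L-val(name) = FP + offset(name)."""
    return [
        f"Load_Constant {value}, R3",
        f"Store R3, {offsets[name]}(FP)",
    ]


offsets = assign_offsets(["x", "y"])
print(offsets)
print("\n".join(store_constant("x", 5, offsets)))
```

Because the offsets depend only on the declarations, the same code works for every activation of the procedure, including recursive ones.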
60
Pentium Runtime Stack
Pentium stack registers
    Register    Usage
    ESP         Stack pointer
    EBP         Base pointer

Pentium stack and call/ret instructions
    Instruction        Usage
    push, pusha, …     push on runtime stack
    pop, popa, …       pop from runtime stack
    call               transfer control to called routine
    ret                transfer control back to caller
61
Accessing Stack Variables
• Use offset from FP (%ebp)
• Remember: the stack grows downwards
• Above FP = parameters
• Below FP = locals
• Examples
  – %ebp + 4 = return address
  – %ebp + 8 = first parameter
  – %ebp – 4 = first local
[Figure: stack layout from high to low addresses: Param n, …, param 1 (FP+8), return address, previous fp (where FP points), Local 1 (FP-4), …, Local n (where SP points)]
62
Factorial – fact(int n)
    fact:
        pushl %ebp             # save ebp
        movl %esp,%ebp         # ebp = esp
        pushl %ebx             # save ebx
        movl 8(%ebp),%ebx      # ebx = n
        cmpl $1,%ebx           # n <= 1 ?
        jle .lresult           # then done
        leal -1(%ebx),%eax     # eax = n-1
        pushl %eax
        call fact              # fact(n-1)
        imull %ebx,%eax        # eax = retv*n
        jmp .lreturn
    .lresult:
        movl $1,%eax           # retv
    .lreturn:
        movl -4(%ebp),%ebx     # restore ebx
        movl %ebp,%esp         # restore esp
        popl %ebp              # restore ebp
        ret                    # return to caller (not shown on the slide)
[Figure: stack at an intermediate point: n at EBP+8, return address, old ebp (where EBP points), old ebx at EBP-4 (where ESP points)]
(disclaimer: a real compiler can do better than that)
63
Call Sequences
• The processor does not save the content of registers on procedure calls
• So who will?
  – Caller saves and restores registers
  – Callee saves and restores registers
  – But can also have both save/restore some registers
64
Call Sequences
Caller push code (before the call):
    Push caller-save registers
    Push actual parameters (in reverse order)
call:
    Push return address
    Jump to call address
Callee push code (prologue):
    Push current base-pointer
    bp = sp
    Push local variables
    Push callee-save registers
Callee pop code (epilogue):
    Pop callee-save registers
    Pop callee activation record
    Pop old base-pointer
return:
    Pop return address
    Jump to address
Caller pop code (after the return):
    Pop parameters
    Pop caller-save registers
65
Call Sequences – Foo(42,21)
Caller push code:
    push %ecx          # push caller-save registers
    push $21           # push actual parameters (in reverse order)
    push $42
call:
    call _foo          # push return address, jump to call address
Callee push code (prologue):
    push %ebp          # push current base-pointer
    mov %esp, %ebp     # bp = sp
    sub $8, %esp       # push local variables
    push %ebx          # push callee-save registers
Callee pop code (epilogue):
    pop %ebx           # pop callee-save registers
    mov %ebp, %esp     # pop callee activation record
    pop %ebp           # pop old base-pointer
return:
    ret                # pop return address, jump to address
Caller pop code:
    add $8, %esp       # pop parameters
    pop %ecx           # pop caller-save registers
66
"To Callee-save or to Caller-save?"
• Callee-saved registers need only be saved when the callee modifies their value
• Some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registers
• Callee-Save Registers
  – Saved by the callee before modification
  – Values are automatically preserved across calls
• Caller-Save Registers
  – Saved (if needed) by the caller before calls
  – Values are not automatically preserved across calls
• Usually the architecture defines caller-save and callee-save registers
  – Separate compilation
  – Interoperability between code produced by different compilers/languages
• But compiler writers decide when to use caller/callee registers
68
Callee-Save Registers
• Saved by the callee before modification
• Usually at procedure prolog
• Restored at procedure epilog
• Hardware support may be available
• Values are automatically preserved across calls
Source:
    int foo(int a) {
        int b = a + 1;
        f1();
        g1(b);
        return (b + 2);
    }
Code:
    .global _foo
    Add_Constant -K, SP        // allocate space for foo
    Store_Local R5, -14(FP)    // save R5
    Load_Reg R5, R0
    Add_Constant R5, 1
    JSR f1
    JSR g1
    Add_Constant R5, 2
    Load_Reg R5, R0
    Load_Local -14(FP), R5     // restore R5
    Add_Constant K, SP         // deallocate
    RTS
69
Caller-Save Registers
• Saved by the caller before calls when needed
• Values are not automatically preserved across calls
Source:
    void bar (int y) {
        int x = y + 1;
        f2(x);
        g2(2);
        g2(8);
    }
Code:
    .global _bar
    Add_Constant -K, SP        // allocate space for bar
    Add_Constant R0, 1
    JSR f2
    Load_Constant 2, R0
    JSR g2
    Load_Constant 8, R0
    JSR g2
    Add_Constant K, SP         // deallocate space for bar
    RTS
70
Parameter Passing
• 1960s
  – In memory
    • No recursion is allowed
• 1970s
  – In stack
• 1980s
  – In registers
  – First k parameters are passed in registers (k=4 or k=6)
  – Where is time saved?
• Most procedures are leaf procedures
• Interprocedural register allocation
• Many of the registers may be dead before another invocation
• Register windows are allocated in some architectures per call (e.g., Sun SPARC)
71
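Windows Exploit(s): Buffer Overflow
    void foo (char *x) {
        char buf[2];
        strcpy(buf, x);
    }
    int main (int argc, char *argv[]) {
        foo(argv[1]);
    }

    > ./a.out abracadabra
    Segmentation fault
[Figure: stack layout, from higher to lower memory addresses: previous frame, return address, saved FP, char* x, buf[2], …; the stack grows toward lower addresses, and the copied string "abracadabr…" overwrites the entries above buf]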
72
Buffer overflow
    int check_authentication(char *password) {
        int auth_flag = 0;
        char password_buffer[16];

        strcpy(password_buffer, password);
        if(strcmp(password_buffer, "brillig") == 0)
            auth_flag = 1;
        if(strcmp(password_buffer, "outgrabe") == 0)
            auth_flag = 1;
        return auth_flag;
    }
    int main(int argc, char *argv[]) {
        if(argc < 2) {
            printf("Usage: %s <password>\n", argv[0]);
            exit(0);
        }
        if(check_authentication(argv[1])) {
            printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
            printf(" Access Granted.\n");
            printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        } else {
            printf("\nAccess Denied.\n");
        }
    }
(source: "Hacking – The Art of Exploitation, 2nd Ed.")
74
Buffer overflow
The same program with the two local declarations swapped (main is unchanged from the previous slide):
    int check_authentication(char *password) {
        char password_buffer[16];
        int auth_flag = 0;

        strcpy(password_buffer, password);
        if(strcmp(password_buffer, "brillig") == 0)
            auth_flag = 1;
        if(strcmp(password_buffer, "outgrabe") == 0)
            auth_flag = 1;
        return auth_flag;
    }
(source: "Hacking – The Art of Exploitation, 2nd Ed.")
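Why the unchecked strcpy matters can be sketched with a simulated frame in Python. The layout is an assumption made for illustration (a 16-byte password_buffer immediately followed by a 4-byte little-endian auth_flag); a real compiler may order or pad locals differently. The names FRAME_SIZE, FLAG_OFFSET and unchecked_copy are hypothetical.

```python
import struct

BUF_SIZE = 16
FLAG_OFFSET = BUF_SIZE      # assumed: auth_flag sits right after the buffer
FRAME_SIZE = 32             # room for the flag plus some administrative data


def unchecked_copy(frame, offset, src):
    """Like strcpy: copies src and a terminating NUL with no bounds check."""
    for i, byte in enumerate(src + b"\x00"):
        frame[offset + i] = byte


def check_authentication(password):
    frame = bytearray(FRAME_SIZE)
    struct.pack_into("<i", frame, FLAG_OFFSET, 0)        # auth_flag = 0
    unchecked_copy(frame, 0, password)                   # may run past the buffer
    stored = bytes(frame[:BUF_SIZE]).split(b"\x00", 1)[0]
    if stored in (b"brillig", b"outgrabe"):
        struct.pack_into("<i", frame, FLAG_OFFSET, 1)    # auth_flag = 1
    return struct.unpack_from("<i", frame, FLAG_OFFSET)[0]


print(check_authentication(b"wrong"))       # password fits, flag untouched
print(check_authentication(b"A" * 17))      # 17th byte lands in auth_flag
```

With 17 or more characters the copy spills into auth_flag and leaves it non-zero, so the caller's if treats a wrong password as accepted.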
75
Buffer overflow
    0x08048529 <+69>:   movl $0x8048647,(%esp)
    0x08048530 <+76>:   call 0x8048394 <puts@plt>
    0x08048535 <+81>:   movl $0x8048664,(%esp)
    0x0804853c <+88>:   call 0x8048394 <puts@plt>
    0x08048541 <+93>:   movl $0x804867a,(%esp)
    0x08048548 <+100>:  call 0x8048394 <puts@plt>
    0x0804854d <+105>:  jmp 0x804855b <main+119>
    0x0804854f <+107>:  movl $0x8048696,(%esp)
    0x08048556 <+114>:  call 0x8048394 <puts@plt>
76
Nested Procedures
• For example: Pascal
• Any routine can have sub-routines
• Any sub-routine can access anything that is defined in its containing scope or inside the sub-routine itself
  – "non-local" variables
77
Example: Nested Procedures
    program p() {
        int x;
        procedure a() {
            int y;
            procedure b() { … c() … }
            procedure c() {
                int z;
                procedure d() {
                    y = x + z
                }
                … b() … d() …
            }
            … a() … c() …
        }
        a()
    }
Possible call sequence: p a a c b c d
What are the addresses of variables "x", "y" and "z" in procedure d?
78
Nested Procedures
• A procedure can call a sibling or an ancestor
• When "c" uses (non-local) variables from "a", which instance of "a" is it?
• How do you find the right activation record at runtime?
Possible call sequence: p a a c b c d
[Figure: runtime stack for this call sequence, with frames p, a, a, c, b, c, d]
79
Nested Procedures
• Goal: find the closest routine in the stack from a given nesting level
• If we reached the same routine in a sequence of calls:
  – if a routine of level k uses variables of the same nesting level, it uses its own variables
  – if it uses variables of nesting level j < k, then they belong to the last routine called at level j
• If a procedure is last at level j on the stack, then it must be an ancestor of the current routine
Possible call sequence: p a a c b c d
[Figure: runtime stack for this call sequence, with frames p, a, a, c, b, c, d]
80
Nested Procedures
• Problem: a routine may need to access variables of another routine that contains it statically
• Solution: a lexical pointer (aka access link) in the activation record
• The lexical pointer points to the last activation record of the nesting level above it
  – in our example, the lexical pointer of d points to the activation record of c
• Lexical pointers are created at runtime
• The number of links to be traversed is known at compile time
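The last two bullets can be sketched in Python for the example program. The nesting levels and declaring procedures are read off the example; the Frame class and the helpers follow, call and frame_of are hypothetical names, and the rule used to set up the link at call time (follow level(caller) - level(callee) + 1 links from the caller's frame) is the standard construction rather than something stated on the slide.

```python
LEVEL = {"p": 0, "a": 1, "b": 2, "c": 2, "d": 3}   # static nesting depth
DECLARES = {"x": "p", "y": "a", "z": "c"}          # variable -> declaring procedure


class Frame:
    def __init__(self, proc, access_link):
        self.proc = proc
        self.access_link = access_link   # frame of the statically enclosing procedure


def follow(frame, steps):
    for _ in range(steps):
        frame = frame.access_link
    return frame


def call(caller, callee):
    """Create the callee's frame; its access link is set at runtime."""
    if caller is None:
        return Frame(callee, None)
    steps = LEVEL[caller.proc] - LEVEL[callee] + 1   # known at compile time
    return Frame(callee, follow(caller, steps))


def frame_of(var, frame):
    """Frame holding var, reached by a compile-time-known number of links."""
    steps = LEVEL[frame.proc] - LEVEL[DECLARES[var]]
    return follow(frame, steps)


stack = []
for proc in ["p", "a", "a", "c", "b", "c", "d"]:   # the example call sequence
    caller = stack[-1] if stack else None
    stack.append(call(caller, proc))

d = stack[-1]
for var in ("x", "y", "z"):
    print(var, "lives in frame", stack.index(frame_of(var, d)))
```

From d, the sketch reaches z in the most recent c (1 link), y in the second a (2 links) and x in p (3 links), so the first activation of a is never consulted.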
81
Lexical Pointers
Possible call sequence: p a a c b c d
[Figure: activation records a, a, c, b, c, d for this call sequence; each frame of a holds y, each frame of c holds z, and lexical pointers link the frames]
    program p() {
        int x;
        procedure a() {
            int y;
            procedure b() { c() }
            procedure c() {
                int z;
                procedure d() {
                    y = x + z
                }
                … b() … d() …
            }
            … a() … c() …
        }
        a()
    }
82
What is a Compiler
20
Improving cgen for expressionsbull Observation ndash naiumlve translation needlessly generates temporaries for
leaf expressionsbull Observation ndash temporaries used exactly once
ndash Once a temporary has been read it can be reused for another sub-expressionbull cgen(e1 op e2) =
Let _t1 = cgen(e1) Let _t2 = cgen(e2) Emit( _t =_t1 op _t2 ) Return t
bull Temporaries cgen(e1) can be reused in cgen(e2)
21
Sethi-Ullman translation
bull Algorithm by Ravi Sethi and Jeffrey D Ullman to emit optimal TACndash Minimizes number of temporaries
bull Main data structure in algorithm is a stack of temporariesndash Stack corresponds to recursive invocations of _t = cgen(e)ndash All the temporaries on the stack are live
bull Live = contain a value that is needed later on
22
Weighted register allocationbull Can save registers by re-ordering subtree computationsbull Label each node with its weight
ndash Weight = number of registers neededndash Leaf weight knownndash Internal node weight
bull w(left) gt w(right) then w = leftbull w(right) gt w(left) then w = rightbull w(right) = w(left) then w = left + 1
bull Choose heavier child as first to be translatedbull WARNING have to check that no side-effects exist before
attempting to apply this optimization(pre-pass on the tree)
23
Weighted reg alloc example
b
5 c
array access
+
a w=0
w=0 w=0
w=1 w=0
w=1
w=1
Phase 1 - check absence of side-effects in expression tree - assign weight to each AST node
_t0 = cgen( a+b[5c] )
base index
24
Weighted reg alloc example
b
5 c
array access
+
abase index
w=0
w=0 w=0
w=1 w=0
w=1
w=1
_t0 = cgen( a+b[5c] )Phase 2 - use weights to decide on order of translation
_t0
_t0
_t0Heavier sub-tree
Heavier sub-tree
_t0 = _t1 _t0
_t0 = _t1[_t0]
_t0 = _t1 + _t0
_t0_t1
_t1
_t1_t0 = c_t1 = 5
_t1 = b
_t1 = a
25
TAC Generation forMemory Access Instructions
bull Copy instruction a = bbull Loadstore instructions
a = b a = bbull Address of instruction a=ampbbull Array accesses
a = b[i] a[i] = bbull Field accesses
a = b[f] a[f] = bbull Memory allocation instruction a = alloc(size)
ndash Sometimes left out (eg malloc is a procedure in C)
26
Array operations
t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]
t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer
x = y[i]
x[i] = y
27
TAC Generation forControl Flow Statements
bull Label introduction_label_name
Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L
Goto Lbull Conditional jump test condition variable t
if 0 jump to label LIfZ t Goto L
bull Similarly test condition variable tif 1 jump to label L
IfNZ t Goto L
28
Control-flow example ndash conditionsint xint yint z
if (x lt y) z = xelse z = yz = z z
_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z
29
cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)
Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )
30
Control-flow example ndash loopsint xint y
while (x lt y) x = x 2
y = x
_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1
y = x
31
cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label
Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )
32
Optimization points
sourcecode
Frontend IR Code
generatortargetcode
Userprofile program
change algorithm
Compilerintraprocedural IRInterprocedural IRIR optimizations
Compilerregister allocation
instruction selectionpeephole transformations
today
33
Interprocedural IR
bull Compile time generation of code for procedure invocations
bull Activation Records (aka Stack Frames)
34
Supporting Procedures
bull New computing environment ndash at least temporary memory for local variables
bull Passing information into the new environmentndash Parameters
bull Transfer of control tofrom procedurebull Handling return values
35
Abstract Register Machine(High Level View)
Code
Data
Highaddresses
Lowaddresses
CPU
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
36
Heap
Abstract Register Machine(High Level View)
Global Variables
Stack
Lowaddresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
Code
Highaddresses
37
Abstract Activation Record Stack
Stack frame for procedure
Prock+1(a1hellipaN)
Prock
Prock+2
hellip
hellip
Prock+1
38
Abstract Stack Frame
Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Prock
Prock+2
hellip
hellip
Stack frame for procedure
Prock+1(a1hellipaN)
39
Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to
stack and jumps to the function labelA statement x=f(a1hellipan) looks like
Push a1 hellip Push anCall fPop x copy returned value
bull Returning a value is done by pushing it to the stack (return x)
Push xbull Return control to caller (and roll up stack)
Return
40
Heap
Abstract Register Machine
Global Variables
Stack
Lowaddresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
Code
Highaddresses
41
Semi-Abstract Register Machine
Global Variables
Stack
Heap
High addresses
Low addresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
ebp
esphellipre
gist
ers
Stac
k
42
Intro Functions Exampleint SimpleFn(int z)
int x yx = x y zreturn x
void main() int ww = SimpleFunction(137)
_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn
main_t0 = 137Push _t0Call _SimpleFnPop w
43
What Can We Do with Procedures
bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as
parameters
44
Design Decisions
bull Scoping rulesndash Static scoping vs dynamic scoping
bull Callercallee conventionsndash Parametersndash Who saves register values
bull Allocating space for local variables
45
Static (lexical) Scopingmain ( )
int a = 0 int b = 0
int b = 1
int a = 2 printf (ldquod dnrdquo a b)
int b = 3 printf (ldquod dnrdquo a b)
printf (ldquod dnrdquo a b)
printf (ldquod dnrdquo a b)
B0B1
B3B3
B2
Declaration Scopes
a=0 B0B1B3
b=0 B0
b=1 B1B2
a=2 B2
b=3 B3
a name refers to its (closest)
enclosing scope
known at compile time
46
Dynamic Scoping
bull Each identifier is associated with a global stack of bindings
bull When entering scope where identifier is declaredndash push declaration on identifier stack
bull When exiting scope where identifier is declaredndash pop identifier stack
bull Evaluating the identifier in any context binds to the current top of stack
bull Determined at runtime
47
Example
bull What value is returned from mainbull Static scopingbull Dynamic scoping
int x = 42
int f() return x int g() int x = 1 return f() int main() return g()
48
Why do we care
bull We need to generate code to access variables
bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code
generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables
49
Variable addresses for static scoping first attempt
int x = 42
int f() return x int g() int x = 1 return f() int main() return g()
identifier address
x (global) 0x42
x (inside g) 0x73
50
Variable addresses for static scoping first attempt
int a [11]
void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)
main() quicksort (1 9)
what is the address of the variable ldquoirdquo in
the procedure quicksort
51
Compile-Time Information on Variablesbull Namebull Typebull Scope
ndash when is it recognizedbull Duration
ndash Until when does its value existbull Size
ndash How many bytes are required at runtime bull Address
ndash Fixedndash Relativendash Dynamic
52
Activation Record (Stack Frames)bull separate space for each procedure invocation
bull managed at runtimendash code for managing it generated by the compiler
bull desired properties ndash efficient allocation and deallocation
bull procedures are called frequentlyndash variable size
bull different procedures may require different memory sizes
53
Semi-Abstract Register Machine
Global Variables
Stack
Heap
High addresses
Low addresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
ebp
esphellipre
gist
ers
Stac
k
54
A Logical Stack Frame (Simplified)Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Stack frame for function f(a1hellipaN)
55
Runtime Stack
bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of
stackbull How do we handle recursion
56
Activation Record (frame)parameter k
parameter 1
lexical pointer
return information
dynamic link
registers amp misc
local variablestemporaries
next frame would be here
hellip
administrativepart
high addresses
lowaddresses
framepointer
stackpointer
incoming parameters
stack grows down
57
Runtime Stackbull SP ndash stack pointer
ndash top of current framebull FP ndash frame pointer
ndash base of current framendash Sometimes called BP
(base pointer)
Current frame
hellip hellip
Previous frame
SP
FPstack grows down
58
Code Blocks
bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1
adminstrativex1
y1
x2
x3
hellip
59
L-Values of Local Variables
bull The offset in the stack is known at compile time
bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3
Store R3 offset(x)(FP)
60
Pentium Runtime Stack
Pentium stack registers
Pentium stack and callret instructions
Register Usage
ESP Stack pointer
EBP Base pointer
Instruction Usage
push pushahellip push on runtime stack
poppopahellip Base pointer
call transfer control to called routine
return transfer control back to caller
61
Accessing Stack Variables
bull Use offset from FP (ebp)bull Remember ndash
stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples
ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local
hellip hellip
SP
FPReturn address
Return address
Param nhellip
param1
Local 1hellip
Local n
Previous fp
Param nhellip
param1FP+8
FP-4
62
Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp
ESP
EBPReturn address
Return address
old ebx
Previous fp
nEBP+8
EBP-4 old ebp
old ebx
(stack in intermediate point)
(disclaimer real compiler can do better than that)
63
Call Sequences
bull The processor does not save the content of registers on procedure calls
bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some
registers
64
Call Sequences
call
caller
callee
return
caller
Caller push code
Callee push code
(prologue)
Callee pop code
(epilogue)
Caller pop code
Push caller-save registersPush actual parameters (in reverse order)
push return addressJump to call address
Push current base-pointerbp = sp
Push local variablesPush callee-save registers
Pop callee-save registersPop callee activation record
Pop old base-pointer
pop return addressJump to address
Pop parametersPop caller-save registers
65
Call Sequences – foo(42, 21)
Caller push code (caller)
    push %ecx           Push caller-save registers
    push $21            Push actual parameters (in reverse order)
    push $42
call
    call _foo           Push return address, jump to call address
Callee push code (prologue, callee)
    push %ebp           Push current base-pointer
    mov  %esp, %ebp     bp = sp
    sub  $8, %esp       Push local variables
    push %ebx           Push callee-save registers
Callee pop code (epilogue, callee)
    pop  %ebx           Pop callee-save registers
    mov  %ebp, %esp     Pop callee activation record
    pop  %ebp           Pop old base-pointer
return
    ret                 Pop return address, jump to address
Caller pop code (caller)
    add  $8, %esp       Pop parameters
    pop  %ecx           Pop caller-save registers
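The call sequence for foo(42, 21) can be replayed on a toy machine to check that the frame offsets and the restored registers come out as described. This is only a sketch under stated assumptions: `Machine`, `call_foo`, the initial register values and the return address `0xBEEF` are hypothetical names and numbers chosen for illustration, and memory is modelled as a dictionary of 4-byte words.

```python
class Machine:
    """Toy 32-bit machine: a downward-growing stack plus a few registers."""

    def __init__(self, stack_top=0x1000):
        self.mem = {}  # address -> 32-bit word
        self.reg = {"esp": stack_top, "ebp": 0, "ebx": 7, "ecx": 9}

    def push(self, value):
        self.reg["esp"] -= 4  # the stack grows towards lower addresses
        self.mem[self.reg["esp"]] = value

    def pop(self):
        value = self.mem[self.reg["esp"]]
        self.reg["esp"] += 4
        return value


def call_foo(m, first, second, return_address=0xBEEF):
    """Replay the call sequence for foo(first, second); report what foo sees."""
    # Caller push code
    m.push(m.reg["ecx"])           # push %ecx  (caller-save register)
    m.push(second)                 # push $21   (parameters in reverse order)
    m.push(first)                  # push $42
    m.push(return_address)         # call _foo  (pushes the return address)
    # Callee prologue
    m.push(m.reg["ebp"])           # push %ebp
    m.reg["ebp"] = m.reg["esp"]    # mov %esp, %ebp
    m.reg["esp"] -= 8              # sub $8, %esp  (room for locals)
    m.push(m.reg["ebx"])           # push %ebx  (callee-save register)
    # Body of foo: inspect the frame, then clobber both saved registers
    seen = {
        "return_address": m.mem[m.reg["ebp"] + 4],
        "first_param": m.mem[m.reg["ebp"] + 8],
        "second_param": m.mem[m.reg["ebp"] + 12],
    }
    m.reg["ebx"] = m.reg["ecx"] = 0
    # Callee epilogue
    m.reg["ebx"] = m.pop()         # pop %ebx
    m.reg["esp"] = m.reg["ebp"]    # mov %ebp, %esp
    m.reg["ebp"] = m.pop()         # pop %ebp
    seen["returned_to"] = m.pop()  # ret
    # Caller pop code
    m.reg["esp"] += 8              # add $8, %esp  (drop the parameters)
    m.reg["ecx"] = m.pop()         # pop %ecx
    return seen


machine = Machine()
seen = call_foo(machine, 42, 21)
print(seen)
print(machine.reg)
```

After the sequence completes, every register holds the value it had before the call, even though the body overwrote ebx and ecx.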
66
“To Callee-save or to Caller-save?”
• Callee-save registers need only be saved when the callee modifies their value
• Some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registers
• Callee-Save Registers
  – Saved by the callee before modification
  – Values are automatically preserved across calls
• Caller-Save Registers
  – Saved (if needed) by the caller before calls
  – Values are not automatically preserved across calls
• Usually the architecture defines caller-save and callee-save registers
  – Separate compilation
  – Interoperability between code produced by different compilers/languages
• But compiler writers decide when to use caller/callee registers
68
Callee-Save Registers
• Saved by the callee before modification
• Usually at procedure prolog
• Restored at procedure epilog
• Hardware support may be available
• Values are automatically preserved across calls

int foo(int a) {
    int b = a + 1;
    f1();
    g1(b);
    return (b + 2);
}

.global _foo
    Add_Constant -K, SP         // allocate space for foo
    Store_Local  R5, -14(FP)    // save R5
    Load_Reg     R5, R0
    Add_Constant R5, 1
    JSR f1
    JSR g1
    Add_Constant R5, 2
    Load_Reg     R5, R0
    Load_Local   -14(FP), R5    // restore R5
    Add_Constant K, SP          // deallocate
    RTS
69
Caller-Save Registers
• Saved by the caller before calls when needed
• Values are not automatically preserved across calls

void bar(int y) {
    int x = y + 1;
    f2(x);
    g2(2);
    g2(8);
}

.global _bar
    Add_Constant  -K, SP    // allocate space for bar
    Add_Constant  R0, 1
    JSR f2
    Load_Constant 2, R0
    JSR g2
    Load_Constant 8, R0
    JSR g2
    Add_Constant  K, SP     // deallocate space for bar
    RTS
70
Parameter Passing
• 1960s
  – In memory
    • No recursion is allowed
• 1970s
  – In stack
• 1980s
  – In registers
  – First k parameters are passed in registers (k=4 or k=6)
  – Where is time saved?
• Most procedures are leaf procedures
• Interprocedural register allocation
• Many of the registers may be dead before another invocation
• Register windows are allocated in some architectures per call (e.g., Sun SPARC)
71
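Windows Exploit(s): Buffer Overflow

#include <string.h>

void foo(char *x) {
    char buf[2];
    strcpy(buf, x);    /* no bounds check: x may be longer than buf */
}

int main(int argc, char *argv[]) {
    foo(argv[1]);
}

> ./a.out abracadabra
Segmentation fault

[Figure: the stack of foo, with arrows labelled “Stack grows this way” and “Memory addresses”; the slots shown are previous frame, return address, saved FP, char* x and buf[2]. The string “abracadabr…” copied into buf[2] spills over the neighbouring slots.]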
72
Buffer overflow

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int check_authentication(char *password) {
    int auth_flag = 0;
    char password_buffer[16];

    strcpy(password_buffer, password);
    if (strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
    if (strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;
    return auth_flag;
}

int main(int argc, char *argv[]) {
    if (argc < 2) {
        printf("Usage: %s <password>\n", argv[0]);
        exit(0);
    }
    if (check_authentication(argv[1])) {
        printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        printf("      Access Granted.\n");
        printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    } else {
        printf("\nAccess Denied.\n");
    }
}

(source: “Hacking – The Art of Exploitation, 2nd Ed.”)
74
Buffer overflow

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int check_authentication(char *password) {
    char password_buffer[16];
    int auth_flag = 0;

    strcpy(password_buffer, password);
    if (strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
    if (strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;
    return auth_flag;
}

int main(int argc, char *argv[]) {
    if (argc < 2) {
        printf("Usage: %s <password>\n", argv[0]);
        exit(0);
    }
    if (check_authentication(argv[1])) {
        printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        printf("      Access Granted.\n");
        printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    } else {
        printf("\nAccess Denied.\n");
    }
}

(source: “Hacking – The Art of Exploitation, 2nd Ed.”)
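Why the order of the two local declarations can matter is easiest to see on a model of the frame. The sketch below is a simulation, not the behaviour of any particular compiler: it assumes a hypothetical 20-byte frame holding only the 16-byte buffer and a 4-byte little-endian flag, and the parameter `flag_above_buffer` (an illustrative name) selects which of the two sits at the higher address. Which layout a real compiler picks for a given declaration order is up to that compiler.

```python
def check_authentication(password: bytes, flag_above_buffer: bool) -> int:
    """Model of check_authentication on a toy frame (assumed layout, see text)."""
    frame = bytearray(20)  # 16-byte password_buffer + 4-byte auth_flag
    if flag_above_buffer:
        buf_off, flag_off = 0, 16  # flag sits right above the buffer
    else:
        flag_off, buf_off = 0, 4   # flag sits below the buffer
    # strcpy: copy every byte plus the terminating NUL, with no bounds check.
    for i, byte in enumerate(password + b"\x00"):
        if buf_off + i < len(frame):  # bytes beyond the toy frame are dropped
            frame[buf_off + i] = byte
    # strcmp reads from the start of the buffer up to the first NUL.
    stored = bytes(frame[buf_off:]).split(b"\x00")[0]
    if stored in (b"brillig", b"outgrabe"):
        frame[flag_off:flag_off + 4] = (1).to_bytes(4, "little")
    return int.from_bytes(frame[flag_off:flag_off + 4], "little")


print(check_authentication(b"brillig", True))   # correct password
print(check_authentication(b"wrong", True))     # wrong password, fits in buffer
print(check_authentication(b"A" * 20, True))    # overflow reaches the flag
print(check_authentication(b"A" * 20, False))   # overflow runs away from the flag
```

In the first layout a 20-character password makes the flag non-zero without matching either string. In the second layout the same write moves away from the flag; in a real frame it would instead head towards the saved frame pointer and return address.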
75
Buffer overflow
0x08048529 <+69>:  movl $0x8048647,(%esp)
0x08048530 <+76>:  call 0x8048394 <puts@plt>
0x08048535 <+81>:  movl $0x8048664,(%esp)
0x0804853c <+88>:  call 0x8048394 <puts@plt>
0x08048541 <+93>:  movl $0x804867a,(%esp)
0x08048548 <+100>: call 0x8048394 <puts@plt>
0x0804854d <+105>: jmp  0x804855b <main+119>
0x0804854f <+107>: movl $0x8048696,(%esp)
0x08048556 <+114>: call 0x8048394 <puts@plt>
76
Nested Procedures
• For example – Pascal
• Any routine can have sub-routines
• Any sub-routine can access anything that is defined in its containing scope or inside the sub-routine itself
  – “non-local” variables
77
Example: Nested Procedures

program p() {
    int x;
    procedure a() {
        int y;
        procedure b() { … c() … };
        procedure c() {
            int z;
            procedure d() {
                y = x + z
            };
            … b() … d() …
        }
        … a() … c() …
    }
    a()
}

Possible call sequence: p → a → a → c → b → c → d

What are the addresses of variables “x”, “y” and “z” in procedure d?
78
Nested Procedures
• can call a sibling, ancestor
• when “c” uses (non-local) variables from “a”, which instance of “a” is it?
• how do you find the right activation record at runtime?
[Figure: call tree with nodes p, a, a, c, b, c, d]
Possible call sequence: p → a → a → c → b → c → d
79
Nested Procedures
• goal: find the closest routine in the stack from a given nesting level
• if we reached the same routine in a sequence of calls
  – routine of level k uses variables of the same nesting level: it uses its own variables
  – if it uses variables of nesting level j < k, then it must be the last routine called at level j
• If a procedure is last at level j on the stack, then it must be an ancestor of the current routine
Possible call sequence: p → a → a → c → b → c → d
[Figure: call tree with nodes p, a, a, c, b, c, d]
80
Nested Procedures
• problem: a routine may need to access variables of another routine that contains it statically
• solution: lexical pointer (a.k.a. access link) in the activation record
• lexical pointer points to the last activation record of the nesting level above it
  – in our example, the lexical pointer of d points to the activation record of c
• lexical pointers are created at runtime
• the number of links to be traversed is known at compile time
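The lexical-pointer scheme can be sketched for the call sequence p → a → a → c → b → c → d. The code below is an illustration under assumptions: `Frame`, `follow` and `call` are hypothetical helper names, nesting levels are taken as p=0, a=1, b=c=2, d=3, and the variable values are made up. It uses the usual rule that a caller at level m calling a routine at level n sets the callee's link by following m − n + 1 of its own links.

```python
class Frame:
    """Activation record holding its nesting level, lexical pointer and locals."""

    def __init__(self, name, level, access_link, local_vars):
        self.name = name
        self.level = level
        self.access_link = access_link
        self.vars = dict(local_vars)


def follow(frame, hops):
    """Traverse `hops` lexical pointers starting from `frame`."""
    for _ in range(hops):
        frame = frame.access_link
    return frame


def call(caller, name, level, local_vars):
    """Create the callee's frame; its link is computed at runtime from the caller."""
    if caller is None:
        return Frame(name, level, None, local_vars)
    link = follow(caller, caller.level - level + 1)
    return Frame(name, level, link, local_vars)


# Call sequence p -> a -> a -> c -> b -> c -> d (values are illustrative).
p = call(None, "p", 0, {"x": 10})
a1 = call(p, "a", 1, {"y": 1})
a2 = call(a1, "a", 1, {"y": 2})
c1 = call(a2, "c", 2, {"z": 100})
b = call(c1, "b", 2, {})
c2 = call(b, "c", 2, {"z": 200})
d = call(c2, "d", 3, {})

# Inside d (level 3): y = x + z. The hop counts are compile-time constants:
# z lives at level 2 (1 hop), y at level 1 (2 hops), x at level 0 (3 hops).
x = follow(d, 3).vars["x"]
z = follow(d, 1).vars["z"]
y_owner = follow(d, 2)
y_owner.vars["y"] = x + z
print(y_owner.name, y_owner.vars["y"])
```

The assignment lands in the most recent activation of a, and the earlier activation of a keeps its own y untouched.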
81
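Lexical Pointers
[Figure: stack of activation records a, a, c, b, c, d for the call sequence below; the records of a hold y, the records of c hold z, and each record's lexical pointer links it to the record of its statically enclosing routine. Beside it, the call tree with nodes p, a, a, c, b, c, d.]
Possible call sequence: p → a → a → c → b → c → d

program p() {
    int x;
    procedure a() {
        int y;
        procedure b() { c() };
        procedure c() {
            int z;
            procedure d() {
                y = x + z
            };
            … b() … d() …
        }
        … a() … c() …
    }
    a()
}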
82
What is a Compiler?
21
Sethi-Ullman translation
• Algorithm by Ravi Sethi and Jeffrey D. Ullman to emit optimal TAC
  – Minimizes number of temporaries
• Main data structure in algorithm is a stack of temporaries
  – Stack corresponds to recursive invocations of _t = cgen(e)
  – All the temporaries on the stack are live
    • Live = contain a value that is needed later on
22
Weighted register allocation
• Can save registers by re-ordering subtree computations
• Label each node with its weight
  – Weight = number of registers needed
  – Leaf weight known
  – Internal node weight
    • w(left) > w(right) then w = w(left)
    • w(right) > w(left) then w = w(right)
    • w(right) = w(left) then w = w(left) + 1
• Choose heavier child as first to be translated
• WARNING: have to check that no side-effects exist before attempting to apply this optimization (pre-pass on the tree)
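The weight rules and the "heavier child first" order can be sketched as follows for the expression a + b[5*c]. This is a minimal illustration, not the lecture's implementation: `Node`, `weight` and `cgen` are hypothetical names, leaves are assumed to have weight 0, the expression is assumed to be free of side effects, and temporaries are handed out from a small free list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    op: str  # operator ("+", "*", "[]") or, for a leaf, a name/constant
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self):
        return self.left is None and self.right is None


def weight(node, leaf_weight=0):
    """Weight labelling: max of the children, or +1 when they are equal."""
    if node.is_leaf:
        return leaf_weight
    wl = weight(node.left, leaf_weight)
    wr = weight(node.right, leaf_weight)
    return wl + 1 if wl == wr else max(wl, wr)


def cgen(node, code, free):
    """Emit TAC into `code`; return the temporary that holds the result."""
    if node.is_leaf:
        t = free.pop(0)
        code.append(f"{t} = {node.op}")
        return t
    # Translate the heavier child first (ties go to the left child).
    left_first = weight(node.left) >= weight(node.right)
    first, second = (node.left, node.right) if left_first else (node.right, node.left)
    t_first = cgen(first, code, free)
    t_second = cgen(second, code, free)
    t_left, t_right = (t_first, t_second) if left_first else (t_second, t_first)
    if node.op == "[]":
        expr = f"{t_left}[{t_right}]"
    else:
        expr = f"{t_left} {node.op} {t_right}"
    code.append(f"{t_first} = {expr}")
    free.insert(0, t_second)  # the second temporary is dead now; reuse it
    return t_first


tree = Node("+", Node("a"), Node("[]", Node("b"), Node("*", Node("5"), Node("c"))))
code = []
result = cgen(tree, code, [f"_t{i}" for i in range(4)])
print("\n".join(code))
```

With this ordering the whole expression is translated using two temporaries, whereas a strict left-to-right translation would keep a and b live while the index is still being computed.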
23
Weighted reg alloc example
_t0 = cgen( a+b[5*c] )
Phase 1: - check absence of side-effects in expression tree
         - assign weight to each AST node
[Figure: expression tree with weights]
+ (w=1)
    a (w=0)
    array access (w=1)
        base: b (w=0)
        index: * (w=1)
            5 (w=0)
            c (w=0)
24
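Weighted reg alloc example
_t0 = cgen( a+b[5*c] )
Phase 2: - use weights to decide on order of translation
[Figure: the same expression tree (weights: + 1, a 0, array access 1, base b 0, index * 1, 5 0, c 0); at “+” and at “array access” the heavier sub-tree is translated first, and each node is annotated with the temporary, _t0 or _t1, that holds its value.]
Generated code:
_t0 = c
_t1 = 5
_t0 = _t1 * _t0
_t1 = b
_t0 = _t1[_t0]
_t1 = a
_t0 = _t1 + _t0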
25
TAC Generation for Memory Access Instructions
• Copy instruction: a = b
• Load/store instructions: a = *b    *a = b
• Address-of instruction: a = &b
• Array accesses: a = b[i]    a[i] = b
• Field accesses: a = b[f]    a[f] = b
• Memory allocation instruction: a = alloc(size)
  – Sometimes left out (e.g., malloc is a procedure in C)
26
Array operations
x = y[i]
    t1 = &y         ; t1 = address-of y
    t2 = t1 + i     ; t2 = address of y[i]
    x = *t2         ; value stored at y[i]

x[i] = y
    t1 = &x         ; t1 = address-of x
    t2 = t1 + i     ; t2 = address of x[i]
    *t2 = y         ; store through pointer
27
TAC Generation for Control Flow Statements
• Label introduction
    _label_name:
  Indicates a point in the code that can be jumped to
• Unconditional jump: go to instruction following label L
    Goto L;
• Conditional jump: test condition variable t; if 0, jump to label L
    IfZ t Goto L;
• Similarly: test condition variable t; if 1, jump to label L
    IfNZ t Goto L;
28
Control-flow example – conditions

int x;
int y;
int z;

if (x < y)
    z = x;
else
    z = y;
z = z * z;

    _t0 = x < y;
    IfZ _t0 Goto _L0;
    z = x;
    Goto _L1;
_L0:
    z = y;
_L1:
    z = z * z;
29
cgen for if-then-else

cgen(if (e) s1 else s2)
    Let _t = cgen(e)
    Let Lfalse be a new label
    Let Lafter be a new label
    Emit( IfZ _t Goto Lfalse; )
    cgen(s1)
    Emit( Goto Lafter; )
    Emit( Lfalse: )
    cgen(s2)
    Emit( Goto Lafter; )
    Emit( Lafter: )
30
Control-flow example – loops

int x;
int y;

while (x < y) {
    x = x * 2;
}
y = x;

_L0:
    _t0 = x < y;
    IfZ _t0 Goto _L1;
    x = x * 2;
    Goto _L0;
_L1:
    y = x;
31
cgen for while loops

cgen(while (expr) stmt)
    Let Lbefore be a new label
    Let Lafter be a new label
    Emit( Lbefore: )
    Let t = cgen(expr)
    Emit( IfZ t Goto Lafter; )
    cgen(stmt)
    Emit( Goto Lbefore; )
    Emit( Lafter: )
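The two cgen schemes, for if-then-else and for while, can be turned into a small code generator. The sketch below makes several simplifying assumptions: statements are tuples such as `("while", cond, body)`, a condition is a string that fits into a single TAC instruction, and `Emitter`, `cgen_expr`, `cgen` and `translate` are hypothetical names rather than part of the lecture's compiler.

```python
import itertools


class Emitter:
    """Collects TAC lines and hands out fresh labels and temporaries."""

    def __init__(self):
        self.code = []
        self._labels = itertools.count()
        self._temps = itertools.count()

    def emit(self, line):
        self.code.append(line)

    def new_label(self):
        return f"_L{next(self._labels)}"

    def new_temp(self):
        return f"_t{next(self._temps)}"


def cgen_expr(em, expr):
    """Assumption: `expr` is simple enough to be one TAC instruction."""
    t = em.new_temp()
    em.emit(f"{t} = {expr};")
    return t


def cgen(em, stmt):
    kind = stmt[0]
    if kind == "assign":
        _, target, expr = stmt
        em.emit(f"{target} = {expr};")
    elif kind == "if":
        _, cond, s1, s2 = stmt
        t = cgen_expr(em, cond)
        l_false, l_after = em.new_label(), em.new_label()
        em.emit(f"IfZ {t} Goto {l_false};")
        cgen(em, s1)
        em.emit(f"Goto {l_after};")
        em.emit(f"{l_false}:")
        cgen(em, s2)
        em.emit(f"Goto {l_after};")
        em.emit(f"{l_after}:")
    elif kind == "while":
        _, cond, body = stmt
        l_before, l_after = em.new_label(), em.new_label()
        em.emit(f"{l_before}:")
        t = cgen_expr(em, cond)  # the condition is re-evaluated on every iteration
        em.emit(f"IfZ {t} Goto {l_after};")
        cgen(em, body)
        em.emit(f"Goto {l_before};")
        em.emit(f"{l_after}:")
    else:
        raise ValueError(f"unknown statement kind: {kind}")


def translate(stmts):
    em = Emitter()
    for s in stmts:
        cgen(em, s)
    return em.code


loop = translate([("while", "x < y", ("assign", "x", "x * 2")),
                  ("assign", "y", "x")])
cond = translate([("if", "x < y", ("assign", "z", "x"), ("assign", "z", "y")),
                  ("assign", "z", "z * z")])
print("\n".join(loop))
print("\n".join(cond))
```

Following the if-then-else scheme literally emits a `Goto Lafter` right before `Lafter:`; that jump is redundant and is the kind of instruction a later peephole pass can remove.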
32
Optimization points
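[Figure: pipeline source code → Frontend → IR → Code generator → target code, with the optimization opportunities at each point:
  • User (source code): profile program, change algorithm
  • Compiler (IR): intraprocedural IR, interprocedural IR, IR optimizations
  • Compiler (code generator / target code): register allocation, instruction selection, peephole transformations
  A “today” marker highlights the topic of this lecture.]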
33
Interprocedural IR
• Compile time generation of code for procedure invocations
• Activation Records (a.k.a. Stack Frames)
34
Supporting Procedures
• New computing environment
  – at least temporary memory for local variables
• Passing information into the new environment
  – Parameters
• Transfer of control to/from procedure
• Handling return values
35
Abstract Register Machine (High Level View)
[Figure: a CPU with general purpose (data) registers Register 00, Register 01, …, Register xx and control registers such as Register PC, next to a memory holding Code and Data, drawn between low addresses and high addresses.]
36
Abstract Register Machine (High Level View)
[Figure: a CPU with general purpose (data) registers Register 00, Register 01, …, Register xx and control registers such as Register PC, next to Main Memory divided into Code, Global Variables, Heap and Stack, drawn between low addresses and high addresses.]
37
Abstract Activation Record Stack
[Figure: a stack of frames …, Proc k, Proc k+1, Proc k+2, …; the frame of Proc k+1 is labelled “Stack frame for procedure Proc k+1(a1,…,aN)”.]
38
Abstract Stack Frame
[Figure: the stack frame for procedure Proc k+1(a1,…,aN), shown between the frames of Proc k and Proc k+2:
  Parameters (actual arguments): Param N, Param N-1, …, Param 1
  Locals and temporaries: _t0, …, _tk, x, …, y]
39
Handling Procedures
• Store local variables/temporaries in a stack
• A function call instruction pushes arguments to stack and jumps to the function label
  A statement x=f(a1,…,an); looks like
    Push a1; … Push an;
    Call f;
    Pop x;    // copy returned value
• Returning a value is done by pushing it to the stack (return x;)
    Push x;
• Return control to caller (and roll up stack)
    Return;
40
Abstract Register Machine
[Figure: a CPU with general purpose (data) registers Register 00, Register 01, …, Register xx and control registers such as Register PC, next to Main Memory divided into Code, Global Variables, Heap and Stack, drawn between low addresses and high addresses.]
41
Semi-Abstract Register Machine
[Figure: a CPU with general purpose (data) registers Register 00, Register 01, …, Register xx, control registers such as Register PC, and stack registers ebp, esp, …, next to Main Memory divided into Global Variables, Stack and Heap, drawn between high addresses and low addresses.]
42
Intro: Functions Example

int SimpleFn(int z) {
    int x, y;
    x = x * y * z;
    return x;
}

void main() {
    int w;
    w = SimpleFn(137);
}

_SimpleFn:
    _t0 = x * y;
    _t1 = _t0 * z;
    x = _t1;
    Push x;
    Return;

main:
    _t0 = 137;
    Push _t0;
    Call _SimpleFn;
    Pop w;
43
What Can We Do with Procedures?
• Declarations & Definitions
• Call & Return
• Jumping out of procedures
• Passing & Returning procedures as parameters
44
Design Decisions
• Scoping rules
  – Static scoping vs. dynamic scoping
• Caller/callee conventions
  – Parameters
  – Who saves register values?
• Allocating space for local variables
45
Static (lexical) Scoping

main ( )
{                                        // B0
    int a = 0 ;
    int b = 0 ;
    {                                    // B1
        int b = 1 ;
        {                                // B2
            int a = 2 ;
            printf ("%d %d\n", a, b) ;
        }
        {                                // B3
            int b = 3 ;
            printf ("%d %d\n", a, b) ;
        }
        printf ("%d %d\n", a, b) ;
    }
    printf ("%d %d\n", a, b) ;
}

Declaration    Scopes
a=0            B0, B1, B3
b=0            B0
b=1            B1, B2
a=2            B2
b=3            B3

A name refers to its (closest) enclosing scope; this is known at compile time.
46
Dynamic Scoping
• Each identifier is associated with a global stack of bindings
• When entering scope where identifier is declared
  – push declaration on identifier stack
• When exiting scope where identifier is declared
  – pop identifier stack
• Evaluating the identifier in any context binds to the current top of stack
• Determined at runtime
47
Example
• What value is returned from main?
• Static scoping?
• Dynamic scoping?

int x = 42;

int f() { return x; }
int g() { int x = 1; return f(); }
int main() { return g(); }
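The two scoping disciplines can be compared on this example with a short simulation. It is a sketch under assumptions: the dynamic version models the per-identifier binding stack with hypothetical helpers `enter`, `leave` and `lookup`, while the static version simply relies on Python's own lexical scoping to stand in for C's.

```python
from collections import defaultdict

# Dynamic scoping: every identifier has a global stack of bindings.
bindings = defaultdict(list)


def enter(name, value):
    bindings[name].append(value)  # entering a scope that declares `name`


def leave(name):
    bindings[name].pop()          # exiting that scope


def lookup(name):
    return bindings[name][-1]     # current top of the identifier's stack


def f_dynamic():
    return lookup("x")


def g_dynamic():
    enter("x", 1)                 # int x = 1;
    try:
        return f_dynamic()
    finally:
        leave("x")


def main_dynamic():
    enter("x", 42)                # the global declaration int x = 42;
    try:
        return g_dynamic()
    finally:
        leave("x")


# Static scoping: f refers to the x of its enclosing (global) scope.
x = 42


def f_static():
    return x


def g_static():
    x = 1                         # local to g, invisible to f_static
    return f_static()


def main_static():
    return g_static()


print("static:", main_static())
print("dynamic:", main_dynamic())
```

Under static scoping f always reads the global x, so main returns 42; under dynamic scoping f sees the binding pushed most recently, the one made by g, so main returns 1.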
48
Why do we care?
• We need to generate code to access variables
• Static scoping
  – Identifier binding is known at compile time
  – Address of the variable is known at compile time
  – Assigning addresses to variables is part of code generation
  – No runtime errors of “access to undefined variable”
  – Can check types of variables
49
Variable addresses for static scoping: first attempt

int x = 42;

int f() { return x; }
int g() { int x = 1; return f(); }
int main() { return g(); }

identifier      address
x (global)      0x42
x (inside g)    0x73
50
Variable addresses for static scoping: first attempt

int a[11];

void quicksort(int m, int n) {
    int i;
    if (n > m) {
        i = partition(m, n);
        quicksort(m, i-1);
        quicksort(i+1, n);
    }
}

main() {
    quicksort(1, 9);
}

What is the address of the variable “i” in the procedure quicksort?
51
Compile-Time Information on Variables
• Name
• Type
• Scope
  – when is it recognized
• Duration
  – Until when does its value exist
• Size
  – How many bytes are required at runtime
• Address
  – Fixed
  – Relative
  – Dynamic
52
Activation Record (Stack Frames)
• separate space for each procedure invocation
• managed at runtime
  – code for managing it generated by the compiler
• desired properties
  – efficient allocation and deallocation
    • procedures are called frequently
  – variable size
    • different procedures may require different memory sizes
53
Semi-Abstract Register Machine
[Figure: a CPU with general purpose (data) registers Register 00, Register 01, …, Register xx, control registers such as Register PC, and stack registers ebp, esp, …, next to Main Memory divided into Global Variables, Stack and Heap, drawn between high addresses and low addresses.]
54
A Logical Stack Frame (Simplified)
[Figure: stack frame for function f(a1,…,aN):
  Parameters (actual arguments): Param N, Param N-1, …, Param 1
  Locals and temporaries: _t0, …, _tk, x, …, y]
55
Runtime Stack
• Stack of activation records
• Call = push new activation record
• Return = pop activation record
• Only one “active” activation record – top of stack
• How do we handle recursion?
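Recursion works because every invocation gets its own activation record, so several copies of n can be alive at the same time. The sketch below makes this explicit for factorial by managing the records by hand; the function name `fact_with_frames` and the record fields are hypothetical, chosen only for illustration.

```python
def fact_with_frames(n):
    """Compute n! with an explicit stack of activation records.

    Returns (result, deepest), where `deepest` is the largest number of
    records that were on the stack at the same time.
    """
    stack = [{"n": n, "result": None}]  # call fact(n): push its record
    while stack[-1]["n"] > 1:           # each recursive call pushes a new record
        stack.append({"n": stack[-1]["n"] - 1, "result": None})
    deepest = len(stack)
    stack[-1]["result"] = 1             # base case: n <= 1
    while len(stack) > 1:               # each return pops the callee's record
        callee = stack.pop()
        caller = stack[-1]              # the caller becomes the active record again
        caller["result"] = caller["n"] * callee["result"]
    return stack.pop()["result"], deepest


print(fact_with_frames(5))
```

Only the record on top of the stack is active at any moment; the records below it are suspended callers waiting for a result.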
56
Activation Record (frame)
[Figure: layout of one frame, from high addresses to low addresses (stack grows down):
  incoming parameters: parameter k, …, parameter 1
  administrative part: lexical pointer, return information, dynamic link, registers & misc
  local variables, temporaries
  next frame would be here …
  The frame pointer and the stack pointer delimit the frame.]
57
Runtime Stack
• SP – stack pointer
  – top of current frame
• FP – frame pointer
  – base of current frame
  – Sometimes called BP (base pointer)
[Figure: the previous frame above the current frame; FP marks the base of the current frame and SP its top; stack grows down.]
58
Code Blocks
• Programming languages provide code blocks

void foo() {
    int x = 8, y = 9;       // 1
    { int x = y * y; }      // 2
    { int x = y * 7; }      // 3
    x = y + 1;
}

[Figure: the frame of foo: administrative part, x1, y1, x2, x3, …]
59
L-Values of Local Variables
• The offset in the stack is known at compile time
• L-val(x) = FP+offset(x)
• x = 5 ⇒
    Load_Constant 5, R3
    Store R3, offset(x)(FP)
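How a compiler might fix these offsets can be sketched in a few lines. The code below is an assumption-laden illustration: locals are placed at negative offsets below FP in declaration order, alignment and saved registers are ignored, and `assign_offsets`, `l_value` and `cgen_assign_const` are hypothetical names.

```python
def assign_offsets(local_vars):
    """Map each (name, size_in_bytes) to an offset below FP, in declaration order."""
    offsets, used = {}, 0
    for name, size in local_vars:
        used += size
        offsets[name] = -used  # locals live below FP; the stack grows down
    return offsets, used       # `used` is the space the prologue must allocate


def l_value(fp, offsets, name):
    """Runtime address of a local: L-val(x) = FP + offset(x)."""
    return fp + offsets[name]


def cgen_assign_const(offsets, name, value, reg="R3"):
    """Code for `name = value`; the offset is a compile-time constant."""
    return [f"Load_Constant {value}, {reg}",
            f"Store {reg}, {offsets[name]}(FP)"]


offsets, frame_size = assign_offsets([("x", 4), ("y", 4), ("buf", 16)])
print(offsets, frame_size)
print(hex(l_value(0x1000, offsets, "y")))
print("\n".join(cgen_assign_const(offsets, "x", 5)))
```

Only FP is a runtime quantity here; everything else is decided when the procedure is compiled, which is why the same code works for every invocation, including recursive ones.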
60
Pentium Runtime Stack
Pentium stack registers

Register    Usage
ESP         Stack pointer
EBP         Base pointer

Pentium stack and call/ret instructions

Instruction         Usage
push, pusha, …      push on runtime stack
pop, popa, …        pop from runtime stack
call                transfer control to called routine
ret                 transfer control back to caller
22
Weighted register allocationbull Can save registers by re-ordering subtree computationsbull Label each node with its weight
ndash Weight = number of registers neededndash Leaf weight knownndash Internal node weight
bull w(left) gt w(right) then w = leftbull w(right) gt w(left) then w = rightbull w(right) = w(left) then w = left + 1
bull Choose heavier child as first to be translatedbull WARNING have to check that no side-effects exist before
attempting to apply this optimization(pre-pass on the tree)
23
Weighted reg alloc example
b
5 c
array access
+
a w=0
w=0 w=0
w=1 w=0
w=1
w=1
Phase 1 - check absence of side-effects in expression tree - assign weight to each AST node
_t0 = cgen( a+b[5c] )
base index
24
Weighted reg alloc example
b
5 c
array access
+
abase index
w=0
w=0 w=0
w=1 w=0
w=1
w=1
_t0 = cgen( a+b[5c] )Phase 2 - use weights to decide on order of translation
_t0
_t0
_t0Heavier sub-tree
Heavier sub-tree
_t0 = _t1 _t0
_t0 = _t1[_t0]
_t0 = _t1 + _t0
_t0_t1
_t1
_t1_t0 = c_t1 = 5
_t1 = b
_t1 = a
25
TAC Generation forMemory Access Instructions
bull Copy instruction a = bbull Loadstore instructions
a = b a = bbull Address of instruction a=ampbbull Array accesses
a = b[i] a[i] = bbull Field accesses
a = b[f] a[f] = bbull Memory allocation instruction a = alloc(size)
ndash Sometimes left out (eg malloc is a procedure in C)
26
Array operations
t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]
t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer
x = y[i]
x[i] = y
27
TAC Generation forControl Flow Statements
bull Label introduction_label_name
Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L
Goto Lbull Conditional jump test condition variable t
if 0 jump to label LIfZ t Goto L
bull Similarly test condition variable tif 1 jump to label L
IfNZ t Goto L
28
Control-flow example ndash conditionsint xint yint z
if (x lt y) z = xelse z = yz = z z
_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z
29
cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)
Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )
30
Control-flow example ndash loopsint xint y
while (x lt y) x = x 2
y = x
_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1
y = x
31
cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label
Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )
32
Optimization points
sourcecode
Frontend IR Code
generatortargetcode
Userprofile program
change algorithm
Compilerintraprocedural IRInterprocedural IRIR optimizations
Compilerregister allocation
instruction selectionpeephole transformations
today
33
Interprocedural IR
bull Compile time generation of code for procedure invocations
bull Activation Records (aka Stack Frames)
34
Supporting Procedures
bull New computing environment ndash at least temporary memory for local variables
bull Passing information into the new environmentndash Parameters
bull Transfer of control tofrom procedurebull Handling return values
35
Abstract Register Machine(High Level View)
Code
Data
Highaddresses
Lowaddresses
CPU
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
36
Heap
Abstract Register Machine(High Level View)
Global Variables
Stack
Lowaddresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
Code
Highaddresses
37
Abstract Activation Record Stack
Stack frame for procedure
Prock+1(a1hellipaN)
Prock
Prock+2
hellip
hellip
Prock+1
38
Abstract Stack Frame
Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Prock
Prock+2
hellip
hellip
Stack frame for procedure
Prock+1(a1hellipaN)
39
Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to
stack and jumps to the function labelA statement x=f(a1hellipan) looks like
Push a1 hellip Push anCall fPop x copy returned value
bull Returning a value is done by pushing it to the stack (return x)
Push xbull Return control to caller (and roll up stack)
Return
40
Heap
Abstract Register Machine
Global Variables
Stack
Lowaddresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
Code
Highaddresses
41
Semi-Abstract Register Machine
Global Variables
Stack
Heap
High addresses
Low addresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
ebp
esphellipre
gist
ers
Stac
k
42
Intro Functions Exampleint SimpleFn(int z)
int x yx = x y zreturn x
void main() int ww = SimpleFunction(137)
_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn
main_t0 = 137Push _t0Call _SimpleFnPop w
43
What Can We Do with Procedures
bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as
parameters
44
Design Decisions
bull Scoping rulesndash Static scoping vs dynamic scoping
bull Callercallee conventionsndash Parametersndash Who saves register values
bull Allocating space for local variables
45
Static (lexical) Scopingmain ( )
int a = 0 int b = 0
int b = 1
int a = 2 printf (ldquod dnrdquo a b)
int b = 3 printf (ldquod dnrdquo a b)
printf (ldquod dnrdquo a b)
printf (ldquod dnrdquo a b)
B0B1
B3B3
B2
Declaration Scopes
a=0 B0B1B3
b=0 B0
b=1 B1B2
a=2 B2
b=3 B3
a name refers to its (closest)
enclosing scope
known at compile time
46
Dynamic Scoping
bull Each identifier is associated with a global stack of bindings
bull When entering scope where identifier is declaredndash push declaration on identifier stack
bull When exiting scope where identifier is declaredndash pop identifier stack
bull Evaluating the identifier in any context binds to the current top of stack
bull Determined at runtime
47
Example
bull What value is returned from mainbull Static scopingbull Dynamic scoping
int x = 42
int f() return x int g() int x = 1 return f() int main() return g()
48
Why do we care
bull We need to generate code to access variables
bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code
generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables
49
Variable addresses for static scoping first attempt
int x = 42
int f() return x int g() int x = 1 return f() int main() return g()
identifier address
x (global) 0x42
x (inside g) 0x73
50
Variable addresses for static scoping: first attempt
int a[11];
void quicksort(int m, int n) {
    int i;
    if (n > m) {
        i = partition(m, n);
        quicksort(m, i-1);
        quicksort(i+1, n);
    }
}
main() { quicksort(1, 9); }

What is the address of the variable "i" in the procedure quicksort?
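One way to see why a single fixed address cannot work for "i": in a recursive procedure several invocations are live at once, and each needs its own copy of the local. The sketch below (a hypothetical helper, not from the slides) passes the address of the caller's i down to the callee and counts the invocations whose own i lives somewhere else.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the invocations whose local i has a different address than the
   caller's i. parent_i is NULL for the outermost call. */
static int distinct_locals(int depth, const int *parent_i) {
    assert(depth >= 0);
    int i = depth;                        /* a fresh local per invocation */
    int count = (parent_i != &i) ? 1 : 0; /* caller's i is still live here */
    if (depth > 0)
        count += distinct_locals(depth - 1, &i);
    return count;
}
```

Because the caller's i is still live while the callee runs, the two objects must have different addresses, so every invocation is counted.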
51
Compile-Time Information on Variables
• Name
• Type
• Scope
  - When is it recognized?
• Duration
  - Until when does its value exist?
• Size
  - How many bytes are required at runtime?
• Address
  - Fixed
  - Relative
  - Dynamic
52
Activation Record (Stack Frames)
• separate space for each procedure invocation
• managed at runtime
  - code for managing it generated by the compiler
• desired properties
  - efficient allocation and deallocation
    • procedures are called frequently
  - variable size
    • different procedures may require different memory sizes
53
Semi-Abstract Register Machine
[Figure: CPU with general-purpose (data) registers Register 00, Register 01, …, Register xx, control registers including Register PC, and stack registers ebp and esp; main memory, drawn between high and low addresses, holding Global Variables, Stack, and Heap.]
54
A Logical Stack Frame (Simplified)
Stack frame for function f(a1, …, aN):
  Parameters (actual arguments):  Param N, Param N-1, …, Param 1
  Locals and temporaries:         _t0, …, _tk, x, …, y
55
Runtime Stack
• Stack of activation records
• Call = push new activation record
• Return = pop activation record
• Only one "active" activation record: top of stack
• How do we handle recursion?
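The push-on-call, pop-on-return discipline handles recursion because every invocation gets its own record. A minimal sketch, assuming a shadow stack kept next to ordinary C recursion (the names frame, stack, depth, and fact_with_frames are hypothetical):

```c
#include <assert.h>

enum { MAX_FRAMES = 32 };

/* One activation record per invocation of fact. */
struct frame {
    int n;                  /* the parameter of this invocation */
};

static struct frame stack[MAX_FRAMES];
static int depth = 0;       /* number of live activation records */
static int max_depth = 0;   /* deepest stack observed so far */

static int fact_with_frames(int n) {
    assert(n >= 0 && depth < MAX_FRAMES);
    stack[depth].n = n;     /* call: push a new activation record */
    depth++;
    if (depth > max_depth)
        max_depth = depth;

    int result = 1;
    if (n > 1)              /* the recursive call pushes another record */
        result = n * fact_with_frames(n - 1);

    depth--;                /* return: pop the activation record */
    return result;
}
```

Only the record at index depth - 1 is "active" at any moment; the records below it belong to suspended callers waiting for a result.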
56
Activation Record (frame)
Layout from high addresses to low addresses (stack grows down):
  parameter k … parameter 1        (incoming parameters)
  lexical pointer                  (administrative part)
  return information               (administrative part)
  dynamic link                     (administrative part)
  registers & misc                 (administrative part)
  local variables, temporaries
  (next frame would be here)
The figure also marks the frame pointer and the stack pointer.
57
Runtime Stack
• SP: stack pointer
  - top of current frame
• FP: frame pointer
  - base of current frame
  - Sometimes called BP (base pointer)
[Figure: previous frame above the current frame; FP marks the base of the current frame and SP its top; stack grows down.]
58
Code Blocks
• Programming languages provide code blocks
void foo() {
    int x = 8, y = 9;      // 1
    { int x = y * y; }     // 2
    { int x = y * 7; }     // 3
    x = y + 1;
}
Frame layout: administrative, x1, y1, x2, x3, …
59
L-Values of Local Variables
• The offset in the stack is known at compile time
• L-val(x) = FP + offset(x)
• x = 5 is translated to
      Load_Constant 5, R3
      Store R3, offset(x)(FP)
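The address computation L-val(x) = FP + offset(x) can be illustrated with a toy memory. This is a sketch under assumptions: the machine struct, the byte array, and the offsets OFFSET_X and OFFSET_Y are made up for illustration, with locals placed below FP as on the following slides.

```c
#include <assert.h>
#include <string.h>

enum { FRAME_BYTES = 64 };

/* Hypothetical compile-time offsets (relative to FP) of two int locals. */
enum {
    OFFSET_X = -(int)sizeof(int),
    OFFSET_Y = -2 * (int)sizeof(int)
};

struct machine {
    unsigned char memory[FRAME_BYTES];
    int fp;                 /* frame pointer, as an index into memory */
};

/* Store value at address FP + offset, like "Store R3, offset(x)(FP)". */
static void store_local(struct machine *m, int offset, int value) {
    int addr = m->fp + offset;
    assert(addr >= 0 && addr + (int)sizeof value <= FRAME_BYTES);
    memcpy(&m->memory[addr], &value, sizeof value);
}

/* Load the int stored at address FP + offset. */
static int load_local(const struct machine *m, int offset) {
    int value;
    int addr = m->fp + offset;
    assert(addr >= 0 && addr + (int)sizeof value <= FRAME_BYTES);
    memcpy(&value, &m->memory[addr], sizeof value);
    return value;
}
```

The offsets are constants fixed at compile time; only FP changes from one invocation to the next, so the same code works for every activation of the procedure.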
60
Pentium Runtime Stack

Pentium stack registers
  Register   Usage
  ESP        Stack pointer
  EBP        Base pointer

Pentium stack and call/ret instructions
  Instruction       Usage
  push, pusha, …    push on runtime stack
  pop, popa, …      pop from runtime stack
  call              transfer control to called routine
  ret               transfer control back to caller
61
Accessing Stack Variables
• Use offset from FP (ebp)
• Remember: stack grows downwards
• Above FP = parameters
• Below FP = locals
• Examples
  - ebp + 4 = return address
  - ebp + 8 = first parameter
  - ebp - 4 = first local
[Figure: stack layout from high to low addresses: Param n … param1 (param1 at FP+8), Return address, Previous fp (at FP), Local 1 … Local n (Local 1 at FP-4); SP marks the top of the stack.]
62
Factorial - fact(int n)
fact:
    pushl %ebp              # save ebp
    movl  %esp,%ebp         # ebp = esp
    pushl %ebx              # save ebx
    movl  8(%ebp),%ebx      # ebx = n
    cmpl  $1,%ebx           # n <= 1 ?
    jle   .lresult          # then done
    leal  -1(%ebx),%eax     # eax = n-1
    pushl %eax
    call  fact              # fact(n-1)
    imull %ebx,%eax         # eax = retv * n
    jmp   .lreturn
.lresult:
    movl  $1,%eax           # retv
.lreturn:
    movl  -4(%ebp),%ebx     # restore ebx
    movl  %ebp,%esp         # restore esp
    popl  %ebp              # restore ebp
    ret                     # return to caller
[Figure: stack at an intermediate point: n at EBP+8, then the return address, the previous fp (old ebp) at EBP, and old ebx at EBP-4, with ESP at the top.]
(disclaimer: a real compiler can do better than that)
63
Call Sequences
• The processor does not save the content of registers on procedure calls
• So who will?
  - Caller saves and restores registers
  - Callee saves and restores registers
  - But can also have both save/restore some registers
64
Call Sequences
Caller push code (call):
    Push caller-save registers
    Push actual parameters (in reverse order)
    Push return address, jump to call address
Callee push code (prologue):
    Push current base-pointer; bp = sp
    Push local variables
    Push callee-save registers
Callee pop code (epilogue):
    Pop callee-save registers
    Pop callee activation record
    Pop old base-pointer
    Pop return address, jump to address (return)
Caller pop code:
    Pop parameters
    Pop caller-save registers
65
Call Sequences - Foo(42,21)
Caller (call):
    push %ecx           Push caller-save registers
    push $21            Push actual parameters (in reverse order)
    push $42
    call _foo           Push return address, jump to call address
Callee (prologue):
    push %ebp           Push current base-pointer
    mov  %esp, %ebp     bp = sp
    sub  $8, %esp       Push local variables
    push %ebx           Push callee-save registers
Callee (epilogue):
    pop  %ebx           Pop callee-save registers
    mov  %ebp, %esp     Pop callee activation record
    pop  %ebp           Pop old base-pointer
    ret                 Pop return address, jump to address
Caller (after return):
    add  $8, %esp       Pop parameters
    pop  %ecx           Pop caller-save registers
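The call sequence above can be traced on a toy word-addressed stack to confirm that every push is matched by a pop. This is only a sketch: the cpu struct, the helper names, and the callee body (adding its two parameters) are assumptions made for illustration, and a 0 word stands in for the return address that call would push.

```c
#include <assert.h>

enum { STACK_WORDS = 64 };

struct cpu {
    int stack[STACK_WORDS];
    int sp;             /* index of the top-of-stack word; stack grows down */
    int bp;             /* base (frame) pointer */
    int ecx, ebx;       /* one caller-save and one callee-save register */
};

static void push(struct cpu *c, int v) {
    assert(c->sp > 0);
    c->stack[--c->sp] = v;
}

static int pop(struct cpu *c) {
    assert(c->sp < STACK_WORDS);
    return c->stack[c->sp++];
}

/* Callee: prologue, body, epilogue. The body adds the two parameters. */
static int foo_body(struct cpu *c) {
    push(c, c->bp);             /* push current base pointer */
    c->bp = c->sp;              /* bp = sp */
    c->sp -= 2;                 /* reserve two words for locals */
    push(c, c->ebx);            /* push callee-save register */

    /* stack[bp] = old bp, stack[bp+1] = return address,
       stack[bp+2] = first parameter, stack[bp+3] = second parameter */
    c->ebx = c->stack[c->bp + 2] + c->stack[c->bp + 3];
    int result = c->ebx;

    c->ebx = pop(c);            /* pop callee-save register */
    c->sp = c->bp;              /* pop callee activation record */
    c->bp = pop(c);             /* pop old base pointer */
    return result;
}

/* Caller side of foo(a, b). */
static int call_foo(struct cpu *c, int a, int b) {
    push(c, c->ecx);            /* push caller-save register */
    push(c, b);                 /* push parameters in reverse order */
    push(c, a);
    push(c, 0);                 /* stand-in for the return address */
    int result = foo_body(c);
    (void)pop(c);               /* ret pops the return address */
    c->sp += 2;                 /* pop parameters */
    c->ecx = pop(c);            /* pop caller-save register */
    return result;
}
```

After call_foo returns, sp, bp, ecx, and ebx hold the values they had before the call, which is the invariant the calling convention is meant to guarantee.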
66
"To Callee-save or to Caller-save?"
• Callee-saved registers need only be saved when the callee modifies their value
• Some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registers
• Callee-Save Registers
  - Saved by the callee before modification
  - Values are automatically preserved across calls
• Caller-Save Registers
  - Saved (if needed) by the caller before calls
  - Values are not automatically preserved across calls
• Usually the architecture defines caller-save and callee-save registers
• Separate compilation
• Interoperability between code produced by different compilers/languages
• But compiler writers decide when to use caller/callee registers
68
Callee-Save Registers
• Saved by the callee before modification
• Usually at procedure prolog
• Restored at procedure epilog
• Hardware support may be available
• Values are automatically preserved across calls

int foo(int a) {
    int b = a + 1;
    f1();
    g1(b);
    return (b + 2);
}

.global _foo
    Add_Constant -K, SP                  // allocate space for foo
    Store_Local R5, -14(FP)              // save R5
    Load_Reg R5, R0; Add_Constant R5, 1
    JSR f1; JSR g1
    Add_Constant R5, 2; Load_Reg R5, R0
    Load_Local -14(FP), R5               // restore R5
    Add_Constant K, SP; RTS              // deallocate
69
Caller-Save Registers
• Saved by the caller before calls when needed
• Values are not automatically preserved across calls

void bar(int y) {
    int x = y + 1;
    f2(x);
    g2(2);
    g2(8);
}

.global _bar
    Add_Constant -K, SP          // allocate space for bar
    Add_Constant R0, 1
    JSR f2
    Load_Constant 2, R0; JSR g2
    Load_Constant 8, R0; JSR g2
    Add_Constant K, SP           // deallocate space for bar
    RTS
70
Parameter Passing
• 1960s
  - In memory
    • No recursion is allowed
• 1970s
  - In stack
• 1980s
  - In registers
  - First k parameters are passed in registers (k=4 or k=6)
  - Where is time saved?
• Most procedures are leaf procedures
• Interprocedural register allocation
• Many of the registers may be dead before another invocation
• Register windows are allocated in some architectures per call (e.g., Sun SPARC)
71
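Windows Exploit(s): Buffer Overflow
void foo(char *x) {
    char buf[2];
    strcpy(buf, x);
}
int main(int argc, char *argv[]) {
    foo(argv[1]);
}

> ./a.out abracadabra
Segmentation fault

[Figure: stack frame of foo listing, in order, previous frame, return address, saved FP, char* x, buf[2]; the copied string "abracadabra" starts at buf and runs past its 2 bytes. Arrows mark the direction of stack growth and of memory addresses.]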
72
Buffer overflow
int check_authentication(char *password) {
    int auth_flag = 0;
    char password_buffer[16];

    strcpy(password_buffer, password);
    if (strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
    if (strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;
    return auth_flag;
}
int main(int argc, char *argv[]) {
    if (argc < 2) {
        printf("Usage: %s <password>\n", argv[0]);
        exit(0);
    }
    if (check_authentication(argv[1])) {
        printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        printf(" Access Granted.\n");
        printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    } else {
        printf("\nAccess Denied.\n");
    }
}
(source: "Hacking: The Art of Exploitation, 2nd Ed.")
74
Buffer overflow
int check_authentication(char *password) {
    char password_buffer[16];
    int auth_flag = 0;

    strcpy(password_buffer, password);
    if (strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
    if (strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;
    return auth_flag;
}
int main(int argc, char *argv[]) {
    if (argc < 2) {
        printf("Usage: %s <password>\n", argv[0]);
        exit(0);
    }
    if (check_authentication(argv[1])) {
        printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        printf(" Access Granted.\n");
        printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    } else {
        printf("\nAccess Denied.\n");
    }
}
(source: "Hacking: The Art of Exploitation, 2nd Ed.")
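The weakness in check_authentication is the unbounded strcpy into a 16-byte buffer. A defensive variant, sketched here under the assumption that over-long inputs should simply be rejected, checks the length before copying. The function name check_authentication_bounded is hypothetical and not taken from the cited book.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if password matches one of the accepted strings, 0 otherwise.
   Inputs that do not fit in the buffer are rejected instead of copied. */
static int check_authentication_bounded(const char *password) {
    char password_buffer[16];
    int auth_flag = 0;

    assert(password != NULL);
    size_t len = strlen(password);
    if (len >= sizeof password_buffer)
        return 0;               /* too long: copying would overflow the buffer */

    /* Copy len characters plus the terminating NUL; this fits by the check above. */
    memcpy(password_buffer, password, len + 1);

    if (strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
    if (strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;
    return auth_flag;
}
```

With the length check in place, the result no longer depends on how the compiler happens to order auth_flag and password_buffer within the stack frame.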
75
Buffer overflow
0x08048529 <+69>:  movl $0x8048647,(%esp)
0x08048530 <+76>:  call 0x8048394 <puts@plt>
0x08048535 <+81>:  movl $0x8048664,(%esp)
0x0804853c <+88>:  call 0x8048394 <puts@plt>
0x08048541 <+93>:  movl $0x804867a,(%esp)
0x08048548 <+100>: call 0x8048394 <puts@plt>
0x0804854d <+105>: jmp 0x804855b <main+119>
0x0804854f <+107>: movl $0x8048696,(%esp)
0x08048556 <+114>: call 0x8048394 <puts@plt>
76
Nested Procedures
• For example: Pascal
• Any routine can have sub-routines
• Any sub-routine can access anything that is defined in its containing scope or inside the sub-routine itself
  - "non-local" variables
77
Example: Nested Procedures
program p() {
    int x;
    procedure a() {
        int y;
        procedure b() { … c() … };
        procedure c() {
            int z;
            procedure d() {
                y = x + z
            };
            … b() … d() …
        }
        … a() … c() …
    }
    a()
}

Possible call sequence: p → a → a → c → b → c → d
What are the addresses of variables "x", "y" and "z" in procedure d?
78
Nested Procedures
• can call a sibling, ancestor
• when "c" uses (non-local) variables from "a", which instance of "a" is it?
• how do you find the right activation record at runtime?
[Figure: call diagram over the routines P, a, a, b, c, c, d]
Possible call sequence: p → a → a → c → b → c → d
79
Nested Procedures
• goal: find the closest routine in the stack from a given nesting level
• if we reached the same routine in a sequence of calls
  - routine of level k uses variables of the same nesting level: it uses its own variables
  - if it uses variables of nesting level j < k, then it must be the last routine called at level j
• If a procedure is last at level j on the stack, then it must be an ancestor of the current routine
Possible call sequence: p → a → a → c → b → c → d
[Figure: call diagram over the routines P, a, a, b, c, c, d]
80
Nested Procedures
• problem: a routine may need to access variables of another routine that contains it statically
• solution: lexical pointer (a.k.a. access link) in the activation record
• lexical pointer points to the last activation record of the nesting level above it
  - in our example, the lexical pointer of d points to the activation record of c
• lexical pointers are created at runtime
• number of links to be traversed is known at compile time
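Following lexical pointers can be sketched as below. The frame struct is a deliberately reduced, hypothetical activation record (one local per routine, plus its nesting level and access link); the number of hops is the difference between the nesting level of the current routine and that of the routine declaring the variable, which the compiler knows statically.

```c
#include <assert.h>
#include <stddef.h>

/* Activation record reduced to what non-local access needs. */
struct frame {
    struct frame *lexical;  /* access link: frame of the statically enclosing routine */
    int level;              /* static nesting level of the routine */
    int local;              /* the routine's single local variable (x, y, or z) */
};

/* Follow (current level - target level) access links. */
static struct frame *frame_at_level(struct frame *current, int target_level) {
    assert(current != NULL && target_level >= 0);
    assert(target_level <= current->level);
    int hops = current->level - target_level;
    while (hops-- > 0) {
        current = current->lexical;
        assert(current != NULL);
    }
    return current;
}
```

For the call sequence p → a → a → c → b → c → d, with p at level 0, a at level 1, b and c at level 2, and d at level 3, the assignment y = x + z inside d takes 1 hop for z, 2 hops for y, and 3 hops for x.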
81
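Lexical Pointers
[Figure: runtime stack of activation records a, a, c, b, c, d for the call sequence below, showing the locals y, y, z, z and the lexical pointers between records.]
Possible call sequence: p → a → a → c → b → c → d

program p() {
    int x;
    procedure a() {
        int y;
        procedure b() { c() };
        procedure c() {
            int z;
            procedure d() {
                y = x + z
            };
            … b() … d() …
        }
        … a() … c() …
    }
    a()
}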
82
What is a Compiler
23
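Weighted reg alloc example
_t0 = cgen( a+b[5*c] )
Phase 1: - check absence of side-effects in expression tree
         - assign weight to each AST node
[Figure: AST for a+b[5*c] annotated with weights: + (w=1) with children a (w=0) and array access (w=1); array access with base b (w=0) and index * (w=1); * with children 5 (w=0) and c (w=0).]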
24
Weighted reg alloc example
_t0 = cgen( a+b[5*c] )
Phase 2: use weights to decide on order of translation (heavier sub-tree first)
    _t0 = c
    _t1 = 5
    _t0 = _t1 * _t0
    _t1 = b
    _t0 = _t1[_t0]
    _t1 = a
    _t0 = _t1 + _t0
[Figure: the weighted AST from Phase 1, with each node labeled by the temporary (_t0 or _t1) that holds its value.]
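The weights in this example are consistent with a Sethi-Ullman style rule: a leaf weighs 0, and an inner node weighs the larger of its children's weights, or one more when the two are equal. The sketch below assumes that rule; the slides do not spell it out, and the node struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    const struct node *left;    /* NULL for a leaf */
    const struct node *right;   /* NULL for a leaf */
};

/* Leaf: 0. Inner node: the larger child weight, or one more on a tie. */
static int weight(const struct node *n) {
    assert(n != NULL);
    if (n->left == NULL && n->right == NULL)
        return 0;
    assert(n->left != NULL && n->right != NULL);    /* binary nodes only */
    int wl = weight(n->left);
    int wr = weight(n->right);
    if (wl == wr)
        return wl + 1;
    return wl > wr ? wl : wr;
}

/* Phase 2 decision for an inner node: positive if the right sub-tree is
   heavier, negative if the left one is, 0 on a tie. */
static int heavier_side(const struct node *n) {
    assert(n != NULL && n->left != NULL && n->right != NULL);
    return weight(n->right) - weight(n->left);
}
```

For a+b[5*c], the right operand of + and the index of the array access are the heavier sub-trees, so they are translated first, which matches the instruction order shown above. What to do on a tie is left open here.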
25
TAC Generation for Memory Access Instructions
• Copy instruction: a = b
• Load/store instructions: a = *b    *a = b
• Address-of instruction: a = &b
• Array accesses: a = b[i]    a[i] = b
• Field accesses: a = b[f]    a[f] = b
• Memory allocation instruction: a = alloc(size)
  - Sometimes left out (e.g., malloc is a procedure in C)
26
Array operations
x = y[i]:
    t1 = &y         ; t1 = address-of y
    t2 = t1 + i     ; t2 = address of y[i]
    x = *t2         ; value stored at y[i]
x[i] = y:
    t1 = &x         ; t1 = address-of x
    t2 = t1 + i     ; t2 = address of x[i]
    *t2 = y         ; store through pointer
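The same two sequences written as C pointer arithmetic, as a sketch with hypothetical function names. Note one difference: C scales t1 + i by the element size automatically, whereas the TAC above adds i directly and so assumes the index is already expressed in address units.

```c
#include <assert.h>
#include <stddef.h>

/* x = y[i] */
static int load_element(const int *y, size_t i) {
    assert(y != NULL);
    const int *t1 = y;          /* t1 = address of y */
    const int *t2 = t1 + i;     /* t2 = address of y[i] */
    return *t2;                 /* x = *t2 */
}

/* x[i] = y */
static void store_element(int *x, size_t i, int y) {
    assert(x != NULL);
    int *t1 = x;                /* t1 = address of x */
    int *t2 = t1 + i;           /* t2 = address of x[i] */
    *t2 = y;                    /* store through pointer */
}
```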
27
TAC Generation for Control Flow Statements
• Label introduction
      _label_name:
  Indicates a point in the code that can be jumped to
• Unconditional jump: go to instruction following label L
      Goto L;
• Conditional jump: test condition variable t; if 0, jump to label L
      IfZ t Goto L;
• Similarly: test condition variable t; if 1, jump to label L
      IfNZ t Goto L;
28
Control-flow example - conditions
int x;
int y;
int z;

if (x < y)
    z = x;
else
    z = y;
z = z * z;

    _t0 = x < y;
    IfZ _t0 Goto _L0;
    z = x;
    Goto _L1;
_L0:
    z = y;
_L1:
    z = z * z;
29
cgen for if-then-else
cgen(if (e) s1 else s2):
    Let _t = cgen(e)
    Let Lfalse be a new label
    Let Lafter be a new label
    Emit( IfZ _t Goto Lfalse; )
    cgen(s1)
    Emit( Goto Lafter; )
    Emit( Lfalse: )
    cgen(s2)
    Emit( Goto Lafter; )
    Emit( Lafter: )
30
Control-flow example - loops
int x;
int y;

while (x < y) {
    x = x * 2;
}
y = x;

_L0:
    _t0 = x < y;
    IfZ _t0 Goto _L1;
    x = x * 2;
    Goto _L0;
_L1:
    y = x;
31
cgen for while loops
cgen(while (expr) stmt):
    Let Lbefore be a new label
    Let Lafter be a new label
    Emit( Lbefore: )
    Let t = cgen(expr)
    Emit( IfZ t Goto Lafter; )
    cgen(stmt)
    Emit( Goto Lbefore; )
    Emit( Lafter: )
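The cgen schemes for if-then-else and while can be sketched as a tiny emitter that appends TAC lines to a buffer. Everything here is an assumption made for illustration: the emitter struct, the _L<n> label naming, and passing already-generated TAC strings in place of recursive cgen calls on sub-trees. The if-then-else version leaves out the second, redundant Goto Lafter, so its output matches the conditions example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

enum { OUT_SIZE = 512 };

struct emitter {
    char out[OUT_SIZE];     /* generated TAC, one instruction per line */
    size_t used;            /* bytes written so far, excluding the NUL */
    int next_label;         /* counter for fresh labels */
};

/* Append one formatted TAC line to the output buffer. */
static void emit(struct emitter *e, const char *fmt, ...) {
    va_list args;
    va_start(args, fmt);
    int n = vsnprintf(e->out + e->used, OUT_SIZE - e->used, fmt, args);
    va_end(args);
    assert(n >= 0 && e->used + (size_t)n + 2 <= OUT_SIZE);
    e->used += (size_t)n;
    e->out[e->used++] = '\n';
    e->out[e->used] = '\0';
}

/* cond_code computes the condition into cond_temp; then_code and else_code
   stand in for cgen(s1) and cgen(s2). */
static void cgen_if_else(struct emitter *e, const char *cond_code,
                         const char *cond_temp, const char *then_code,
                         const char *else_code) {
    int lfalse = e->next_label++;
    int lafter = e->next_label++;
    emit(e, "%s", cond_code);
    emit(e, "IfZ %s Goto _L%d;", cond_temp, lfalse);
    emit(e, "%s", then_code);
    emit(e, "Goto _L%d;", lafter);
    emit(e, "_L%d:", lfalse);
    emit(e, "%s", else_code);
    emit(e, "_L%d:", lafter);
}

/* body_code stands in for cgen(stmt). */
static void cgen_while(struct emitter *e, const char *cond_code,
                       const char *cond_temp, const char *body_code) {
    int lbefore = e->next_label++;
    int lafter = e->next_label++;
    emit(e, "_L%d:", lbefore);
    emit(e, "%s", cond_code);
    emit(e, "IfZ %s Goto _L%d;", cond_temp, lafter);
    emit(e, "%s", body_code);
    emit(e, "Goto _L%d;", lbefore);
    emit(e, "_L%d:", lafter);
}
```

Calling cgen_while with the condition code "_t0 = x < y;", the temporary "_t0", and the body "x = x * 2;" reproduces the loop example line for line, apart from indentation.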
32
Optimization points
[Figure: pipeline source code → Front end → IR → Code generator → target code, annotated with optimization points:
  User: profile program, change algorithm
  Compiler: intraprocedural IR, interprocedural IR, IR optimizations
  Compiler: register allocation, instruction selection, peephole transformations
One point is marked "today".]
33
Interprocedural IR
• Compile-time generation of code for procedure invocations
• Activation Records (a.k.a. Stack Frames)
34
Supporting Procedures
• New computing environment
  - at least temporary memory for local variables
• Passing information into the new environment
  - Parameters
• Transfer of control to/from procedure
• Handling return values
35
Abstract Register Machine (High Level View)
[Figure: CPU with general-purpose (data) registers Register 00, Register 01, …, Register xx and control registers including Register PC; memory, drawn between high and low addresses, holding Code and Data.]
36
Abstract Register Machine (High Level View)
[Figure: CPU with general-purpose (data) registers Register 00, Register 01, …, Register xx and control registers including Register PC; main memory, drawn between high and low addresses, holding Code, Global Variables, Stack, and Heap.]
37
Abstract Activation Record Stack
[Figure: stack of frames …, Proc k, Proc k+1, Proc k+2, …; the highlighted entry is the stack frame for procedure Proc k+1(a1, …, aN).]
38
Abstract Stack Frame
Stack frame for procedure Proc k+1(a1, …, aN), located between the frames of Proc k and Proc k+2:
  Parameters (actual arguments):  Param N, Param N-1, …, Param 1
  Locals and temporaries:         _t0, …, _tk, x, …, y
39
Handling Procedures
• Store local variables/temporaries in a stack
• A function call instruction pushes arguments to the stack and jumps to the function label
  A statement x=f(a1,…,an); looks like
      Push a1; … Push an;
      Call f;
      Pop x; // copy returned value
• Returning a value is done by pushing it to the stack (return x;)
      Push x;
• Return control to caller (and roll up stack)
      Return;
40
Abstract Register Machine
[Figure: CPU with general-purpose (data) registers Register 00, Register 01, …, Register xx and control registers including Register PC; main memory, drawn between high and low addresses, holding Code, Global Variables, Stack, and Heap.]
41
Semi-Abstract Register Machine
[Figure: CPU with general-purpose (data) registers Register 00, Register 01, …, Register xx, control registers including Register PC, and stack registers ebp and esp; main memory, drawn between high and low addresses, holding Global Variables, Stack, and Heap.]
What is a Compiler
24
Weighted reg alloc example
b
5 c
array access
+
abase index
w=0
w=0 w=0
w=1 w=0
w=1
w=1
_t0 = cgen( a+b[5c] )Phase 2 - use weights to decide on order of translation
_t0
_t0
_t0Heavier sub-tree
Heavier sub-tree
_t0 = _t1 _t0
_t0 = _t1[_t0]
_t0 = _t1 + _t0
_t0_t1
_t1
_t1_t0 = c_t1 = 5
_t1 = b
_t1 = a
25
TAC Generation forMemory Access Instructions
bull Copy instruction a = bbull Loadstore instructions
a = b a = bbull Address of instruction a=ampbbull Array accesses
a = b[i] a[i] = bbull Field accesses
a = b[f] a[f] = bbull Memory allocation instruction a = alloc(size)
ndash Sometimes left out (eg malloc is a procedure in C)
26
Array operations
t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]
t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer
x = y[i]
x[i] = y
27
TAC Generation forControl Flow Statements
bull Label introduction_label_name
Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L
Goto Lbull Conditional jump test condition variable t
if 0 jump to label LIfZ t Goto L
bull Similarly test condition variable tif 1 jump to label L
IfNZ t Goto L
28
Control-flow example ndash conditionsint xint yint z
if (x lt y) z = xelse z = yz = z z
_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z
29
cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)
Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )
30
Control-flow example ndash loopsint xint y
while (x lt y) x = x 2
y = x
_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1
y = x
31
cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label
Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )
32
Optimization points
sourcecode
Frontend IR Code
generatortargetcode
Userprofile program
change algorithm
Compilerintraprocedural IRInterprocedural IRIR optimizations
Compilerregister allocation
instruction selectionpeephole transformations
today
33
Interprocedural IR
bull Compile time generation of code for procedure invocations
bull Activation Records (aka Stack Frames)
34
Supporting Procedures
bull New computing environment ndash at least temporary memory for local variables
bull Passing information into the new environmentndash Parameters
bull Transfer of control tofrom procedurebull Handling return values
35
Abstract Register Machine(High Level View)
Code
Data
Highaddresses
Lowaddresses
CPU
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
36
Heap
Abstract Register Machine(High Level View)
Global Variables
Stack
Lowaddresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
Code
Highaddresses
37
Abstract Activation Record Stack
[Diagram: a stack of frames …, Proc_k, Proc_k+1, Proc_k+2, …; the frame of Proc_k+1 is labelled "Stack frame for procedure Proc_k+1(a1,…,aN)"]
38
Abstract Stack Frame
[Diagram: the stack frame for procedure Proc_k+1(a1,…,aN), between the frames of Proc_k and Proc_k+2]
• Parameters (actual arguments): Param N, Param N-1, …, Param 1
• Locals and temporaries: _t0, …, _tk, x, …, y
39
Handling Procedures
• Store local variables/temporaries in a stack
• A function call instruction pushes arguments to the stack and jumps to the function label
A statement x=f(a1,…,an); looks like:
Push a1; … Push an;
Call f;
Pop x; // copy returned value
• Returning a value is done by pushing it to the stack (return x;)
Push x;
• Return control to caller (and roll up stack)
Return;
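The Push / Call / Pop protocol can be imitated in plain C with an explicit array standing in for the runtime stack. This is only a sketch under simplifying assumptions: `stack_mem`, `push`, `pop`, `call_add` and `caller` are hypothetical names, the callee is a two-argument addition, and the callee itself removes its arguments before pushing the result.

```c
#include <stddef.h>

/* Hypothetical stand-in for the runtime stack. */
static int stack_mem[64];
static size_t sp = 0;                 /* number of slots in use */

static void push(int v) { stack_mem[sp++] = v; }
static int pop(void) { return stack_mem[--sp]; }

/* Callee f(a1, a2) = a1 + a2. */
static void call_add(void) {
    int a2 = pop();                   /* arguments come off in reverse push order */
    int a1 = pop();
    push(a1 + a2);                    /* return x;  becomes  Push x; Return; */
}

/* Caller side of  x = f(a1, a2); */
int caller(int a1, int a2) {
    push(a1);                         /* Push a1; */
    push(a2);                         /* Push a2; */
    call_add();                       /* Call f;  */
    return pop();                     /* Pop x;   copy returned value */
}
```

After `caller` returns, the stack is back at the depth it had before the call, which is the "roll up" step of the protocol.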
40
Abstract Register Machine
[Diagram: CPU (general-purpose registers Register 00 … Register xx; control registers Register PC, …) and Main Memory divided into Code, Global Variables, Heap and Stack, drawn between low and high addresses]
41
Semi-Abstract Register Machine
[Diagram: CPU with general-purpose (data) registers Register 00 … Register xx, control registers (Register PC, …) and stack registers (ebp, esp, …); Main Memory holds Global Variables, Stack and Heap, drawn between high and low addresses]
42
Intro: Functions Example
int SimpleFn(int z) {
  int x, y;
  x = x * y * z;
  return x;
}
void main() {
  int w;
  w = SimpleFn(137);
}
Generated TAC:
_SimpleFn:
_t0 = x * y;
_t1 = _t0 * z;
x = _t1;
Push x;
Return;
main:
_t0 = 137;
Push _t0;
Call _SimpleFn;
Pop w;
43
What Can We Do with Procedures?
• Declarations & Definitions
• Call & Return
• Jumping out of procedures
• Passing & Returning procedures as parameters
44
Design Decisions
• Scoping rules
  – Static scoping vs. dynamic scoping
• Caller/callee conventions
  – Parameters
  – Who saves register values?
• Allocating space for local variables
45
Static (lexical) Scoping
main ( ) {                              /* block B0 */
  int a = 0; int b = 0;
  {                                     /* block B1 */
    int b = 1;
    {                                   /* block B2 */
      int a = 2; printf ("%d %d\n", a, b);
    }
    {                                   /* block B3 */
      int b = 3; printf ("%d %d\n", a, b);
    }
    printf ("%d %d\n", a, b);
  }
  printf ("%d %d\n", a, b);
}
Declaration   Scopes
a=0           B0, B1, B3
b=0           B0
b=1           B1, B2
a=2           B2
b=3           B3
A name refers to its (closest) enclosing scope; the binding is known at compile time.
46
Dynamic Scoping
• Each identifier is associated with a global stack of bindings
• When entering a scope where the identifier is declared
  – push the declaration on the identifier stack
• When exiting a scope where the identifier is declared
  – pop the identifier stack
• Evaluating the identifier in any context binds to the current top of stack
• Determined at runtime
47
Example
• What value is returned from main?
• Static scoping?
• Dynamic scoping?
int x = 42;
int f() { return x; }
int g() { int x = 1; return f(); }
int main() { return g(); }
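One way to work through the example is to model both disciplines in C. C itself is statically scoped, so the dynamic variant below simulates the per-identifier binding stack by hand. All names (`x_bindings`, `enter_x`, `exit_x`, `lookup_x`, and the `_static` / `_dynamic` function variants) are made up for this sketch.

```c
#include <stddef.h>

/* Dynamic scoping: a global stack of bindings for the identifier x. */
static int x_bindings[16];
static size_t x_depth = 0;

static void enter_x(int value) { x_bindings[x_depth++] = value; } /* scope declaring x is entered */
static void exit_x(void) { x_depth--; }                            /* that scope is left */
static int lookup_x(void) { return x_bindings[x_depth - 1]; }      /* current top of stack */

static int f_dynamic(void) { return lookup_x(); }

static int g_dynamic(void) {
    enter_x(1);                       /* int x = 1; */
    int result = f_dynamic();
    exit_x();
    return result;
}

int main_dynamic(void) {
    enter_x(42);                      /* global int x = 42; */
    int result = g_dynamic();
    exit_x();
    return result;
}

/* Static scoping: ordinary C; the x in f_static is bound to the global at compile time. */
static int x = 42;
static int f_static(void) { return x; }
static int g_static(void) { int x = 1; (void)x; return f_static(); }
int main_static(void) { return g_static(); }
```

Under static scoping `f` sees the global `x`, so the result is 42; under dynamic scoping the binding pushed by `g` is on top when `f` runs, so the result is 1.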
48
Why do we care?
• We need to generate code to access variables
• Static scoping
  – Identifier binding is known at compile time
  – Address of the variable is known at compile time
  – Assigning addresses to variables is part of code generation
  – No runtime errors of “access to undefined variable”
  – Can check types of variables
49
Variable addresses for static scoping: first attempt
int x = 42;
int f() { return x; }
int g() { int x = 1; return f(); }
int main() { return g(); }
identifier      address
x (global)      0x42
x (inside g)    0x73
50
Variable addresses for static scoping: first attempt
int a [11];
void quicksort(int m, int n) {
  int i;
  if (n > m) {
    i = partition(m, n);
    quicksort (m, i-1);
    quicksort (i+1, n);
  }
}
main() {
  quicksort (1, 9);
}
What is the address of the variable “i” in the procedure quicksort?
51
Compile-Time Information on Variables
• Name
• Type
• Scope
  – When is it recognized?
• Duration
  – Until when does its value exist?
• Size
  – How many bytes are required at runtime?
• Address
  – Fixed
  – Relative
  – Dynamic
52
Activation Record (Stack Frames)
• separate space for each procedure invocation
• managed at runtime
  – code for managing it is generated by the compiler
• desired properties
  – efficient allocation and deallocation
    • procedures are called frequently
  – variable size
    • different procedures may require different memory sizes
53
Semi-Abstract Register Machine
[Diagram, repeated: CPU with general-purpose (data) registers, control registers (Register PC, …) and stack registers (ebp, esp, …); Main Memory holds Global Variables, Stack and Heap, drawn between high and low addresses]
54
A Logical Stack Frame (Simplified)
[Diagram: stack frame for function f(a1,…,aN)]
• Parameters (actual arguments): Param N, Param N-1, …, Param 1
• Locals and temporaries: _t0, …, _tk, x, …, y
55
Runtime Stack
• Stack of activation records
• Call = push new activation record
• Return = pop activation record
• Only one “active” activation record: top of stack
• How do we handle recursion?
56
Activation Record (frame)
[Diagram: frame layout from high addresses to low addresses; the stack grows down; the diagram also marks the frame pointer and the stack pointer]
• incoming parameters: parameter k, …, parameter 1
• administrative part: lexical pointer, return information, dynamic link, registers & misc
• local variables
• temporaries
• (next frame would be here)
57
Runtime Stack
• SP – stack pointer
  – top of current frame
• FP – frame pointer
  – base of current frame
  – Sometimes called BP (base pointer)
[Diagram: the previous frame above the current frame; FP marks the base of the current frame and SP its top; the stack grows down]
58
Code Blocks
• Programming languages provide code blocks
void foo() {
  int x = 8; y = 9;        // 1
  { int x = y * y; }       // 2
  { int x = y * 7; }       // 3
  x = y + 1;
}
[Diagram: frame layout: administrative, x1, y1, x2, x3, …]
59
L-Values of Local Variables
• The offset in the stack is known at compile time
• L-val(x) = FP+offset(x)
• x = 5 ⇒ Load_Constant 5, R3
          Store R3, offset(x)(FP)
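The rule L-val(x) = FP + offset(x) can be mimicked in C with a byte array playing the role of the stack. This is a sketch under assumed values: the offsets `OFFSET_X` and `OFFSET_Y`, the array `stack_mem`, the frame-pointer position and the helper names are all illustrative, not taken from the slides.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical frame layout fixed at compile time: offsets relative to FP. */
enum { OFFSET_X = -4, OFFSET_Y = -8 };

static unsigned char stack_mem[64];   /* stand-in for the runtime stack */

/* L-val(v) = FP + offset(v) */
static unsigned char *lval(unsigned char *fp, int offset) { return fp + offset; }

/* Corresponds to  Store R3, offset(FP) */
static void store_local(unsigned char *fp, int offset, int32_t value) {
    memcpy(lval(fp, offset), &value, sizeof value);
}

static int32_t load_local(unsigned char *fp, int offset) {
    int32_t value;
    memcpy(&value, lval(fp, offset), sizeof value);
    return value;
}

int32_t demo_assign_x(void) {
    unsigned char *fp = stack_mem + 32;   /* assumed frame pointer of this activation */
    store_local(fp, OFFSET_X, 5);         /* x = 5 */
    store_local(fp, OFFSET_Y, 7);         /* a second local, to show the slots are distinct */
    return load_local(fp, OFFSET_X);
}
```

Only `fp` changes from one activation to the next; the offsets are constants baked into the generated code, which is why the same code works for every invocation, including recursive ones.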
60
Pentium Runtime Stack
Pentium stack registers:
Register   Usage
ESP        Stack pointer
EBP        Base pointer
Pentium stack and call/ret instructions:
Instruction       Usage
push, pusha, …    push on runtime stack
pop, popa, …      pop from runtime stack
call              transfer control to called routine
return            transfer control back to caller
61
Accessing Stack Variables
• Use offset from FP (%ebp)
• Remember: stack grows downwards
• Above FP = parameters
• Below FP = locals
• Examples
  – %ebp + 4 = return address
  – %ebp + 8 = first parameter
  – %ebp – 4 = first local
[Diagram: stack from high to low addresses: Param n, …, param1 (FP+8), Return address, Previous fp (FP), Local 1 (FP-4), …, Local n (SP)]
62
Factorial – fact(int n)
fact:
  pushl %ebp              # save ebp
  movl %esp,%ebp          # ebp=esp
  pushl %ebx              # save ebx
  movl 8(%ebp),%ebx       # ebx = n
  cmpl $1,%ebx            # n <= 1 ?
  jle .lresult            # then done
  leal -1(%ebx),%eax      # eax = n-1
  pushl %eax              # push argument
  call fact               # fact(n-1)
  imull %ebx,%eax         # eax=retv*n
  jmp .lreturn
.lresult:
  movl $1,%eax            # retv
.lreturn:
  movl -4(%ebp),%ebx      # restore ebx
  movl %ebp,%esp          # restore esp
  popl %ebp               # restore ebp
  ret                     # return to caller (not shown in the slide text)
[Diagram: stack at an intermediate point, from high to low addresses: n (EBP+8), Return address, old ebp (EBP), old ebx (EBP-4, ESP)]
(disclaimer: a real compiler can do better than that)
63
Call Sequences
• The processor does not save the content of registers on procedure calls
• So who will?
  – Caller saves and restores registers
  – Callee saves and restores registers
  – But can also have both save/restore some registers
64
Call Sequences
Caller push code (caller, before the call):
  – Push caller-save registers
  – Push actual parameters (in reverse order)
call:
  – Push return address
  – Jump to call address
Callee push code (prologue):
  – Push current base-pointer
  – bp = sp
  – Push local variables
  – Push callee-save registers
Callee pop code (epilogue):
  – Pop callee-save registers
  – Pop callee activation record
  – Pop old base-pointer
return:
  – Pop return address
  – Jump to address
Caller pop code (caller, after the return):
  – Pop parameters
  – Pop caller-save registers
65
Call Sequences – Foo(42,21)
Caller push code (caller, before the call):
  push %ecx          Push caller-save registers
  push $21           Push actual parameters (in reverse order)
  push $42
call:
  call _foo          Push return address; jump to call address
Callee push code (prologue):
  push %ebp          Push current base-pointer
  mov %esp, %ebp     bp = sp
  sub $8, %esp       Push local variables
  push %ebx          Push callee-save registers
Callee pop code (epilogue):
  pop %ebx           Pop callee-save registers
  mov %ebp, %esp     Pop callee activation record
  pop %ebp           Pop old base-pointer
return:
  ret                Pop return address; jump to address
Caller pop code (caller, after the return):
  add $8, %esp       Pop parameters
  pop %ecx           Pop caller-save registers
66
“To Callee-save or to Caller-save?”
• Callee-saved registers need only be saved when the callee modifies their value
• Some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registers
• Callee-Save Registers
  – Saved by the callee before modification
  – Values are automatically preserved across calls
• Caller-Save Registers
  – Saved (if needed) by the caller before calls
  – Values are not automatically preserved across calls
• Usually the architecture defines caller-save and callee-save registers
  • Separate compilation
  • Interoperability between code produced by different compilers/languages
• But compiler writers decide when to use caller/callee registers
68
Callee-Save Registers
• Saved by the callee before modification
• Usually at procedure prolog
• Restored at procedure epilog
• Hardware support may be available
• Values are automatically preserved across calls
int foo(int a) {
  int b = a + 1;
  f1();
  g1(b);
  return (b + 2);
}
.global _foo
Add_Constant -K, SP            // allocate space for foo
Store_Local R5, -14(FP)        // save R5
Load_Reg R5, R0; Add_Constant R5, 1
JSR f1; JSR g1;
Add_Constant R5, 2; Load_Reg R5, R0
Load_Local -14(FP), R5         // restore R5
Add_Constant K, SP; RTS        // deallocate
69
Caller-Save Registers
• Saved by the caller before calls when needed
• Values are not automatically preserved across calls
void bar (int y) {
  int x = y + 1;
  f2(x);
  g2(2);
  g2(8);
}
.global _bar
Add_Constant -K, SP            // allocate space for bar
Add_Constant R0, 1
JSR f2
Load_Constant 2, R0; JSR g2;
Load_Constant 8, R0; JSR g2
Add_Constant K, SP             // deallocate space for bar
RTS
70
Parameter Passing
• 1960s
  – In memory
    • No recursion is allowed
• 1970s
  – In stack
• 1980s
  – In registers
  – First k parameters are passed in registers (k=4 or k=6)
  – Where is time saved?
• Most procedures are leaf procedures
• Interprocedural register allocation
• Many of the registers may be dead before another invocation
• Register windows are allocated in some architectures per call (e.g., Sun SPARC)
71
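Windows Exploit(s): Buffer Overflow
void foo (char *x) {
  char buf[2];
  strcpy(buf, x);
}
int main (int argc, char *argv[]) {
  foo(argv[1]);
}
./a.out abracadabra
Segmentation fault
[Diagram: stack layout along the memory addresses, with an arrow "Stack grows this way": Previous frame, Return address, Saved FP, char* x, buf[2], …; the copied string "abracadabr…" runs past buf[2] over the neighbouring slots]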
72
Buffer overflow
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int check_authentication(char *password) {
  int auth_flag = 0;
  char password_buffer[16];
  strcpy(password_buffer, password);
  if(strcmp(password_buffer, "brillig") == 0)
    auth_flag = 1;
  if(strcmp(password_buffer, "outgrabe") == 0)
    auth_flag = 1;
  return auth_flag;
}
int main(int argc, char *argv[]) {
  if(argc < 2) {
    printf("Usage: %s <password>\n", argv[0]);
    exit(0);
  }
  if(check_authentication(argv[1])) {
    printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    printf(" Access Granted.\n");
    printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
  } else {
    printf("\nAccess Denied.\n");
  }
}
(source: “Hacking – The Art of Exploitation, 2nd Ed.”)
74
Buffer overflow
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int check_authentication(char *password) {
  char password_buffer[16];
  int auth_flag = 0;
  strcpy(password_buffer, password);
  if(strcmp(password_buffer, "brillig") == 0)
    auth_flag = 1;
  if(strcmp(password_buffer, "outgrabe") == 0)
    auth_flag = 1;
  return auth_flag;
}
int main(int argc, char *argv[]) {
  if(argc < 2) {
    printf("Usage: %s <password>\n", argv[0]);
    exit(0);
  }
  if(check_authentication(argv[1])) {
    printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    printf(" Access Granted.\n");
    printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
  } else {
    printf("\nAccess Denied.\n");
  }
}
(source: “Hacking – The Art of Exploitation, 2nd Ed.”)
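The unchecked strcpy is what lets an over-long argument write past password_buffer into neighbouring stack data. One possible repair, shown here only as a sketch, is to reject input that does not fit before copying. The function name `check_authentication_bounded` is made up for this illustration and is not from the book or the slides.

```c
#include <string.h>

/* Returns 1 for an accepted password, 0 otherwise. */
int check_authentication_bounded(const char *password) {
    char password_buffer[16];
    int auth_flag = 0;

    /* Reject anything that would not fit, including the terminating NUL. */
    if (strlen(password) >= sizeof password_buffer)
        return 0;
    strcpy(password_buffer, password);   /* safe: the length was checked above */

    if (strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
    if (strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;
    return auth_flag;
}
```

With the length check in place, the order in which auth_flag and password_buffer are laid out in the frame no longer matters for this function, because no write leaves the buffer.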
75
Buffer overflow
0x08048529 <+69>:  movl $0x8048647,(%esp)
0x08048530 <+76>:  call 0x8048394 <puts@plt>
0x08048535 <+81>:  movl $0x8048664,(%esp)
0x0804853c <+88>:  call 0x8048394 <puts@plt>
0x08048541 <+93>:  movl $0x804867a,(%esp)
0x08048548 <+100>: call 0x8048394 <puts@plt>
0x0804854d <+105>: jmp 0x804855b <main+119>
0x0804854f <+107>: movl $0x8048696,(%esp)
0x08048556 <+114>: call 0x8048394 <puts@plt>
76
Nested Procedures
• For example: Pascal
• Any routine can have sub-routines
• Any sub-routine can access anything that is defined in its containing scope or inside the sub-routine itself
  – “non-local” variables
77
Example: Nested Procedures
program p() {
  int x;
  procedure a() {
    int y;
    procedure b() { … c() … };
    procedure c() {
      int z;
      procedure d() {
        y = x + z
      };
      … b() … d() …
    }
    … a() … c() …
  }
  a()
}
Possible call sequence: p → a → a → c → b → c → d
What are the addresses of variables “x”, “y” and “z” in procedure d?
78
Nested Procedures
• can call a sibling, ancestor
• when “c” uses (non-local) variables from “a”, which instance of “a” is it?
• how do you find the right activation record at runtime?
Possible call sequence: p → a → a → c → b → c → d
[Diagram: tree of the routines P, a, b, c, d for this call sequence]
79
Nested Procedures
• goal: find the closest routine in the stack from a given nesting level
• if we reached the same routine in a sequence of calls
  – if a routine of level k uses variables of the same nesting level, it uses its own variables
  – if it uses variables of nesting level j < k, then it must be the last routine called at level j
• If a procedure is last at level j on the stack, then it must be an ancestor of the current routine
Possible call sequence: p → a → a → c → b → c → d
[Diagram: tree of the routines P, a, b, c, d for this call sequence]
80
Nested Procedures
• problem: a routine may need to access variables of another routine that contains it statically
• solution: lexical pointer (aka access link) in the activation record
• lexical pointer points to the last activation record of the nesting level above it
  – in our example, the lexical pointer of d points to the activation record of c
• lexical pointers are created at runtime
• the number of links to be traversed is known at compile time
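The lookup through lexical pointers can be sketched in C as follows. The record layout is reduced to what the lookup needs, and every name here (`struct frame`, `frame_at_level`, `demo_lexical_pointers`, the sample values 10, 5 and 7) is an assumption made for illustration. Nesting levels follow the example program: p is level 0, a is level 1, b and c are level 2, d is level 3.

```c
#include <stddef.h>

/* Hypothetical activation record, reduced to the fields the lookup needs. */
struct frame {
    struct frame *lexical;   /* access link: record of the statically enclosing routine */
    int level;               /* static nesting level of the routine */
    int local;               /* one local variable per routine, for illustration */
};

/* Follow (from->level - target_level) links; this count is a compile-time constant. */
static struct frame *frame_at_level(struct frame *from, int target_level) {
    struct frame *f = from;
    for (int hops = from->level - target_level; hops > 0; hops--)
        f = f->lexical;
    return f;
}

/* Builds the records for the call sequence p a a c b c d and runs y = x + z in d. */
int demo_lexical_pointers(void) {
    struct frame p  = { NULL, 0, 10 };   /* x = 10 */
    struct frame a1 = { &p,  1, 0 };     /* first a, holds a y */
    struct frame a2 = { &p,  1, 0 };     /* second a, called from the first */
    struct frame c1 = { &a2, 2, 5 };     /* c called from a2, z = 5 */
    struct frame b  = { &a2, 2, 0 };     /* b called from c1; b is nested in a */
    struct frame c2 = { &a2, 2, 7 };     /* c called from b, z = 7 */
    struct frame d  = { &c2, 3, 0 };     /* d called from c2 */

    /* On the stack, but not on d's chain of lexical pointers. */
    (void)a1; (void)c1; (void)b;

    int x = frame_at_level(&d, 0)->local;    /* three links: p */
    int z = frame_at_level(&d, 2)->local;    /* one link: the latest c */
    frame_at_level(&d, 1)->local = x + z;    /* two links: the latest a holds y */
    return a2.local;
}
```

The walk from d reaches c2, then a2, then p; the earlier records a1 and c1 are never visited, which answers the question of which instance of a and c is meant.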
81
Lexical Pointers
Possible call sequence: p → a → a → c → b → c → d
[Diagram: the activation records on the stack for this call sequence: a (with y), a (with y), c (with z), b, c (with z), d, connected by lexical pointers; shown next to the tree of the routines P, a, b, c, d and the program p from the Example: Nested Procedures slide]
26
Array operations
t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]
t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer
x = y[i]
x[i] = y
27
TAC Generation for Control Flow Statements
• Label introduction
      _label_name:
  Indicates a point in the code that can be jumped to
• Unconditional jump: go to instruction following label L
      Goto L;
• Conditional jump: test condition variable t; if 0, jump to label L
      IfZ t Goto L;
• Similarly: test condition variable t; if 1, jump to label L
      IfNZ t Goto L;
28
Control-flow example – conditions

    int x;
    int y;
    int z;

    if (x < y)
        z = x;
    else
        z = y;
    z = z * z;

TAC:

    _t0 = x < y;
    IfZ _t0 Goto _L0;
    z = x;
    Goto _L1;
    _L0:
    z = y;
    _L1:
    z = z * z;
29
cgen for if-then-else

    cgen(if (e) s1 else s2)
        Let _t = cgen(e)
        Let Lfalse be a new label
        Let Lafter be a new label
        Emit( IfZ _t Goto Lfalse; )
        cgen(s1)
        Emit( Goto Lafter; )
        Emit( Lfalse: )
        cgen(s2)
        Emit( Goto Lafter; )
        Emit( Lafter: )
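
The cgen scheme for if-then-else can be tried out with a small Python sketch. Everything here is an assumption made for illustration: the tuple-shaped AST, the `TacGen` class, and its method names do not come from the slides. The sketch also drops the second `Goto Lafter`, which is redundant because `Lafter` follows immediately.

```python
import itertools


class TacGen:
    """Toy TAC emitter (hypothetical helper, not the course's code generator)."""

    def __init__(self):
        self.code = []
        self._labels = itertools.count()
        self._temps = itertools.count()

    def new_label(self):
        return f"_L{next(self._labels)}"

    def new_temp(self):
        return f"_t{next(self._temps)}"

    def emit(self, line):
        self.code.append(line)

    def cgen_expr(self, expr):
        # expr is an assumed (left, op, right) triple over variable names.
        left, op, right = expr
        t = self.new_temp()
        self.emit(f"{t} = {left} {op} {right};")
        return t

    def cgen_stmt(self, stmt):
        kind = stmt[0]
        if kind == "assign":
            _, target, source = stmt
            self.emit(f"{target} = {source};")
        elif kind == "if":
            _, cond, then_stmt, else_stmt = stmt
            t = self.cgen_expr(cond)
            l_false = self.new_label()
            l_after = self.new_label()
            self.emit(f"IfZ {t} Goto {l_false};")
            self.cgen_stmt(then_stmt)
            self.emit(f"Goto {l_after};")
            self.emit(f"{l_false}:")
            self.cgen_stmt(else_stmt)
            self.emit(f"{l_after}:")
        else:
            raise ValueError(f"unknown statement kind: {kind}")


gen = TacGen()
gen.cgen_stmt(("if", ("x", "<", "y"), ("assign", "z", "x"), ("assign", "z", "y")))
tac_if = gen.code
print("\n".join(tac_if))
```

For the condition example on slide 28 this produces the same label and temporary names as the hand-written TAC, up to the final multiplication.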
30
Control-flow example – loops

    int x;
    int y;

    while (x < y) {
        x = x * 2;
    }
    y = x;

TAC:

    _L0:
    _t0 = x < y;
    IfZ _t0 Goto _L1;
    x = x * 2;
    Goto _L0;
    _L1:
    y = x;
31
cgen for while loops

    cgen(while (expr) stmt)
        Let Lbefore be a new label
        Let Lafter be a new label
        Emit( Lbefore: )
        Let t = cgen(expr)
        Emit( IfZ t Goto Lafter; )
        cgen(stmt)
        Emit( Goto Lbefore; )
        Emit( Lafter: )
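
A minimal Python sketch of the while-loop scheme follows. It assumes the loop body has already been lowered to TAC lines and that the condition is a simple `(left, op, right)` triple; `cgen_while` and `make_fresh` are hypothetical names introduced only for this illustration.

```python
def make_fresh():
    """Return a generator of fresh names such as _L0, _L1, _t0 (assumed naming)."""
    counters = {}

    def fresh(prefix):
        n = counters.get(prefix, 0)
        counters[prefix] = n + 1
        return f"{prefix}{n}"

    return fresh


def cgen_while(cond, body, fresh):
    """Emit TAC for `while (cond) body`, following the label layout on the slide."""
    l_before = fresh("_L")
    l_after = fresh("_L")
    t = fresh("_t")
    left, op, right = cond
    return [
        f"{l_before}:",                # loop header: re-evaluate the condition here
        f"{t} = {left} {op} {right};",
        f"IfZ {t} Goto {l_after};",    # leave the loop when the condition is 0
        *body,
        f"Goto {l_before};",           # back edge
        f"{l_after}:",
    ]


tac_while = cgen_while(("x", "<", "y"), ["x = x * 2;"], make_fresh())
print("\n".join(tac_while))
```

Emitting `Lbefore` ahead of the condition is what makes the test run again on every iteration; placing it after the condition would evaluate the test only once.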
32
Optimization points
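[Figure: pipeline source code → Frontend → IR → Code generator → target code, annotated with who can optimize where:
  User: profile program, change algorithm
  Compiler (IR): intraprocedural IR, interprocedural IR, IR optimizations
  Compiler (code generation): register allocation, instruction selection, peephole transformations
  A "today" marker highlights part of the figure.]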
33
Interprocedural IR
• Compile time generation of code for procedure invocations
• Activation Records (a.k.a. Stack Frames)
34
Supporting Procedures
• New computing environment
  – at least temporary memory for local variables
• Passing information into the new environment
  – Parameters
• Transfer of control to/from procedure
• Handling return values
35
Abstract Register Machine (High Level View)
[Figure: a CPU with general-purpose (data) registers Register 00, Register 01, …, Register xx and control registers (Register PC, …), next to a memory holding Code and Data, drawn between high addresses and low addresses.]
36
Abstract Register Machine (High Level View)
[Figure: a CPU with general-purpose (data) registers Register 00, Register 01, …, Register xx and control registers (Register PC, …), next to Main Memory divided into Code, Global Variables, Heap, and Stack, drawn between low addresses and high addresses.]
37
Abstract Activation Record Stack
[Figure: a stack of activation records …, Proc k, Proc k+1, Proc k+2, …; the record of Proc k+1 is labeled "Stack frame for procedure Proc k+1(a1,…,aN)".]
38
Abstract Stack Frame
[Figure: the stack frame for procedure Proc k+1(a1,…,aN), located between the frames of Proc k and Proc k+2, holds:
  Parameters (actual arguments): Param N, Param N-1, …, Param 1
  Locals and temporaries: _t0, …, _tk, x, …, y]
39
Handling Procedures
• Store local variables/temporaries in a stack
• A function call instruction pushes arguments to stack and jumps to the function label
  A statement x=f(a1,…,an); looks like
      Push a1; … Push an;
      Call f;
      Pop x;     // copy returned value
• Returning a value is done by pushing it to the stack (return x;)
      Push x;
• Return control to caller (and roll up stack)
      Return;
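
The Push / Call / Pop / Return protocol can be exercised with a tiny interpreter. This is a sketch under stated assumptions: instructions are Python tuples, each function lists its parameter names so the callee can roll the arguments off the stack, and `mul` stands in for arbitrary TAC arithmetic. The function `square` and the helper names are hypothetical.

```python
def value(frame, operand):
    """Integer literals evaluate to themselves; names are looked up in the frame."""
    return operand if isinstance(operand, int) else frame[operand]


def call(functions, name, stack):
    """Run `name`: its arguments are on `stack`; its result is left on `stack`."""
    fn = functions[name]
    # The caller pushed a1 … an in order, so popping yields an … a1.
    args = [stack.pop() for _ in fn["params"]][::-1]
    frame = dict(zip(fn["params"], args))      # fresh locals for this invocation
    for op, *operands in fn["body"]:
        if op == "mul":
            dst, a, b = operands
            frame[dst] = value(frame, a) * value(frame, b)
        elif op == "push":
            stack.append(value(frame, operands[0]))
        elif op == "call":
            call(functions, operands[0], stack)
        elif op == "pop":
            frame[operands[0]] = stack.pop()   # copy returned value
        elif op == "return":
            break
    return frame


functions = {
    "square": {"params": ["z"],
               "body": [("mul", "_t0", "z", "z"), ("push", "_t0"), ("return",)]},
    "main": {"params": [],
             "body": [("push", 137), ("call", "square"), ("pop", "w"), ("return",)]},
}

stack = []
main_frame = call(functions, "main", stack)
print(main_frame["w"], stack)
```

Each invocation gets its own `frame` dictionary, which is the role the activation record plays in the rest of the lecture.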
40
Abstract Register Machine
[Figure: a CPU with general-purpose (data) registers Register 00, Register 01, …, Register xx and control registers (Register PC, …), next to Main Memory divided into Code, Global Variables, Heap, and Stack, drawn between low addresses and high addresses.]
41
Semi-Abstract Register Machine
[Figure: CPU and Main Memory. Main Memory holds Global Variables, Stack, and Heap, drawn between high addresses and low addresses. The CPU has general-purpose (data) registers Register 00, Register 01, …, Register xx, control registers (Register PC, …), and stack registers (ebp, esp, …).]
42
Intro: Functions Example

    int SimpleFn(int z) {
        int x, y;
        x = x * y * z;
        return x;
    }

    void main() {
        int w;
        w = SimpleFn(137);
    }

TAC:

    _SimpleFn:
        _t0 = x * y;
        _t1 = _t0 * z;
        x = _t1;
        Push x;
        Return;

    main:
        _t0 = 137;
        Push _t0;
        Call _SimpleFn;
        Pop w;
43
What Can We Do with Procedures
• Declarations & Definitions
• Call & Return
• Jumping out of procedures
• Passing & Returning procedures as parameters
44
Design Decisions
• Scoping rules
  – Static scoping vs. dynamic scoping
• Caller/callee conventions
  – Parameters
  – Who saves register values?
• Allocating space for local variables
45
Static (lexical) Scoping

    main ( )
    {                                        /* B0 */
        int a = 0 ;
        int b = 0 ;
        {                                    /* B1 */
            int b = 1 ;
            {                                /* B2 */
                int a = 2 ;
                printf ("%d %d\n", a, b) ;
            }
            {                                /* B3 */
                int b = 3 ;
                printf ("%d %d\n", a, b) ;
            }
            printf ("%d %d\n", a, b) ;
        }
        printf ("%d %d\n", a, b) ;
    }

    Declaration    Scopes
    a=0            B0, B1, B3
    b=0            B0
    b=1            B1, B2
    a=2            B2
    b=3            B3

A name refers to its (closest) enclosing scope; known at compile time.
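
The "closest enclosing scope" rule can be sketched as a chain of symbol tables. The `Scope` class below is a hypothetical illustration, not part of the slides; it resolves the names at the four `printf` calls of the block example (in B2, B3, B1, B0 order).

```python
class Scope:
    """One block's declarations plus a link to the enclosing block."""

    def __init__(self, parent=None):
        self.parent = parent
        self.vars = {}

    def declare(self, name, value):
        self.vars[name] = value

    def lookup(self, name):
        # Walk outward until some enclosing scope declares the name.
        scope = self
        while scope is not None:
            if name in scope.vars:
                return scope.vars[name]
            scope = scope.parent
        raise NameError(name)


b0 = Scope()
b0.declare("a", 0)
b0.declare("b", 0)
b1 = Scope(b0)
b1.declare("b", 1)
b2 = Scope(b1)
b2.declare("a", 2)
b3 = Scope(b1)
b3.declare("b", 3)

# Values seen by the printf calls, in program order: B2, B3, B1, B0.
printed = [(s.lookup("a"), s.lookup("b")) for s in (b2, b3, b1, b0)]
print(printed)
```

A compiler performs this walk once per identifier at compile time, which is why no lookup remains in the generated code.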
46
Dynamic Scoping
• Each identifier is associated with a global stack of bindings
• When entering scope where identifier is declared
  – push declaration on identifier stack
• When exiting scope where identifier is declared
  – pop identifier stack
• Evaluating the identifier in any context binds to the current top of stack
• Determined at runtime
47
Example
• What value is returned from main?
• Static scoping?
• Dynamic scoping?

    int x = 42;

    int f() { return x; }
    int g() { int x = 1; return f(); }
    int main() { return g(); }
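
The two answers can be checked with a small simulation. This is a sketch, not how a real compiler implements either discipline: `run_static` models static scoping with an explicit global table, and `run_dynamic` models dynamic scoping with a per-identifier stack of bindings, as described on the Dynamic Scoping slide. Both function names are made up for the example.

```python
def run_static():
    """Static scoping: the x in f is bound to the global x at compile time."""
    global_vars = {"x": 42}

    def f():
        return global_vars["x"]

    def g():
        x = 1          # g's local x; f never sees it
        return f()

    return g()


def run_dynamic():
    """Dynamic scoping: x refers to the most recent live binding at runtime."""
    bindings = {"x": [42]}            # global stack of bindings for x

    def f():
        return bindings["x"][-1]      # top of the identifier's stack

    def g():
        bindings["x"].append(1)       # entering the scope that declares x
        try:
            return f()
        finally:
            bindings["x"].pop()       # exiting that scope

    return g()


print(run_static(), run_dynamic())
```

Under static scoping main returns 42; under dynamic scoping it returns 1, because g's binding of x is on top of the stack when f runs.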
48
Why do we care
• We need to generate code to access variables
• Static scoping
  – Identifier binding is known at compile time
  – Address of the variable is known at compile time
  – Assigning addresses to variables is part of code generation
  – No runtime errors of "access to undefined variable"
  – Can check types of variables
49
Variable addresses for static scoping: first attempt

    int x = 42;

    int f() { return x; }
    int g() { int x = 1; return f(); }
    int main() { return g(); }

    identifier      address
    x (global)      0x42
    x (inside g)    0x73
50
Variable addresses for static scoping: first attempt

    int a [11] ;

    void quicksort(int m, int n) {
        int i;
        if (n > m) {
            i = partition(m, n);
            quicksort (m, i-1) ;
            quicksort (i+1, n) ;
        }
    }

    main() {
        quicksort (1, 9) ;
    }

What is the address of the variable "i" in the procedure quicksort?
51
Compile-Time Information on Variables
• Name
• Type
• Scope
  – when is it recognized
• Duration
  – Until when does its value exist
• Size
  – How many bytes are required at runtime
• Address
  – Fixed
  – Relative
  – Dynamic
52
Activation Record (Stack Frames)
• separate space for each procedure invocation
• managed at runtime
  – code for managing it generated by the compiler
• desired properties
  – efficient allocation and deallocation
     • procedures are called frequently
  – variable size
     • different procedures may require different memory sizes
53
Semi-Abstract Register Machine
[Figure: CPU and Main Memory. Main Memory holds Global Variables, Stack, and Heap, drawn between high addresses and low addresses. The CPU has general-purpose (data) registers Register 00, Register 01, …, Register xx, control registers (Register PC, …), and stack registers (ebp, esp, …).]
54
A Logical Stack Frame (Simplified)
[Figure: stack frame for function f(a1,…,aN):
  Parameters (actual arguments): Param N, Param N-1, …, Param 1
  Locals and temporaries: _t0, …, _tk, x, …, y]
55
Runtime Stack
• Stack of activation records
• Call = push new activation record
• Return = pop activation record
• Only one "active" activation record – top of stack
• How do we handle recursion?
56
Activation Record (frame)
[Figure: frame layout, listed from high addresses to low addresses; the stack grows down]
  high addresses
    parameter k                        (incoming parameters)
    …
    parameter 1
    lexical pointer                    (administrative part)
    return information
    dynamic link
    registers & misc
    local variables / temporaries
    next frame would be here
  low addresses
The figure also marks the frame pointer and the stack pointer.
57
Runtime Stack
• SP – stack pointer
  – top of current frame
• FP – frame pointer
  – base of current frame
  – Sometimes called BP (base pointer)
[Figure: the Previous frame sits above the Current frame; FP marks the base of the current frame and SP its top; the stack grows down.]
58
Code Blocks
• Programming languages provide code blocks

    void foo()
    {
        int x = 8, y = 9;      // 1
        { int x = y * y; }     // 2
        { int x = y * 7; }     // 3
        x = y + 1;
    }

[Figure: frame layout: administrative, x1, y1, x2, x3, …]
59
L-Values of Local Variables
• The offset in the stack is known at compile time
• L-val(x) = FP+offset(x)
• x = 5  ⇒  Load_Constant 5, R3
            Store R3, offset(x)(FP)
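
A minimal sketch of how a compiler might assign these offsets, assuming 4-byte slots, locals placed directly below FP, and no saved registers in between. The helper names are hypothetical, and the variable names x1, y1, x2, x3 reuse the labels from the Code Blocks frame.

```python
def assign_offsets(locals_in_order, word_size=4):
    """Give each local an FP-relative offset: first local at FP-4, next at FP-8, …"""
    return {name: -word_size * (i + 1) for i, name in enumerate(locals_in_order)}


def l_value(fp, var, offsets):
    """L-val(x) = FP + offset(x): the only runtime input is the current FP."""
    return fp + offsets[var]


def store_constant(var, constant, offsets):
    """Emit the two abstract instructions the slide shows for `x = 5`."""
    return [f"Load_Constant {constant}, R3", f"Store R3, {offsets[var]}(FP)"]


offsets = assign_offsets(["x1", "y1", "x2", "x3"])
code = store_constant("x1", 5, offsets)
print(offsets)
print(code)
print(l_value(1000, "x2", offsets))
```

Because the offsets are fixed per procedure, every invocation (including recursive ones) uses the same generated code and differs only in the value of FP.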
60
Pentium Runtime Stack
Pentium stack registers

    Register    Usage
    ESP         Stack pointer
    EBP         Base pointer

Pentium stack and call/ret instructions

    Instruction       Usage
    push, pusha, …    push on runtime stack
    pop, popa, …      pop from runtime stack
    call              transfer control to called routine
    ret               transfer control back to caller
61
Accessing Stack Variables
• Use offset from FP (%ebp)
• Remember – stack grows downwards
• Above FP = parameters
• Below FP = locals
• Examples
  – %ebp + 4 = return address
  – %ebp + 8 = first parameter
  – %ebp - 4 = first local

[Figure: stack layout, high addresses first]
    …
    Param n
    …
    param1             ← FP+8
    Return address
    Previous fp        ← FP
    Local 1            ← FP-4
    …
    Local n            ← SP
62
Factorial – fact(int n)

    fact:
        pushl %ebp              # save ebp
        movl %esp,%ebp          # ebp = esp
        pushl %ebx              # save ebx
        movl 8(%ebp),%ebx       # ebx = n
        cmpl $1,%ebx            # n <= 1 ?
        jle .lresult            # then done
        leal -1(%ebx),%eax      # eax = n-1
        pushl %eax              #
        call fact               # fact(n-1)
        imull %ebx,%eax         # eax = retv * n
        jmp .lreturn            #
    .lresult:
        movl $1,%eax            # retv
    .lreturn:
        movl -4(%ebp),%ebx      # restore ebx
        movl %ebp,%esp          # restore esp
        popl %ebp               # restore ebp
        ret                     # return to caller

[Figure: stack at an intermediate point, high addresses first]
    n                  ← EBP+8
    Return address
    old %ebp           ← EBP
    old %ebx           ← EBP-4, ESP

(disclaimer: a real compiler can do better than that)
63
Call Sequences
• The processor does not save the content of registers on procedure calls
• So who will?
  – Caller saves and restores registers
  – Callee saves and restores registers
  – But can also have both save/restore some registers
64
Call Sequences
[Figure: timeline caller → call → callee → return → caller, with the code executed at each step]
  Caller push code
    – Push caller-save registers
    – Push actual parameters (in reverse order)
  call
    – push return address
    – Jump to call address
  Callee push code (prologue)
    – Push current base-pointer
    – bp = sp
    – Push local variables
    – Push callee-save registers
  Callee pop code (epilogue)
    – Pop callee-save registers
    – Pop callee activation record
    – Pop old base-pointer
  return
    – pop return address
    – Jump to address
  Caller pop code
    – Pop parameters
    – Pop caller-save registers
65
Call Sequences – Foo(42,21)

  caller
    push %ecx            Push caller-save registers
    push $21             Push actual parameters (in reverse order)
    push $42
  call
    call _foo            push return address, jump to call address
  callee (prologue)
    push %ebp            Push current base-pointer
    mov %esp, %ebp       bp = sp
    sub $8, %esp         Push local variables
    push %ebx            Push callee-save registers
  callee (epilogue)
    pop %ebx             Pop callee-save registers
    mov %ebp, %esp       Pop callee activation record
    pop %ebp             Pop old base-pointer
  return
    ret                  pop return address, jump to address
  caller
    add $8, %esp         Pop parameters
    pop %ecx             Pop caller-save registers
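
The Foo(42,21) sequence can be traced with a toy machine to see that the parameters land at %ebp+8 and %ebp+12 and that every register is restored afterwards. This is only a sketch: the `Machine` class, the 4-byte word size, and the initial register values are assumptions chosen for the illustration.

```python
class Machine:
    """Toy model of a downward-growing stack with 4-byte slots."""

    def __init__(self):
        self.mem = {}                                   # address -> stored value
        self.reg = {"esp": 1000, "ebp": 2000, "ecx": 7, "ebx": 9}

    def push(self, value):
        self.reg["esp"] -= 4
        self.mem[self.reg["esp"]] = value

    def pop(self):
        value = self.mem[self.reg["esp"]]
        self.reg["esp"] += 4
        return value


m = Machine()
before = dict(m.reg)

# Caller push code
m.push(m.reg["ecx"])             # push %ecx   (caller-save register)
m.push(21)                       # push $21    (parameters in reverse order)
m.push(42)                       # push $42
m.push("return address")         # call _foo

# Callee prologue
m.push(m.reg["ebp"])             # push %ebp
m.reg["ebp"] = m.reg["esp"]      # mov %esp, %ebp
m.reg["esp"] -= 8                # sub $8, %esp (space for locals)
m.push(m.reg["ebx"])             # push %ebx   (callee-save register)

# Callee body: parameters are found at fixed offsets from %ebp
first_param = m.mem[m.reg["ebp"] + 8]
second_param = m.mem[m.reg["ebp"] + 12]
m.reg["ebx"] = 0                 # the body is free to clobber these
m.reg["ecx"] = 0

# Callee epilogue
m.reg["ebx"] = m.pop()           # pop %ebx
m.reg["esp"] = m.reg["ebp"]      # mov %ebp, %esp
m.reg["ebp"] = m.pop()           # pop %ebp
m.pop()                          # ret (pops the return address)

# Caller pop code
m.reg["esp"] += 8                # add $8, %esp (pop parameters)
m.reg["ecx"] = m.pop()           # pop %ecx
after = dict(m.reg)

print(first_param, second_param, after == before)
```

Pushing the arguments in reverse order is what puts the first parameter closest to the frame pointer, at %ebp+8.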
66
"To Callee-save or to Caller-save?"
• Callee-saved registers need only be saved when callee modifies their value
• some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registers
• Callee-Save Registers
  – Saved by the callee before modification
  – Values are automatically preserved across calls
• Caller-Save Registers
  – Saved (if needed) by the caller before calls
  – Values are not automatically preserved across calls
• Usually the architecture defines caller-save and callee-save registers
  – Separate compilation
  – Interoperability between code produced by different compilers/languages
• But compiler writers decide when to use caller/callee registers
68
Callee-Save Registers
• Saved by the callee before modification
• Usually at procedure prolog
• Restored at procedure epilog
• Hardware support may be available
• Values are automatically preserved across calls

    int foo(int a) {
        int b=a+1;
        f1();
        g1(b);
        return(b+2);
    }

    .global _foo
    Add_Constant -K, SP          // allocate space for foo
    Store_Local R5, -14(FP)      // save R5
    Load_Reg R5, R0; Add_Constant R5, 1
    JSR f1; JSR g1;
    Add_Constant R5, 2; Load_Reg R5, R0
    Load_Local -14(FP), R5       // restore R5
    Add_Constant K, SP; RTS      // deallocate
69
Caller-Save Registers
• Saved by the caller before calls when needed
• Values are not automatically preserved across calls

    void bar (int y) {
        int x=y+1;
        f2(x);
        g2(2);
        g2(8);
    }

    .global _bar
    Add_Constant -K, SP          // allocate space for bar
    Add_Constant R0, 1
    JSR f2
    Load_Constant 2, R0; JSR g2;
    Load_Constant 8, R0; JSR g2
    Add_Constant K, SP           // deallocate space for bar
    RTS
70
Parameter Passing
• 1960s
  – In memory
     • No recursion is allowed
• 1970s
  – In stack
• 1980s
  – In registers
  – First k parameters are passed in registers (k=4 or k=6)
  – Where is time saved?
• Most procedures are leaf procedures
• Interprocedural register allocation
• Many of the registers may be dead before another invocation
• Register windows are allocated in some architectures per call (e.g., Sun SPARC)
71
Windows Exploit(s) Buffer Overflow
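    void foo (char *x) {
        char buf[2];
        strcpy(buf, x);
    }
    int main (int argc, char *argv[]) {
        foo(argv[1]);
    }

    ./a.out abracadabra
    Segmentation fault

[Figure: the stack frame of foo, with arrows for "Stack grows this way" and "Memory addresses". Slots shown: Previous frame, Return address, Saved FP, char* x, buf[2], … The copied input ("abracadabr" in the figure) spills from buf over the neighboring slots.]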
72
Buffer overflow

    int check_authentication(char *password) {
        int auth_flag = 0;
        char password_buffer[16];

        strcpy(password_buffer, password);
        if(strcmp(password_buffer, "brillig") == 0)
            auth_flag = 1;
        if(strcmp(password_buffer, "outgrabe") == 0)
            auth_flag = 1;
        return auth_flag;
    }

    int main(int argc, char *argv[]) {
        if(argc < 2) {
            printf("Usage: %s <password>\n", argv[0]);
            exit(0);
        }
        if(check_authentication(argv[1])) {
            printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
            printf(" Access Granted.\n");
            printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        } else {
            printf("\nAccess Denied.\n");
        }
    }

(source: "Hacking: The Art of Exploitation", 2nd Ed)
74
Buffer overflow (declarations of password_buffer and auth_flag swapped)

    int check_authentication(char *password) {
        char password_buffer[16];
        int auth_flag = 0;

        strcpy(password_buffer, password);
        if(strcmp(password_buffer, "brillig") == 0)
            auth_flag = 1;
        if(strcmp(password_buffer, "outgrabe") == 0)
            auth_flag = 1;
        return auth_flag;
    }

    int main(int argc, char *argv[]) {
        if(argc < 2) {
            printf("Usage: %s <password>\n", argv[0]);
            exit(0);
        }
        if(check_authentication(argv[1])) {
            printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
            printf(" Access Granted.\n");
            printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        } else {
            printf("\nAccess Denied.\n");
        }
    }

(source: "Hacking: The Art of Exploitation", 2nd Ed)
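
Whether the overflow flips auth_flag depends on where the flag sits relative to the buffer in the frame. The simulation below is a sketch over a flat byte array, not a real exploit: the addresses, the 64-byte "stack", and the helper names are all assumptions, and which declaration order yields which layout depends on the compiler.

```python
def unchecked_strcpy(memory, dest, src):
    """Copy src plus a terminating NUL with no bounds check, like C's strcpy."""
    for i, byte in enumerate(src + b"\x00"):
        memory[dest + i] = byte


def check_authentication(password, flag_above_buffer):
    """Model check_authentication with auth_flag above or below the 16-byte buffer."""
    stack = bytearray(64)                      # a slice of simulated stack memory
    if flag_above_buffer:
        buf_addr, flag_addr = 16, 32           # buffer at 16..31, flag at 32..35
    else:
        buf_addr, flag_addr = 20, 16           # flag at 16..19, buffer at 20..35
    # auth_flag = 0 already holds, since the bytearray starts zeroed.
    unchecked_strcpy(stack, buf_addr, password)
    stored = bytes(stack[buf_addr:]).split(b"\x00")[0]   # read back a C string
    if stored in (b"brillig", b"outgrabe"):
        stack[flag_addr:flag_addr + 4] = (1).to_bytes(4, "little")
    return int.from_bytes(stack[flag_addr:flag_addr + 4], "little")


long_input = b"A" * 20
print(check_authentication(long_input, flag_above_buffer=True))   # nonzero: granted
print(check_authentication(long_input, flag_above_buffer=False))  # 0: denied
print(check_authentication(b"brillig", flag_above_buffer=False))  # 1: granted
```

Moving the flag below the buffer protects the flag itself, but a long enough input still overwrites whatever lies above the buffer, such as the saved frame pointer and the return address.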
75
Buffer overflow
    0x08048529 <+69>:   movl $0x8048647,(%esp)
    0x08048530 <+76>:   call 0x8048394 <puts@plt>
    0x08048535 <+81>:   movl $0x8048664,(%esp)
    0x0804853c <+88>:   call 0x8048394 <puts@plt>
    0x08048541 <+93>:   movl $0x804867a,(%esp)
    0x08048548 <+100>:  call 0x8048394 <puts@plt>
    0x0804854d <+105>:  jmp 0x804855b <main+119>
    0x0804854f <+107>:  movl $0x8048696,(%esp)
    0x08048556 <+114>:  call 0x8048394 <puts@plt>
76
Nested Procedures
• For example – Pascal
• Any routine can have sub-routines
• Any sub-routine can access anything that is defined in its containing scope or inside the sub-routine itself
  – "non-local" variables
77
Example: Nested Procedures

    program p() {
        int x;
        procedure a() {
            int y;
            procedure b() { … c() … };
            procedure c() {
                int z;
                procedure d() {
                    y = x + z
                };
                … b() … d() …
            }
            … a() … c() …
        }
        a()
    }

Possible call sequence: p → a → a → c → b → c → d

What are the addresses of variables "x", "y" and "z" in procedure d?
78
Nested Procedures
• can call a sibling, ancestor
• when "c" uses (non-local) variables from "a", which instance of "a" is it?
• how do you find the right activation record at runtime?

Possible call sequence: p → a → a → c → b → c → d
[Figure: the resulting stack of activation records: P, a, a, c, b, c, d]
79
Nested Procedures
• goal: find the closest routine in the stack from a given nesting level
• if we reached the same routine in a sequence of calls
  – routine of level k uses variables of the same nesting level, it uses its own variables
  – if it uses variables of nesting level j < k then it must be the last routine called at level j
• If a procedure is last at level j on the stack, then it must be ancestor of the current routine

Possible call sequence: p → a → a → c → b → c → d
[Figure: the resulting stack of activation records: P, a, a, c, b, c, d]
80
Nested Procedures
• problem: a routine may need to access variables of another routine that contains it statically
• solution: lexical pointer (a.k.a. access link) in the activation record
• lexical pointer points to the last activation record of the nesting level above it
  – in our example, lexical pointer of d points to activation records of c
• lexical pointers created at runtime
• number of links to be traversed is known at compile time
81
Lexical Pointers
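[Figure: the stack of activation records for the call sequence below, with the local variables held in each frame: a (y), a (y), c (z), b, c (z), d; each frame has a lexical pointer to the frame of its statically enclosing routine.]

Possible call sequence: p → a → a → c → b → c → d

    program p() {
        int x;
        procedure a() {
            int y;
            procedure b() { c() };
            procedure c() {
                int z;
                procedure d() {
                    y = x + z
                };
                … b() … d() …
            }
            … a() … c() …
        }
        a()
    }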
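
The lexical-pointer scheme can be sketched by building the frames for the call sequence p → a → a → c → b → c → d by hand and resolving the statement y = x + z inside d. The `Frame` class, the nesting-level table, and the initial variable values are assumptions made for this illustration.

```python
class Frame:
    """Activation record holding locals and a lexical pointer (access link)."""

    def __init__(self, routine, lexical_pointer, **local_vars):
        self.routine = routine
        self.lexical_pointer = lexical_pointer   # frame of the enclosing routine
        self.vars = dict(local_vars)


# Static information, known at compile time.
NESTING_LEVEL = {"p": 0, "a": 1, "b": 2, "c": 2, "d": 3}
DECLARED_IN = {"x": "p", "y": "a", "z": "c"}


def resolve(frame, name):
    """Follow a compile-time-known number of lexical pointers to the owning frame."""
    hops = NESTING_LEVEL[frame.routine] - NESTING_LEVEL[DECLARED_IN[name]]
    for _ in range(hops):
        frame = frame.lexical_pointer
    return frame


# Frames for the call sequence; lexical pointers are set at call time.
p0 = Frame("p", None, x=10)
a1 = Frame("a", p0, y=1)        # p calls a
a2 = Frame("a", p0, y=2)        # a calls a: the enclosing routine is still p
c1 = Frame("c", a2, z=100)      # a calls c
b1 = Frame("b", a2)             # c calls its sibling b: same enclosing frame as c1
c2 = Frame("c", a2, z=200)      # b calls c
d1 = Frame("d", c2)             # c calls d

# Inside d: y = x + z
x_val = resolve(d1, "x").vars["x"]     # 3 hops: d → c → a → p
z_val = resolve(d1, "z").vars["z"]     # 1 hop:  d → c
resolve(d1, "y").vars["y"] = x_val + z_val   # 2 hops: d → c → a

print(a1.vars["y"], a2.vars["y"])
```

The assignment updates y in the second instance of a, the one whose frame the latest c is linked to, while the first instance keeps its own y.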
82
What is a Compiler
28
Control-flow example - conditions
int x;
int y;
int z;
if (x < y)
  z = x;
else
  z = y;
z = z * z;
Generated three-address code:
_t0 = x < y;
IfZ _t0 Goto _L0;
z = x;
Goto _L1;
_L0:
z = y;
_L1:
z = z * z;
29
cgen for if-then-else
cgen(if (e) s1 else s2):
  Let _t = cgen(e)
  Let Lfalse be a new label
  Let Lafter be a new label
  Emit( IfZ _t Goto Lfalse; )
  cgen(s1)
  Emit( Goto Lafter; )
  Emit( Lfalse: )
  cgen(s2)
  Emit( Goto Lafter; )
  Emit( Lafter: )
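The if-then-else scheme above can be sketched in Python. This is only an illustration under stated assumptions: the `CodeGen` class, the tuple encoding of statements and expressions, and all helper names are invented for this example and are not part of the course material.

```python
class CodeGen:
    """Toy three-address-code emitter; names and AST shape are illustrative."""

    def __init__(self):
        self.code = []       # emitted TAC lines, in order
        self.n_labels = 0    # counter for fresh labels
        self.n_temps = 0     # counter for fresh temporaries

    def emit(self, line):
        self.code.append(line)

    def new_label(self):
        label = f"_L{self.n_labels}"
        self.n_labels += 1
        return label

    def new_temp(self):
        temp = f"_t{self.n_temps}"
        self.n_temps += 1
        return temp

    def cgen_expr(self, expr):
        """expr is a variable name or (op, left, right); returns the name holding its value."""
        if isinstance(expr, str):
            return expr
        op, left, right = expr
        lhs, rhs = self.cgen_expr(left), self.cgen_expr(right)
        temp = self.new_temp()
        self.emit(f"{temp} = {lhs} {op} {rhs};")
        return temp

    def cgen_stmt(self, stmt):
        if stmt[0] == "assign":                  # ("assign", target, source_name)
            _, target, source = stmt
            self.emit(f"{target} = {source};")
        elif stmt[0] == "if":                    # ("if", cond, then_stmt, else_stmt)
            _, cond, s1, s2 = stmt
            t = self.cgen_expr(cond)
            l_false = self.new_label()
            l_after = self.new_label()
            self.emit(f"IfZ {t} Goto {l_false};")
            self.cgen_stmt(s1)
            self.emit(f"Goto {l_after};")
            self.emit(f"{l_false}:")
            self.cgen_stmt(s2)
            self.emit(f"Goto {l_after};")        # redundant, kept to mirror the scheme
            self.emit(f"{l_after}:")
        else:
            raise ValueError(f"unknown statement kind: {stmt[0]}")


gen = CodeGen()
gen.cgen_stmt(("if", ("<", "x", "y"), ("assign", "z", "x"), ("assign", "z", "y")))
tac_if = gen.code
```

Following the scheme literally emits a jump to Lafter right before the Lafter label; a later peephole pass could drop it.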
30
Control-flow example - loops
int x;
int y;
while (x < y) {
  x = x * 2;
}
y = x;
Generated three-address code:
_L0:
_t0 = x < y;
IfZ _t0 Goto _L1;
x = x * 2;
Goto _L0;
_L1:
y = x;
31
cgen for while loops
cgen(while (expr) stmt):
  Let Lbefore be a new label
  Let Lafter be a new label
  Emit( Lbefore: )
  Let t = cgen(expr)
  Emit( IfZ t Goto Lafter; )
  cgen(stmt)
  Emit( Goto Lbefore; )
  Emit( Lafter: )
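A minimal Python sketch of the while-loop scheme follows. It assumes the condition and the body have already been translated to lists of TAC lines; the function `cgen_while` and the `LabelMaker` helper are hypothetical names used only here.

```python
class LabelMaker:
    """Hands out fresh labels _L0, _L1, ... (illustrative helper)."""

    def __init__(self):
        self.count = 0

    def new_label(self):
        label = f"_L{self.count}"
        self.count += 1
        return label


def cgen_while(labels, cond_lines, cond_temp, body_lines):
    """Emit TAC for `while (cond) body`.

    cond_lines: TAC that computes the condition into cond_temp.
    body_lines: TAC already generated for the loop body.
    """
    l_before = labels.new_label()
    l_after = labels.new_label()
    code = [f"{l_before}:"]
    code += cond_lines                       # the condition is re-evaluated on every iteration
    code.append(f"IfZ {cond_temp} Goto {l_after};")
    code += body_lines
    code.append(f"Goto {l_before};")
    code.append(f"{l_after}:")
    return code


tac_while = cgen_while(LabelMaker(), ["_t0 = x < y;"], "_t0", ["x = x * 2;"])
```

The condition code sits after the Lbefore label, so each jump back to Lbefore tests the condition again.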
32
Optimization points
source code -> Front end -> IR -> Code generator -> target code
• User: profile program, change algorithm
• Compiler: intraprocedural IR, interprocedural IR, IR optimizations
• Compiler: register allocation, instruction selection, peephole transformations
[Figure marker: "today"]
33
Interprocedural IR
• Compile-time generation of code for procedure invocations
• Activation Records (aka Stack Frames)
34
Supporting Procedures
• New computing environment
  - at least temporary memory for local variables
• Passing information into the new environment
  - Parameters
• Transfer of control to/from procedure
• Handling return values
35
Abstract Register Machine (High Level View)
[Figure: a CPU with general-purpose (data) registers (Register 00, Register 01, …, Register xx) and control registers (Register PC, …), next to a memory holding Code and Data between low and high addresses]
36
Abstract Register Machine (High Level View)
[Figure: CPU with general-purpose (data) registers (Register 00, Register 01, …, Register xx) and control registers (Register PC, …); main memory divided into Code, Global Variables, Heap and Stack between low and high addresses]
37
Abstract Activation Record Stack
[Figure: a stack of frames …, Proc k, Proc k+1, Proc k+2, …; one entry is labelled "Stack frame for procedure Proc k+1(a1, …, aN)"]
38
Abstract Stack Frame
Stack frame for procedure Proc k+1(a1, …, aN), between the frames of Proc k and Proc k+2:
Parameters (actual arguments):
  Param N
  Param N-1
  …
  Param 1
Locals and temporaries:
  _t0
  …
  _tk
  x
  …
  y
39
Handling Procedures
• Store local variables/temporaries in a stack
• A function call instruction pushes arguments to the stack and jumps to the function label
  A statement x=f(a1,…,an); looks like
    Push a1; … Push an;
    Call f;
    Pop x; // copy returned value
• Returning a value is done by pushing it to the stack (return x;)
    Push x;
• Return control to caller (and roll up stack)
    Return;
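The Push/Call/Pop/Return protocol can be modelled with a Python list standing in for the runtime stack. This is a sketch under assumptions: the `call` helper and the callee `sum_of_squares` are hypothetical, and the callee's "Push x; Return;" pair is modelled as rolling up the frame and then leaving the result on top.

```python
def call(stack, func, args):
    """Model `x = f(a1, ..., an)` on an abstract stack; returns the value popped into x."""
    base = len(stack)              # the callee's frame starts at its first argument
    for a in args:                 # Push a1; ... Push an;
        stack.append(a)
    result = func(stack, base)     # Call f; (the callee computes its return value)
    del stack[base:]               # Return; rolls up parameters, locals and temporaries
    stack.append(result)           # Push x; the returned value is left on top
    return stack.pop()             # Pop x; the caller copies the returned value


def sum_of_squares(stack, base):
    """Hypothetical callee: parameters sit at the frame base, temporaries above them."""
    a, b = stack[base], stack[base + 1]
    stack.append(a * a)            # temporary _t0
    stack.append(b * b)            # temporary _t1
    return stack[base + 2] + stack[base + 3]


stack = [99]                       # a value that belongs to the caller
x = call(stack, sum_of_squares, [3, 4])
```

After the call the stack holds only the caller's data again, which is the "roll up" step in the last bullet.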
40
Abstract Register Machine
[Figure: CPU with general-purpose (data) registers (Register 00, Register 01, …, Register xx) and control registers (Register PC, …); main memory divided into Code, Global Variables, Heap and Stack between low and high addresses]
41
Semi-Abstract Register Machine
[Figure: CPU with general-purpose (data) registers (Register 00, Register 01, …, Register xx), control registers (Register PC, …) and stack registers (ebp, esp, …); main memory divided into Global Variables, Stack and Heap between high and low addresses]
42
Intro: Functions Example
int SimpleFn(int z) {
  int x, y;
  x = x * y * z;
  return x;
}
void main() {
  int w;
  w = SimpleFn(137);
}
Generated three-address code:
_SimpleFn:
  _t0 = x * y;
  _t1 = _t0 * z;
  x = _t1;
  Push x;
  Return;
main:
  _t0 = 137;
  Push _t0;
  Call _SimpleFn;
  Pop w;
43
What Can We Do with Procedures?
• Declarations & Definitions
• Call & Return
• Jumping out of procedures
• Passing & Returning procedures as parameters
44
Design Decisions
• Scoping rules
  - Static scoping vs. dynamic scoping
• Caller/callee conventions
  - Parameters
  - Who saves register values?
• Allocating space for local variables
45
Static (lexical) Scoping
main ( )
{                                /* B0 */
  int a = 0;
  int b = 0;
  {                              /* B1 */
    int b = 1;
    {                            /* B2 */
      int a = 2;
      printf ("%d %d\n", a, b);
    }
    {                            /* B3 */
      int b = 3;
      printf ("%d %d\n", a, b);
    }
    printf ("%d %d\n", a, b);
  }
  printf ("%d %d\n", a, b);
}
Declaration   Scopes
a=0           B0, B1, B3
b=0           B0
b=1           B1, B2
a=2           B2
b=3           B3
A name refers to its (closest) enclosing scope; this is known at compile time.
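The "closest enclosing scope" rule can be sketched as a chain of symbol tables, one per block B0 to B3. The `Scope` class below is a hypothetical helper written for this illustration; a real compiler would store types and addresses rather than values.

```python
class Scope:
    """One block's symbol table, chained to the enclosing block."""

    def __init__(self, parent=None):
        self.parent = parent
        self.symbols = {}

    def declare(self, name, value):
        self.symbols[name] = value

    def lookup(self, name):
        scope = self
        while scope is not None:          # walk outward to the closest declaration
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        raise NameError(name)


b0 = Scope()
b0.declare("a", 0)
b0.declare("b", 0)
b1 = Scope(b0)
b1.declare("b", 1)
b2 = Scope(b1)
b2.declare("a", 2)
b3 = Scope(b1)
b3.declare("b", 3)

# The four printf calls, in program order: inside B2, B3, B1 and B0.
printed = [(s.lookup("a"), s.lookup("b")) for s in (b2, b3, b1, b0)]
```

Each lookup depends only on the block structure of the program text, which is why the binding can be resolved before the program runs.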
46
Dynamic Scoping
• Each identifier is associated with a global stack of bindings
• When entering scope where identifier is declared
  - push declaration on identifier stack
• When exiting scope where identifier is declared
  - pop identifier stack
• Evaluating the identifier in any context binds to the current top of stack
• Determined at runtime
47
Example
• What value is returned from main?
• Static scoping?
• Dynamic scoping?
int x = 42;
int f() { return x; }
int g() { int x = 1; return f(); }
int main() { return g(); }
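The two answers can be checked with a small Python model. Python itself is statically scoped, so the first half uses plain functions; the second half simulates dynamic scoping with one binding stack per identifier. The names `enter`, `leave`, `lookup` and the `_static`/`_dynamic` suffixes are assumptions of this sketch.

```python
from collections import defaultdict

# --- static (lexical) scoping: Python resolves names this way ---
x = 42

def f_static():
    return x              # refers to the global x, fixed by the program text

def g_static():
    x = 1                 # a new local x; invisible inside f_static
    return f_static()

# --- dynamic scoping: one global stack of bindings per identifier ---
bindings = defaultdict(list)

def enter(name, value):   # entering a scope that declares `name`
    bindings[name].append(value)

def leave(name):          # exiting that scope
    bindings[name].pop()

def lookup(name):         # evaluation binds to the current top of the stack
    return bindings[name][-1]

def f_dynamic():
    return lookup("x")

def g_dynamic():
    enter("x", 1)
    try:
        return f_dynamic()
    finally:
        leave("x")

enter("x", 42)            # the global declaration `int x = 42`
static_result = g_static()
dynamic_result = g_dynamic()
```

Under static scoping main returns 42; under dynamic scoping f sees the binding pushed by g and main returns 1.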
48
Why do we care?
• We need to generate code to access variables
• Static scoping
  - Identifier binding is known at compile time
  - Address of the variable is known at compile time
  - Assigning addresses to variables is part of code generation
  - No runtime errors of "access to undefined variable"
  - Can check types of variables
49
Variable addresses for static scoping: first attempt
int x = 42;
int f() { return x; }
int g() { int x = 1; return f(); }
int main() { return g(); }
identifier     address
x (global)     0x42
x (inside g)   0x73
50
Variable addresses for static scoping: first attempt
int a[11];
void quicksort(int m, int n) {
  int i;
  if (n > m) {
    i = partition(m, n);
    quicksort(m, i-1);
    quicksort(i+1, n);
  }
}
main() {
  quicksort(1, 9);
}
What is the address of the variable "i" in the procedure quicksort?
51
Compile-Time Information on Variables
• Name
• Type
• Scope
  - when is it recognized
• Duration
  - Until when does its value exist
• Size
  - How many bytes are required at runtime
• Address
  - Fixed
  - Relative
  - Dynamic
52
Activation Record (Stack Frames)
• separate space for each procedure invocation
• managed at runtime
  - code for managing it generated by the compiler
• desired properties
  - efficient allocation and deallocation
    • procedures are called frequently
  - variable size
    • different procedures may require different memory sizes
53
Semi-Abstract Register Machine
[Figure: CPU with general-purpose (data) registers (Register 00, Register 01, …, Register xx), control registers (Register PC, …) and stack registers (ebp, esp, …); main memory divided into Global Variables, Stack and Heap between high and low addresses]
54
A Logical Stack Frame (Simplified)
Stack frame for function f(a1, …, aN):
Parameters (actual arguments):
  Param N
  Param N-1
  …
  Param 1
Locals and temporaries:
  _t0
  …
  _tk
  x
  …
  y
55
Runtime Stack
• Stack of activation records
• Call = push new activation record
• Return = pop activation record
• Only one "active" activation record - top of stack
• How do we handle recursion?
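One way to see why a stack of activation records handles recursion is to make the records explicit. In this Python sketch (the dictionary layout and the `trace` list are assumptions made for the illustration), every invocation of `fact` pushes its own record, so each has a private copy of `n`.

```python
def fact(n, stack, trace):
    frame = {"proc": "fact", "n": n}   # a new activation record for this invocation
    stack.append(frame)                # Call = push new activation record
    trace.append(len(stack))           # record the stack depth at entry
    result = 1 if n <= 1 else n * fact(n - 1, stack, trace)
    popped = stack.pop()               # Return = pop activation record
    assert popped is frame             # the record on top is always the active one
    return result


stack, trace = [], []
value = fact(4, stack, trace)
```

The depth grows by one per nested call and the stack is empty again once the outermost call returns.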
56
Activation Record (frame)
high addresses
  parameter k          (incoming parameters)
  parameter 1          (incoming parameters)
  lexical pointer      (administrative part)
  return information   (administrative part)
  dynamic link         (administrative part)
  registers & misc     (administrative part)
  local variables
  temporaries
  (next frame would be here)
  …
low addresses
[Figure labels: frame pointer, stack pointer; stack grows down]
57
Runtime Stack
• SP - stack pointer
  - top of current frame
• FP - frame pointer
  - base of current frame
  - Sometimes called BP (base pointer)
[Figure: the previous frame followed by the current frame; FP marks the base of the current frame and SP its top; stack grows down]
58
Code Blocks
• Programming languages provide code blocks
void foo()
{
  int x = 8, y = 9;      // 1
  { int x = y * y; }     // 2
  { int x = y * 7; }     // 3
  x = y + 1;
}
[Figure: frame layout: administrative, x1, y1, x2, x3, …]
59
L-Values of Local Variables
• The offset in the stack is known at compile time
• L-val(x) = FP+offset(x)
• x = 5 is translated to
    Load_Constant 5, R3
    Store R3, offset(x)(FP)
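A small Python sketch of how a compiler could fix these offsets. It assumes locals are laid out below FP in declaration order with no padding; the function names and the example sizes are hypothetical.

```python
def assign_offsets(local_sizes):
    """Map each local to an FP-relative offset; locals grow toward lower addresses."""
    offsets, offset = {}, 0
    for name, size in local_sizes:
        offset -= size                 # reserve `size` bytes below the previous local
        offsets[name] = offset
    return offsets


def gen_store_constant(offsets, name, value, reg="R3"):
    """Emit the two pseudo-instructions for `name = value`."""
    return [f"Load_Constant {value}, {reg}",
            f"Store {reg}, {offsets[name]}(FP)"]


offsets = assign_offsets([("x", 4), ("y", 4), ("buf", 16)])
code = gen_store_constant(offsets, "x", 5)
```

The offsets are plain integers computed once per procedure, so the generated code never has to search for a variable at runtime.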
60
Pentium Runtime Stack
Pentium stack registers
Register   Usage
ESP        Stack pointer
EBP        Base pointer
Pentium stack and call/ret instructions
Instruction      Usage
push, pusha, …   push on runtime stack
pop, popa, …     pop from runtime stack
call             transfer control to called routine
ret              transfer control back to caller
61
Accessing Stack Variables
• Use offset from FP (%ebp)
• Remember: stack grows downwards
• Above FP = parameters
• Below FP = locals
• Examples
  - %ebp + 4 = return address
  - %ebp + 8 = first parameter
  - %ebp - 4 = first local
[Figure: current frame from high to low addresses: Param n, …, param1 (FP+8), Return address, Previous fp (FP), Local 1 (FP-4), …, Local n (SP); the caller's frame above it has the same shape]
62
Factorial - fact(int n)
fact:
  pushl %ebp            # save ebp
  movl %esp,%ebp        # ebp=esp
  pushl %ebx            # save ebx
  movl 8(%ebp),%ebx     # ebx = n
  cmpl $1,%ebx          # n <= 1 ?
  jle .lresult          # then done
  leal -1(%ebx),%eax    # eax = n-1
  pushl %eax            #
  call fact             # fact(n-1)
  imull %ebx,%eax       # eax=retv*n
  jmp .lreturn          #
.lresult:
  movl $1,%eax          # retv
.lreturn:
  movl -4(%ebp),%ebx    # restore ebx
  movl %ebp,%esp        # restore esp
  popl %ebp             # restore ebp
  ret                   # return to caller
[Figure: stack at an intermediate point: n at EBP+8, return address at EBP+4, old ebp at EBP, old ebx at EBP-4 (ESP)]
(disclaimer: a real compiler can do better than that)
63
Call Sequences
• The processor does not save the content of registers on procedure calls
• So who will?
  - Caller saves and restores registers
  - Callee saves and restores registers
  - But can also have both save/restore some registers
64
Call Sequences
Caller push code (call):
  Push caller-save registers
  Push actual parameters (in reverse order)
  Push return address
  Jump to call address
Callee push code (prologue):
  Push current base-pointer
  bp = sp
  Push local variables
  Push callee-save registers
Callee pop code (epilogue):
  Pop callee-save registers
  Pop callee activation record
  Pop old base-pointer
  Pop return address
  Jump to address
Caller pop code (return):
  Pop parameters
  Pop caller-save registers
65
Call Sequences - Foo(42,21)
Caller push code (call):
  push %ecx          Push caller-save registers
  push $21           Push actual parameters (in reverse order)
  push $42
  call _foo          Push return address, jump to call address
Callee push code (prologue):
  push %ebp          Push current base-pointer
  mov %esp, %ebp     bp = sp
  sub $8, %esp       Push local variables
  push %ebx          Push callee-save registers
Callee pop code (epilogue):
  pop %ebx           Pop callee-save registers
  mov %ebp, %esp     Pop callee activation record
  pop %ebp           Pop old base-pointer
  ret                Pop return address, jump to address
Caller pop code (return):
  add $8, %esp       Pop parameters
  pop %ecx           Pop caller-save registers
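The call sequence for Foo(42,21) can be traced with a toy model of a downward-growing 32-bit stack. This is not an emulator: the `X86Stack` class, the initial register values and the callee body are assumptions chosen only to show that the sequence leaves every register as it found it.

```python
class X86Stack:
    """Toy model of a 32-bit stack that grows toward lower addresses."""

    def __init__(self, esp=0x1000):
        self.mem = {}
        self.reg = {"esp": esp, "ebp": 0, "ebx": 7, "ecx": 9}

    def push(self, value):
        self.reg["esp"] -= 4
        self.mem[self.reg["esp"]] = value

    def pop(self):
        value = self.mem[self.reg["esp"]]
        self.reg["esp"] += 4
        return value


def call_foo(m):
    before = dict(m.reg)
    # Caller push code
    m.push(m.reg["ecx"])                 # push %ecx (caller-save)
    m.push(21)                           # push $21
    m.push(42)                           # push $42
    m.push("return address")             # done by `call _foo`
    # Callee prologue
    m.push(m.reg["ebp"])                 # push %ebp
    m.reg["ebp"] = m.reg["esp"]          # mov %esp, %ebp
    m.reg["esp"] -= 8                    # sub $8, %esp (space for locals)
    m.push(m.reg["ebx"])                 # push %ebx (callee-save)
    # Callee body (made up): read the parameters, clobber ebx and ecx
    first = m.mem[m.reg["ebp"] + 8]
    second = m.mem[m.reg["ebp"] + 12]
    m.reg["ebx"] = first + second
    m.reg["ecx"] = 0
    # Callee epilogue
    m.reg["ebx"] = m.pop()               # pop %ebx
    m.reg["esp"] = m.reg["ebp"]          # mov %ebp, %esp
    m.reg["ebp"] = m.pop()               # pop %ebp
    m.pop()                              # ret pops the return address
    # Caller pop code
    m.reg["esp"] += 8                    # add $8, %esp (pop parameters)
    m.reg["ecx"] = m.pop()               # pop %ecx
    return first, second, before


machine = X86Stack()
first, second, before = call_foo(machine)
```

Because the arguments are pushed in reverse order, the first parameter ends up at 8(%ebp) and the second at 12(%ebp).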
66
"To Callee-save or to Caller-save?"
• Callee-saved registers need only be saved when callee modifies their value
• some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registers
• Callee-Save Registers
  - Saved by the callee before modification
  - Values are automatically preserved across calls
• Caller-Save Registers
  - Saved (if needed) by the caller before calls
  - Values are not automatically preserved across calls
• Usually the architecture defines caller-save and callee-save registers
• Separate compilation
• Interoperability between code produced by different compilers/languages
• But compiler writers decide when to use caller/callee registers
68
Callee-Save Registers
• Saved by the callee before modification
• Usually at procedure prolog
• Restored at procedure epilog
• Hardware support may be available
• Values are automatically preserved across calls
int foo(int a) {
  int b = a+1;
  f1();
  g1(b);
  return (b+2);
}
.global _foo
  Add_Constant -K, SP       ; allocate space for foo
  Store_Local R5, -14(FP)   ; save R5
  Load_Reg R5, R0
  Add_Constant R5, 1
  JSR f1
  JSR g1
  Add_Constant R5, 2
  Load_Reg R5, R0
  Load_Local -14(FP), R5    ; restore R5
  Add_Constant K, SP        ; deallocate
  RTS
69
Caller-Save Registers
• Saved by the caller before calls when needed
• Values are not automatically preserved across calls
void bar (int y) {
  int x = y+1;
  f2(x);
  g2(2);
  g2(8);
}
.global _bar
  Add_Constant -K, SP       ; allocate space for bar
  Add_Constant R0, 1
  JSR f2
  Load_Constant 2, R0
  JSR g2
  Load_Constant 8, R0
  JSR g2
  Add_Constant K, SP        ; deallocate space for bar
  RTS
70
Parameter Passing
• 1960s
  - In memory
    • No recursion is allowed
• 1970s
  - In stack
• 1980s
  - In registers
  - First k parameters are passed in registers (k=4 or k=6)
  - Where is time saved?
    • Most procedures are leaf procedures
    • Interprocedural register allocation
    • Many of the registers may be dead before another invocation
    • Register windows are allocated in some architectures per call (e.g., Sun SPARC)
71
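Windows Exploit(s): Buffer Overflow
void foo (char *x) {
  char buf[2];
  strcpy(buf, x);
}
int main (int argc, char *argv[]) {
  foo(argv[1]);
}
./a.out abracadabra
Segmentation fault
[Figure: stack layout from high to low memory addresses (the stack grows toward low addresses): previous frame, return address, saved FP, char* x, buf[2], …; the copied string "abracadabra" starts at buf and runs upward over the neighbouring slots]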
72
Buffer overflow
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int check_authentication(char *password) {
  int auth_flag = 0;
  char password_buffer[16];
  strcpy(password_buffer, password);
  if(strcmp(password_buffer, "brillig") == 0)
    auth_flag = 1;
  if(strcmp(password_buffer, "outgrabe") == 0)
    auth_flag = 1;
  return auth_flag;
}
int main(int argc, char *argv[]) {
  if(argc < 2) {
    printf("Usage: %s <password>\n", argv[0]);
    exit(0);
  }
  if(check_authentication(argv[1])) {
    printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    printf("      Access Granted.\n");
    printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
  } else {
    printf("\nAccess Denied.\n");
  }
}
(source: "Hacking: The Art of Exploitation, 2nd Ed.")
74
Buffer overflow
int check_authentication(char *password) {
  char password_buffer[16];
  int auth_flag = 0;
  strcpy(password_buffer, password);
  if(strcmp(password_buffer, "brillig") == 0)
    auth_flag = 1;
  if(strcmp(password_buffer, "outgrabe") == 0)
    auth_flag = 1;
  return auth_flag;
}
int main(int argc, char *argv[]) {
  if(argc < 2) {
    printf("Usage: %s <password>\n", argv[0]);
    exit(0);
  }
  if(check_authentication(argv[1])) {
    printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    printf("      Access Granted.\n");
    printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
  } else {
    printf("\nAccess Denied.\n");
  }
}
(source: "Hacking: The Art of Exploitation, 2nd Ed.")
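Why the order of the two local declarations matters can be illustrated with a byte-level model in Python. Everything here is an assumption of the sketch: the 64-byte region, the two candidate layouts and the parameter name `flag_above_buffer`. Where a real compiler places `auth_flag` relative to `password_buffer` depends on the compiler and its options.

```python
def check_authentication(password: bytes, flag_above_buffer: bool) -> bool:
    """Model password_buffer[16] and a 4-byte auth_flag inside one stack region.

    flag_above_buffer=True : auth_flag sits right after the buffer (higher address).
    flag_above_buffer=False: auth_flag sits below the buffer (lower address).
    """
    mem = bytearray(64)                                # toy stack region, zero-filled
    buf, flag = (0, 16) if flag_above_buffer else (4, 0)
    data = password + b"\x00"                          # strcpy copies the terminator too
    mem[buf:buf + len(data)] = data                    # no bounds check, like strcpy
    stored = bytes(mem[buf:]).split(b"\x00", 1)[0]     # what strcmp would read
    if stored in (b"brillig", b"outgrabe"):
        mem[flag:flag + 4] = (1).to_bytes(4, "little") # auth_flag = 1
    return int.from_bytes(mem[flag:flag + 4], "little") != 0


overflow = b"A" * 20                                   # 4 bytes more than the buffer holds
```

With the flag above the buffer, a 20-byte input overwrites it with a non-zero value and access is granted without a valid password. With the flag below, the same input misses the flag, but in a real frame the overflow would still run toward the saved frame pointer and return address.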
75
Buffer overflow
0x08048529 <+69>:   movl $0x8048647,(%esp)
0x08048530 <+76>:   call 0x8048394 <puts@plt>
0x08048535 <+81>:   movl $0x8048664,(%esp)
0x0804853c <+88>:   call 0x8048394 <puts@plt>
0x08048541 <+93>:   movl $0x804867a,(%esp)
0x08048548 <+100>:  call 0x8048394 <puts@plt>
0x0804854d <+105>:  jmp 0x804855b <main+119>
0x0804854f <+107>:  movl $0x8048696,(%esp)
0x08048556 <+114>:  call 0x8048394 <puts@plt>
76
Nested Procedures
• For example - Pascal
• Any routine can have sub-routines
• Any sub-routine can access anything that is defined in its containing scope or inside the sub-routine itself
  - "non-local" variables
77
Example: Nested Procedures
program p() {
  int x;
  procedure a() {
    int y;
    procedure b() { … c() … };
    procedure c() {
      int z;
      procedure d() {
        y = x + z
      };
      … b() … d() …
    }
    … a() … c() …
  }
  a()
}
Possible call sequence: p -> a -> a -> c -> b -> c -> d
What are the addresses of variables "x", "y" and "z" in procedure d?
78
Nested Procedures
• can call a sibling, ancestor
• when "c" uses (non-local) variables from "a", which instance of "a" is it?
• how do you find the right activation record at runtime?
[Figure: activation records for the calls P, a, a, c, b, c, d]
Possible call sequence: p -> a -> a -> c -> b -> c -> d
79
Nested Procedures
• goal: find the closest routine in the stack from a given nesting level
• if we reached the same routine in a sequence of calls
  - routine of level k uses variables of the same nesting level: it uses its own variables
  - if it uses variables of nesting level j < k, then it must be the last routine called at level j
• If a procedure is last at level j on the stack, then it must be an ancestor of the current routine
Possible call sequence: p -> a -> a -> c -> b -> c -> d
[Figure: activation records for the calls P, a, a, c, b, c, d]
80
Nested Procedures
• problem: a routine may need to access variables of another routine that contains it statically
• solution: lexical pointer (aka access link) in the activation record
• lexical pointer points to the last activation record of the nesting level above it
  - in our example, lexical pointer of d points to activation records of c
• lexical pointers created at runtime
• number of links to be traversed is known at compile time
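The lexical-pointer mechanism can be sketched in Python for the example program and the call sequence p, a, a, c, b, c, d. The `Frame` class, the `call` and `resolve` helpers and the initial variable values are assumptions of this illustration. A callee's link is found by walking level(caller) - level(callee) + 1 links from the caller's frame, and a variable is reached by a hop count fixed at compile time.

```python
PARENT = {"p": None, "a": "p", "b": "a", "c": "a", "d": "c"}   # static nesting


def level(proc):
    """Nesting depth of a routine: p is 0, a is 1, b and c are 2, d is 3."""
    return 0 if PARENT[proc] is None else 1 + level(PARENT[proc])


class Frame:
    def __init__(self, proc, link, **variables):
        self.proc, self.link, self.vars = proc, link, variables


def call(caller, callee, **variables):
    """Create the callee's frame and set its lexical pointer at call time."""
    target = caller
    for _ in range(level(caller.proc) - level(callee) + 1):
        target = target.link
    return Frame(callee, target, **variables)


def resolve(frame, hops):
    """Follow `hops` lexical pointers; the count is known at compile time."""
    for _ in range(hops):
        frame = frame.link
    return frame


# Call sequence p -> a -> a -> c -> b -> c -> d (variable values are made up).
p = Frame("p", None, x=10)
a1 = call(p, "a", y=1)
a2 = call(a1, "a", y=2)
c1 = call(a2, "c", z=5)
b = call(c1, "b")
c2 = call(b, "c", z=7)
d = call(c2, "d")

# Body of d: y = x + z. Hops = level(d) - level(routine declaring the variable).
x_home = resolve(d, level("d") - level("p"))   # 3 hops
z_home = resolve(d, level("d") - level("c"))   # 1 hop
y_home = resolve(d, level("d") - level("a"))   # 2 hops
y_home.vars["y"] = x_home.vars["x"] + z_home.vars["z"]
```

The assignment lands in the second activation of a, which is the most recent record at a's nesting level, while the first activation of a and the first activation of c are left untouched.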
81
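Lexical Pointers
[Figure: activation records on the stack for the calls P, a, a, c, b, c, d; each record of a holds y and each record of c holds z, and the lexical pointers connect each record to a record of its enclosing routine]
Possible call sequence: p -> a -> a -> c -> b -> c -> d
program p() {
  int x;
  procedure a() {
    int y;
    procedure b() { c() };
    procedure c() {
      int z;
      procedure d() {
        y = x + z
      };
      … b() … d() …
    }
    … a() … c() …
  }
  a()
}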
82
What is a Compiler
29
cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)
Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )
30
Control-flow example ndash loopsint xint y
while (x lt y) x = x 2
y = x
_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1
y = x
31
cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label
Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )
32
Optimization points
sourcecode
Frontend IR Code
generatortargetcode
Userprofile program
change algorithm
Compilerintraprocedural IRInterprocedural IRIR optimizations
Compilerregister allocation
instruction selectionpeephole transformations
today
33
Interprocedural IR
bull Compile time generation of code for procedure invocations
bull Activation Records (aka Stack Frames)
34
Supporting Procedures
bull New computing environment ndash at least temporary memory for local variables
bull Passing information into the new environmentndash Parameters
bull Transfer of control tofrom procedurebull Handling return values
35
Abstract Register Machine(High Level View)
Code
Data
Highaddresses
Lowaddresses
CPU
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
36
Heap
Abstract Register Machine(High Level View)
Global Variables
Stack
Lowaddresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
Code
Highaddresses
37
Abstract Activation Record Stack
Stack frame for procedure
Prock+1(a1hellipaN)
Prock
Prock+2
hellip
hellip
Prock+1
38
Abstract Stack Frame
Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Prock
Prock+2
hellip
hellip
Stack frame for procedure
Prock+1(a1hellipaN)
39
Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to
stack and jumps to the function labelA statement x=f(a1hellipan) looks like
Push a1 hellip Push anCall fPop x copy returned value
bull Returning a value is done by pushing it to the stack (return x)
Push xbull Return control to caller (and roll up stack)
Return
40
Heap
Abstract Register Machine
Global Variables
Stack
Lowaddresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
Code
Highaddresses
41
Semi-Abstract Register Machine
Global Variables
Stack
Heap
High addresses
Low addresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
ebp
esphellipre
gist
ers
Stac
k
42
Intro Functions Exampleint SimpleFn(int z)
int x yx = x y zreturn x
void main() int ww = SimpleFunction(137)
_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn
main_t0 = 137Push _t0Call _SimpleFnPop w
43
What Can We Do with Procedures
bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as
parameters
44
Design Decisions
bull Scoping rulesndash Static scoping vs dynamic scoping
bull Callercallee conventionsndash Parametersndash Who saves register values
bull Allocating space for local variables
45
Static (lexical) Scopingmain ( )
int a = 0 int b = 0
int b = 1
int a = 2 printf (ldquod dnrdquo a b)
int b = 3 printf (ldquod dnrdquo a b)
printf (ldquod dnrdquo a b)
printf (ldquod dnrdquo a b)
B0B1
B3B3
B2
Declaration Scopes
a=0 B0B1B3
b=0 B0
b=1 B1B2
a=2 B2
b=3 B3
a name refers to its (closest)
enclosing scope
known at compile time
46
Dynamic Scoping
bull Each identifier is associated with a global stack of bindings
bull When entering scope where identifier is declaredndash push declaration on identifier stack
bull When exiting scope where identifier is declaredndash pop identifier stack
bull Evaluating the identifier in any context binds to the current top of stack
bull Determined at runtime
47
Example
bull What value is returned from mainbull Static scopingbull Dynamic scoping
int x = 42
int f() return x int g() int x = 1 return f() int main() return g()
48
Why do we care
bull We need to generate code to access variables
bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code
generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables
49
Variable addresses for static scoping first attempt
int x = 42
int f() return x int g() int x = 1 return f() int main() return g()
identifier address
x (global) 0x42
x (inside g) 0x73
50
Variable addresses for static scoping first attempt
int a [11]
void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)
main() quicksort (1 9)
what is the address of the variable ldquoirdquo in
the procedure quicksort
51
Compile-Time Information on Variablesbull Namebull Typebull Scope
ndash when is it recognizedbull Duration
ndash Until when does its value existbull Size
ndash How many bytes are required at runtime bull Address
ndash Fixedndash Relativendash Dynamic
52
Activation Record (Stack Frames)bull separate space for each procedure invocation
bull managed at runtimendash code for managing it generated by the compiler
bull desired properties ndash efficient allocation and deallocation
bull procedures are called frequentlyndash variable size
bull different procedures may require different memory sizes
53
Semi-Abstract Register Machine
Global Variables
Stack
Heap
High addresses
Low addresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
ebp
esphellipre
gist
ers
Stac
k
54
A Logical Stack Frame (Simplified)Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Stack frame for function f(a1hellipaN)
55
Runtime Stack
bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of
stackbull How do we handle recursion
56
Activation Record (frame)parameter k
parameter 1
lexical pointer
return information
dynamic link
registers amp misc
local variablestemporaries
next frame would be here
hellip
administrativepart
high addresses
lowaddresses
framepointer
stackpointer
incoming parameters
stack grows down
57
Runtime Stackbull SP ndash stack pointer
ndash top of current framebull FP ndash frame pointer
ndash base of current framendash Sometimes called BP
(base pointer)
Current frame
hellip hellip
Previous frame
SP
FPstack grows down
58
Code Blocks
bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1
adminstrativex1
y1
x2
x3
hellip
59
L-Values of Local Variables
bull The offset in the stack is known at compile time
bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3
Store R3 offset(x)(FP)
60
Pentium Runtime Stack
Pentium stack registers
Pentium stack and callret instructions
Register Usage
ESP Stack pointer
EBP Base pointer
Instruction Usage
push pushahellip push on runtime stack
poppopahellip Base pointer
call transfer control to called routine
return transfer control back to caller
61
Accessing Stack Variables
bull Use offset from FP (ebp)bull Remember ndash
stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples
ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local
hellip hellip
SP
FPReturn address
Return address
Param nhellip
param1
Local 1hellip
Local n
Previous fp
Param nhellip
param1FP+8
FP-4
62
Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp
ESP
EBPReturn address
Return address
old ebx
Previous fp
nEBP+8
EBP-4 old ebp
old ebx
(stack in intermediate point)
(disclaimer real compiler can do better than that)
63
Call Sequences
bull The processor does not save the content of registers on procedure calls
bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some
registers
64
Call Sequences
call
caller
callee
return
caller
Caller push code
Callee push code
(prologue)
Callee pop code
(epilogue)
Caller pop code
Push caller-save registersPush actual parameters (in reverse order)
push return addressJump to call address
Push current base-pointerbp = sp
Push local variablesPush callee-save registers
Pop callee-save registersPop callee activation record
Pop old base-pointer
pop return addressJump to address
Pop parametersPop caller-save registers
65
Call Sequences ndash Foo(4221)
call
caller
callee
return
caller
push ecx
push $21
push $42
call _foo
push ebp
mov esp ebp
sub 8 esp
push ebx
pop ebx
mov ebp esp
pop ebp
ret
add $8 esp
pop ecx
Push caller-save registersPush actual parameters (in reverse order)
push return addressJump to call address
Push current base-pointerbp = sp
Push local variablesPush callee-save registers
Pop callee-save registersPop callee activation record
Pop old base-pointer
pop return addressJump to address
Pop caller-save registersPop parameters
66
ldquoTo Callee-save or to Caller-saverdquo
bull Callee-saved registers need only be saved when callee modifies their value
bull some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registersbull Callee-Save Registers
ndash Saved by the callee before modificationndash Values are automatically preserved across calls
bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls
bull Usually the architecture defines caller-save and callee-save registers
bull Separate compilationbull Interoperability between code produced by different
compilerslanguages bull But compiler writers decide when to use calllercallee
registers
68
Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls
global _foo
Add_Constant -K SP allocate space for foo
Store_Local R5 -14(FP) save R5
Load_Reg R5 R0 Add_Constant R5 1
JSR f1 JSR g1
Add_Constant R5 2 Load_Reg R5 R0
Load_Local -14(FP) R5 restore R5
Add_Constant K SP RTS deallocate
int foo(int a) int b=a+1 f1()g1(b)
return(b+2)
69
Caller-Save Registersbull Saved by the caller before calls when
neededbull Values are not automatically preserved
across calls
global _bar
Add_Constant -K SP allocate space for bar
Add_Constant R0 1
JSR f2
Load_Constant 2 R0 JSR g2
Load_Constant 8 R0 JSR g2
Add_Constant K SP deallocate space for bar
RTS
void bar (int y) int x=y+1f2(x)
g2(2) g2(8)
70
Parameter Passingbull 1960s
ndash In memory bull No recursion is allowed
bull 1970sndash In stack
bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved
bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)
71
Windows Exploit(s) Buffer Overflow
void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])
aout abracadabraSegmentation fault
Stack grows this way
Memory addresses
Previous frameReturn
addressSaved FPchar xbuf[2]
hellip
abracadabr
72
Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]
strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)
auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)
auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)
(source ldquohacking ndash the art of exploitation 2nd Edrdquo)
73
Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]
strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)
auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)
auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)
(source ldquohacking ndash the art of exploitation 2nd Edrdquo)
74
Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)
auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)
auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)
(source ldquohacking ndash the art of exploitation 2nd Edrdquo)
75
Buffer overflow
0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt
76
Nested Procedures
bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is
defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables
77
program p() int x procedure a() int y procedure b() hellip c() hellip
procedure c() int z procedure d()
y = x + z
hellip b() hellip d() hellip hellip a() hellip c() hellip a()
Example Nested Procedures
Possible call sequencep a a c b c d
what are the addresses of variables ldquoxrdquo ldquoyrdquo and
ldquozrdquo in procedure d
78
Nested Procedures
• can call a sibling or an ancestor
• when "c" uses (non-local) variables from "a", which instance of "a" is it?
• how do you find the right activation record at runtime?
Possible call sequence: p -> a -> a -> c -> b -> c -> d
[Figure: the stack of activation records for this call sequence: p, a, a, c, b, c, d]
79
Nested Procedures
• goal: find the closest routine in the stack from a given nesting level
• suppose the current routine, at nesting level k, was reached through a sequence of calls
  - if it uses variables of the same nesting level k, it uses its own variables
  - if it uses variables of nesting level j < k, they belong to the last routine called at level j
• if a procedure is the last one at level j on the stack, then it must be an ancestor of the current routine
Possible call sequence: p -> a -> a -> c -> b -> c -> d
[Figure: the stack of activation records for this call sequence: p, a, a, c, b, c, d]
80
Nested Procedures
• problem: a routine may need to access variables of another routine that contains it statically
• solution: lexical pointer (a.k.a. access link) in the activation record
• the lexical pointer points to the last activation record of the nesting level above it
  - in our example, the lexical pointer of d points to the activation record of c
• lexical pointers are created at runtime
• the number of links to be traversed is known at compile time
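The traversal described above can be sketched in C. This is a minimal illustration, not the lecture's implementation: the `Frame` struct, its field names, the values stored in the variables, and the helper `frame_at_level` are all hypothetical. Each frame holds one variable and a lexical pointer; the example replays the call sequence p, a, a, c, b, c, d and executes d's body `y = x + z`.

```c
#include <stddef.h>

/* Hypothetical activation record: only the fields needed for the sketch. */
typedef struct Frame {
    struct Frame *lexical;  /* access link: frame of the statically enclosing routine */
    int level;              /* static nesting level of the routine (p = 0) */
    int var;                /* stands in for the routine's single local (x, y or z) */
} Frame;

/* Follow (from->level - target_level) access links. In a real compiler the
   difference is a compile-time constant; it is computed here for clarity. */
static Frame *frame_at_level(Frame *from, int target_level) {
    Frame *f = from;
    for (int hops = from->level - target_level; hops > 0; hops--)
        f = f->lexical;
    return f;
}

/* Replays p, a, a, c, b, c, d and runs d's body: y = x + z. */
int run_example(void) {
    Frame p  = { NULL, 0, 10 };  /* x = 10 (arbitrary value) */
    Frame a1 = { &p,  1, 0 };    /* first a: its y stays untouched */
    Frame a2 = { &p,  1, 0 };    /* second a, called from a1; a is nested in p */
    Frame c1 = { &a2, 2, 5 };    /* c called from a2: link to a2 */
    Frame b  = { &a2, 2, 0 };    /* b called from c1: shares c1's enclosing a */
    Frame c2 = { &a2, 2, 7 };    /* c called from b: z = 7 (arbitrary value) */
    Frame d  = { &c2, 3, 0 };    /* d called from c2: link to c2 */

    int x = frame_at_level(&d, 0)->var;   /* three hops: d -> c2 -> a2 -> p */
    int z = frame_at_level(&d, 2)->var;   /* one hop:    d -> c2 */
    frame_at_level(&d, 1)->var = x + z;   /* two hops:   d -> c2 -> a2 */

    (void)a1; (void)c1; (void)b;          /* on the stack, but not on d's static chain */
    return a2.var;                        /* y of the most recent a */
}
```

Note that d updates the `y` of the second activation of a, and reads the `z` of the second activation of c: the static chain skips the frames of b and the first c even though they are still on the stack.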
81
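Lexical Pointers
Possible call sequence: p -> a -> a -> c -> b -> c -> d
[Figure: the stack of activation records a, a, c, b, c, d with their locals (y in each a, z in each c) and the lexical pointer of each record]
program p() {
    int x;
    procedure a() {
        int y;
        procedure b() { c() }
        procedure c() {
            int z;
            procedure d() { y = x + z }
            ... b() ... d() ...
        }
        ... a() ... c() ...
    }
    a()
}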
82
30
Control-flow example: loops
int x;
int y;
while (x < y) {
    x = x * 2;
}
y = x;

_L0:
    _t0 = x < y;
    IfZ _t0 Goto _L1;
    x = x * 2;
    Goto _L0;
_L1:
    y = x;
31
cgen for while loops
cgen(while (expr) stmt) = {
    Let Lbefore be a new label
    Let Lafter be a new label
    Emit( Lbefore: )
    Let t = cgen(expr)
    Emit( IfZ t Goto Lafter )
    cgen(stmt)
    Emit( Goto Lbefore )
    Emit( Lafter: )
}
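The scheme above can be sketched as a small C routine. This is only an illustration under simplifying assumptions: the code for the condition and the body is passed in as already-generated text, and the names `cgen_while` and `next_label` are hypothetical rather than part of the course's compiler.

```c
#include <stdio.h>

/* Counter for fresh labels _L0, _L1, ... */
static int next_label = 0;

/* Writes three-address code for `while (expr) stmt` into `out`.
   cond_code: code that computes the condition into the temporary cond_temp.
   body_code: code for the loop body. */
void cgen_while(char *out, size_t cap,
                const char *cond_code, const char *cond_temp,
                const char *body_code) {
    int before = next_label++;   /* Lbefore */
    int after  = next_label++;   /* Lafter  */
    snprintf(out, cap,
             "_L%d:\n"                   /* Emit( Lbefore: )          */
             "%s"                        /* t = cgen(expr)            */
             "\tIfZ %s Goto _L%d;\n"     /* Emit( IfZ t Goto Lafter ) */
             "%s"                        /* cgen(stmt)                */
             "\tGoto _L%d;\n"            /* Emit( Goto Lbefore )      */
             "_L%d:\n",                  /* Emit( Lafter: )           */
             before, cond_code, cond_temp, after, body_code, before, after);
}
```

Calling it with the condition code `"\t_t0 = x < y;\n"`, the temporary `"_t0"` and the body `"\tx = x * 2;\n"` produces the same label structure as the loop example: test at the top, exit jump when the condition is zero, back edge at the bottom.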
32
Optimization points
[Figure: pipeline: source code -> Front end -> IR -> Code generator -> target code, annotated with optimization points; one of them is marked "today"]
• User: profile program, change algorithm
• Compiler: intraprocedural IR, interprocedural IR, IR optimizations
• Compiler: register allocation, instruction selection, peephole transformations
33
Interprocedural IR
• Compile-time generation of code for procedure invocations
• Activation Records (a.k.a. Stack Frames)
34
Supporting Procedures
• New computing environment
  - at least temporary memory for local variables
• Passing information into the new environment
  - Parameters
• Transfer of control to/from procedure
• Handling return values
35
Abstract Register Machine (High Level View)
[Figure: a CPU with general-purpose (data) registers Register 00, Register 01, ..., Register xx and control registers (Register PC, ...), next to a memory holding Code and Data, with low and high addresses marked]
36
Abstract Register Machine (High Level View)
[Figure: a CPU with general-purpose (data) registers Register 00, Register 01, ..., Register xx and control registers (Register PC, ...), next to a main memory holding Code, Global Variables, Heap and Stack, with low and high addresses marked]
37
Abstract Activation Record Stack
[Figure: a stack of activation records ..., Proc k, Proc k+1, Proc k+2, ...; the highlighted record is the stack frame for procedure Proc k+1(a1, ..., aN)]
38
Abstract Stack Frame
Stack frame for procedure Proc k+1(a1, ..., aN), shown between the frames of Proc k and Proc k+2:
  Parameters (actual arguments):
    Param N
    Param N-1
    ...
    Param 1
  Locals and temporaries:
    _t0
    ...
    _tk
    x
    ...
    y
39
Handling Procedures
• Store local variables/temporaries in a stack
• A function call instruction pushes arguments to the stack and jumps to the function label
  A statement x = f(a1, ..., an); looks like:
      Push a1; ... Push an;
      Call f;
      Pop x;    // copy returned value
• Returning a value is done by pushing it to the stack (return x;)
      Push x;
• Return control to caller (and roll up stack)
      Return;
40
Abstract Register Machine
[Figure: a CPU with general-purpose (data) registers Register 00, Register 01, ..., Register xx and control registers (Register PC, ...), next to a main memory holding Code, Global Variables, Heap and Stack, with low and high addresses marked]
41
Semi-Abstract Register Machine
[Figure: a CPU with general-purpose (data) registers Register 00, Register 01, ..., Register xx, control registers (Register PC, ...) and stack registers (ebp, esp, ...), next to a main memory holding Global Variables, Stack and Heap, with low and high addresses marked]
42
Intro: Functions Example
int SimpleFn(int z) {
    int x, y;
    x = x * y * z;
    return x;
}
void main() {
    int w;
    w = SimpleFn(137);
}

_SimpleFn:
    _t0 = x * y;
    _t1 = _t0 * z;
    x = _t1;
    Push x;
    Return;
main:
    _t0 = 137;
    Push _t0;
    Call _SimpleFn;
    Pop w;
43
What Can We Do with Procedures
• Declarations & Definitions
• Call & Return
• Jumping out of procedures
• Passing & Returning procedures as parameters
44
Design Decisions
• Scoping rules
  - Static scoping vs. dynamic scoping
• Caller/callee conventions
  - Parameters
  - Who saves register values?
• Allocating space for local variables
45
Static (lexical) Scoping
main() {                                   /* B0 */
    int a = 0;
    int b = 0;
    {                                      /* B1 */
        int b = 1;
        {                                  /* B2 */
            int a = 2;
            printf("%d %d\n", a, b);
        }
        {                                  /* B3 */
            int b = 3;
            printf("%d %d\n", a, b);
        }
        printf("%d %d\n", a, b);
    }
    printf("%d %d\n", a, b);
}

Declaration    Scopes
a=0            B0, B1, B3
b=0            B0
b=1            B1, B2
a=2            B2
b=3            B3

A name refers to its (closest) enclosing scope, which is known at compile time.
46
Dynamic Scoping
• Each identifier is associated with a global stack of bindings
• When entering a scope where the identifier is declared
  - push the declaration on the identifier's stack
• When exiting a scope where the identifier is declared
  - pop the identifier's stack
• Evaluating the identifier in any context binds to the current top of the stack
• Determined at runtime
47
Example
• What value is returned from main?
• Static scoping?
• Dynamic scoping?

int x = 42;
int f() { return x; }
int g() { int x = 1; return f(); }
int main() { return g(); }
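To make the two answers concrete, here is a hedged C sketch. C itself is statically scoped, so the dynamic variant is simulated with an explicit stack of bindings for `x`, following the push/pop description of dynamic scoping; the names `x_bindings`, `x_top`, and the `_static`/`_dynamic` suffixes are hypothetical.

```c
/* Static scoping: the x in f is resolved at compile time to the global. */
static int x = 42;
static int f_static(void) { return x; }
static int g_static(void) { int x = 1; (void)x; return f_static(); }

/* Dynamic scoping, simulated: x has a runtime stack of bindings. */
enum { MAX_BINDINGS = 16 };
static int x_bindings[MAX_BINDINGS] = { 42 };  /* global binding at the bottom */
static int x_top = 0;

static int f_dynamic(void) { return x_bindings[x_top]; }

static int g_dynamic(void) {
    x_bindings[++x_top] = 1;    /* entering g: push g's declaration of x */
    int result = f_dynamic();   /* f sees the most recent binding */
    x_top--;                    /* leaving g: pop it */
    return result;
}
```

Under static scoping `g_static()` yields 42, because `f` can only see the global `x`. Under the simulated dynamic scoping `g_dynamic()` yields 1, because the binding pushed by `g` is on top of the stack when `f` runs.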
48
Why do we care?
• We need to generate code to access variables
• Static scoping
  - Identifier binding is known at compile time
  - Address of the variable is known at compile time
  - Assigning addresses to variables is part of code generation
  - No runtime errors of "access to undefined variable"
  - Can check types of variables
49
Variable addresses for static scoping: first attempt
int x = 42;
int f() { return x; }
int g() { int x = 1; return f(); }
int main() { return g(); }

identifier      address
x (global)      0x42
x (inside g)    0x73
50
Variable addresses for static scoping: first attempt
int a[11];
void quicksort(int m, int n) {
    int i;
    if (n > m) {
        i = partition(m, n);
        quicksort(m, i-1);
        quicksort(i+1, n);
    }
}
main() {
    ...
    quicksort(1, 9);
}
What is the address of the variable "i" in the procedure quicksort?
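The question has no single answer: every live invocation of a recursive procedure needs its own copy of `i`. A small C experiment shows this, assuming a conventional stack-based implementation; the helper names `descend` and `all_distinct` are hypothetical, and the recursion stands in for quicksort's.

```c
#include <stdint.h>

enum { MAX_DEPTH = 8 };
static uintptr_t addr_of_i[MAX_DEPTH];

/* Records where the local `i` lives in each nested activation. */
static void descend(int depth) {
    volatile int i = depth;            /* volatile keeps i in memory */
    addr_of_i[depth] = (uintptr_t)&i;
    if (depth + 1 < MAX_DEPTH)
        descend(depth + 1);
    i = i + 1;                         /* i is still live after the call */
}

/* Returns 1 when all simultaneously live activations had distinct addresses for i. */
int all_distinct(void) {
    descend(0);
    for (int a = 0; a < MAX_DEPTH; a++)
        for (int b = a + 1; b < MAX_DEPTH; b++)
            if (addr_of_i[a] == addr_of_i[b])
                return 0;
    return 1;
}
```

Since the address differs per invocation, a fixed compile-time address as in the table of the first attempt cannot work for locals of recursive procedures.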
51
Compile-Time Information on Variables
• Name
• Type
• Scope
  - when is it recognized
• Duration
  - until when does its value exist
• Size
  - how many bytes are required at runtime
• Address
  - Fixed
  - Relative
  - Dynamic
52
Activation Record (Stack Frames)
• separate space for each procedure invocation
• managed at runtime
  - code for managing it is generated by the compiler
• desired properties
  - efficient allocation and deallocation
    • procedures are called frequently
  - variable size
    • different procedures may require different memory sizes
53
Semi-Abstract Register Machine
[Figure: a CPU with general-purpose (data) registers Register 00, Register 01, ..., Register xx, control registers (Register PC, ...) and stack registers (ebp, esp, ...), next to a main memory holding Global Variables, Stack and Heap, with low and high addresses marked]
54
A Logical Stack Frame (Simplified)
Stack frame for function f(a1, ..., aN):
  Parameters (actual arguments):
    Param N
    Param N-1
    ...
    Param 1
  Locals and temporaries:
    _t0
    ...
    _tk
    x
    ...
    y
55
Runtime Stack
• Stack of activation records
• Call = push new activation record
• Return = pop activation record
• Only one "active" activation record: the top of the stack
• How do we handle recursion?
56
Activation Record (frame)
Fields, from high addresses to low addresses (the stack grows down):
  parameter k
  ...
  parameter 1
  lexical pointer
  return information
  dynamic link
  registers & misc
  local variables
  temporaries
  (next frame would be here)
Labels in the figure: incoming parameters, administrative part, frame pointer, stack pointer
57
Runtime Stack
• SP - stack pointer
  - top of current frame
• FP - frame pointer
  - base of current frame
  - sometimes called BP (base pointer)
[Figure: the stack, growing down: ..., previous frame, current frame; FP marks the base of the current frame and SP its top]
58
Code Blocks
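• Programming languages provide code blocks
void foo() {
    int x = 8, y = 9;        // 1
    { int x = y * y; }       // 2
    { int x = y * 7; }       // 3
    x = y + 1;
}
[Figure: the frame of foo: administrative part, x1, y1, x2, x3, ...; the subscript is the number of the declaring block]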
59
L-Values of Local Variables
• The offset in the stack is known at compile time
• L-val(x) = FP + offset(x)
• x = 5  =>  Load_Constant 5, R3
             Store R3, offset(x)(FP)
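As a hedged sketch of how a compiler might assign these offsets, the C fragment below lays out locals below FP in declaration order. It assumes no alignment padding and that the first local sits directly below FP; the `Local` struct and the functions `assign_offsets` and `lvalue` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical symbol-table entry for a local variable. */
typedef struct {
    const char *name;
    int size;      /* bytes required at runtime */
    int offset;    /* filled in at compile time, relative to FP */
} Local;

/* Assigns each local a negative offset from FP, first local closest to FP.
   Returns the number of bytes the frame needs for locals. */
int assign_offsets(Local *locals, size_t n) {
    int used = 0;
    for (size_t k = 0; k < n; k++) {
        used += locals[k].size;
        locals[k].offset = -used;
    }
    return used;
}

/* L-val(x) = FP + offset(x); only FP is a runtime value. */
long lvalue(long fp, const Local *x) {
    return fp + x->offset;
}
```

For two 4-byte locals `x` and `y` this gives offsets -4 and -8 and a locals area of 8 bytes, so the store in the example above becomes `Store R3, -4(FP)`.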
60
Pentium Runtime Stack
Pentium stack registers
  Register    Usage
  ESP         Stack pointer
  EBP         Base pointer

Pentium stack and call/ret instructions
  Instruction         Usage
  push, pusha, ...    push on runtime stack
  pop, popa, ...      pop from runtime stack
  call                transfer control to called routine
  ret                 transfer control back to caller
61
Accessing Stack Variables
• Use offset from FP (%ebp)
• Remember: the stack grows downwards
• Above FP = parameters
• Below FP = locals
• Examples
  - %ebp + 4 = return address
  - %ebp + 8 = first parameter
  - %ebp - 4 = first local
[Figure: two stack frames, each laid out from high to low addresses as Param n, ..., param 1, Return address, Previous fp, Local 1, ..., Local n; in the current frame param 1 is at FP+8, Local 1 is at FP-4, and SP marks the top of the stack]
62
Factorial: fact(int n)
fact:
    pushl %ebp              # save ebp
    movl  %esp,%ebp         # ebp = esp
    pushl %ebx              # save ebx
    movl  8(%ebp),%ebx      # ebx = n
    cmpl  $1,%ebx           # n <= 1 ?
    jle   .lresult          # then done
    leal  -1(%ebx),%eax     # eax = n-1
    pushl %eax              #
    call  fact              # fact(n-1)
    imull %ebx,%eax         # eax = retv * n
    jmp   .lreturn          #
.lresult:
    movl  $1,%eax           # retv
.lreturn:
    movl  -4(%ebp),%ebx     # restore ebx
    movl  %ebp,%esp         # restore esp
    popl  %ebp              # restore ebp
    ret                     # return to caller
[Figure: stack at an intermediate point, high to low addresses: n (EBP+8), return address, old ebp (EBP points here), old ebx (EBP-4, ESP points here); the caller's frame above it has the same layout]
(disclaimer: a real compiler can do better than that)
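For orientation, the assembly above corresponds roughly to the following C function. This is a reconstruction from the instruction comments, not source shown on the slide.

```c
/* n <= 1 yields 1 (the .lresult path); otherwise n * fact(n - 1). */
int fact(int n) {
    if (n <= 1)
        return 1;
    return n * fact(n - 1);
}
```

The callee-saved register %ebx holds `n` across the recursive call, which is why the prologue saves it and the epilogue restores it.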
63
Call Sequences
• The processor does not save the content of registers on procedure calls
• So who will?
  - Caller saves and restores registers
  - Callee saves and restores registers
  - But can also have both save/restore some registers
64
Call Sequences
Caller push code (caller):
    Push caller-save registers
    Push actual parameters (in reverse order)
call:
    push return address
    Jump to call address
Callee push code, prologue (callee):
    Push current base-pointer
    bp = sp
    Push local variables
    Push callee-save registers
Callee pop code, epilogue (callee):
    Pop callee-save registers
    Pop callee activation record
    Pop old base-pointer
return:
    pop return address
    Jump to address
Caller pop code (caller):
    Pop parameters
    Pop caller-save registers
65
Call Sequences: Foo(42, 21)
caller:
    push %ecx            # Push caller-save registers
    push $21             # Push actual parameters (in reverse order)
    push $42
call:
    call _foo            # push return address, jump to call address
callee:
    push %ebp            # Push current base-pointer
    mov  %esp, %ebp      # bp = sp
    sub  $8, %esp        # Push local variables
    push %ebx            # Push callee-save registers
    ...
    pop  %ebx            # Pop callee-save registers
    mov  %ebp, %esp      # Pop callee activation record
    pop  %ebp            # Pop old base-pointer
return:
    ret                  # pop return address, jump to address
caller:
    add  $8, %esp        # Pop parameters
    pop  %ecx            # Pop caller-save registers
66
"To Callee-save or to Caller-save?"
• Callee-saved registers need only be saved when the callee modifies their value
• Some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registers
• Callee-Save Registers
  - Saved by the callee before modification
  - Values are automatically preserved across calls
• Caller-Save Registers
  - Saved (if needed) by the caller before calls
  - Values are not automatically preserved across calls
• Usually the architecture defines caller-save and callee-save registers
  • Separate compilation
  • Interoperability between code produced by different compilers/languages
• But compiler writers decide when to use caller/callee registers
68
Callee-Save Registers
• Saved by the callee before modification
• Usually at procedure prolog
• Restored at procedure epilog
• Hardware support may be available
• Values are automatically preserved across calls

int foo(int a) {
    int b = a + 1;
    f1();
    g1(b);
    return (b + 2);
}

.global _foo
    Add_Constant -K, SP         // allocate space for foo
    Store_Local R5, -14(FP)     // save R5
    Load_Reg R5, R0; Add_Constant R5, 1
    JSR f1; JSR g1
    Add_Constant R5, 2; Load_Reg R5, R0
    Load_Local -14(FP), R5      // restore R5
    Add_Constant K, SP; RTS     // deallocate
69
Caller-Save Registers
• Saved by the caller before calls, when needed
• Values are not automatically preserved across calls

void bar(int y) {
    int x = y + 1;
    f2(x);
    g2(2);
    g2(8);
}

.global _bar
    Add_Constant -K, SP         // allocate space for bar
    Add_Constant R0, 1
    JSR f2
    Load_Constant 2, R0; JSR g2
    Load_Constant 8, R0; JSR g2
    Add_Constant K, SP          // deallocate space for bar
    RTS
70
Parameter Passing
• 1960s
  - In memory
    • No recursion is allowed
• 1970s
  - In stack
• 1980s
  - In registers
  - First k parameters are passed in registers (k=4 or k=6)
  - Where is time saved?
    • Most procedures are leaf procedures
    • Interprocedural register allocation
    • Many of the registers may be dead before another invocation
    • Register windows are allocated in some architectures per call (e.g., Sun SPARC)
71
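Windows Exploit(s): Buffer Overflow
void foo(char *x) {
    char buf[2];
    strcpy(buf, x);
}
int main(int argc, char *argv[]) {
    foo(argv[1]);
}

./a.out abracadabra
Segmentation fault

[Figure: the stack during foo, with arrows for the direction of stack growth and of increasing memory addresses: previous frame, return address, saved FP, char* x, buf[2]; the copied string "abracadabra" starts in buf and runs over the neighbouring slots]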
72
Buffer overflow
int check_authentication(char *password) {
    int auth_flag = 0;
    char password_buffer[16];

    strcpy(password_buffer, password);
    if (strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
    if (strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;
    return auth_flag;
}

int main(int argc, char *argv[]) {
    if (argc < 2) {
        printf("Usage: %s <password>\n", argv[0]);
        exit(0);
    }
    if (check_authentication(argv[1])) {
        printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        printf("      Access Granted.\n");
        printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    } else {
        printf("\nAccess Denied.\n");
    }
}
(source: "Hacking: The Art of Exploitation", 2nd Ed.)
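The listing copies the command-line argument into a 16-byte buffer with strcpy, which does not check the length, so a longer argument overwrites whatever lies next to the buffer in the activation record. One possible repair, sketched here under the assumption that over-long inputs may simply be rejected, is to check the length before copying; the function name `check_authentication_bounded` is hypothetical and not from the book.

```c
#include <string.h>

/* Returns 1 for an accepted password, 0 otherwise. The copy is guarded by a
   length check, so the input can never write past password_buffer. */
int check_authentication_bounded(const char *password) {
    char password_buffer[16];
    size_t len = strlen(password);

    if (len >= sizeof password_buffer)            /* would not fit with its terminator */
        return 0;
    memcpy(password_buffer, password, len + 1);   /* copy including the '\0' */

    return strcmp(password_buffer, "brillig") == 0 ||
           strcmp(password_buffer, "outgrabe") == 0;
}
```

With this version an argument of 16 or more characters is rejected instead of spilling into auth_flag, the saved frame pointer, or the return address.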
31
cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label
Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )
32
Optimization points
sourcecode
Frontend IR Code
generatortargetcode
Userprofile program
change algorithm
Compilerintraprocedural IRInterprocedural IRIR optimizations
Compilerregister allocation
instruction selectionpeephole transformations
today
33
Interprocedural IR
bull Compile time generation of code for procedure invocations
bull Activation Records (aka Stack Frames)
34
Supporting Procedures
bull New computing environment ndash at least temporary memory for local variables
bull Passing information into the new environmentndash Parameters
bull Transfer of control tofrom procedurebull Handling return values
35
Abstract Register Machine(High Level View)
Code
Data
Highaddresses
Lowaddresses
CPU
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
36
Heap
Abstract Register Machine(High Level View)
Global Variables
Stack
Lowaddresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
Code
Highaddresses
37
Abstract Activation Record Stack
Stack frame for procedure
Prock+1(a1hellipaN)
Prock
Prock+2
hellip
hellip
Prock+1
38
Abstract Stack Frame
Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Prock
Prock+2
hellip
hellip
Stack frame for procedure
Prock+1(a1hellipaN)
39
Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to
stack and jumps to the function labelA statement x=f(a1hellipan) looks like
Push a1 hellip Push anCall fPop x copy returned value
bull Returning a value is done by pushing it to the stack (return x)
Push xbull Return control to caller (and roll up stack)
Return
40
Heap
Abstract Register Machine
Global Variables
Stack
Lowaddresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
Code
Highaddresses
41
Semi-Abstract Register Machine
Global Variables
Stack
Heap
High addresses
Low addresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
ebp
esphellipre
gist
ers
Stac
k
42
Intro Functions Exampleint SimpleFn(int z)
int x yx = x y zreturn x
void main() int ww = SimpleFunction(137)
_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn
main_t0 = 137Push _t0Call _SimpleFnPop w
43
What Can We Do with Procedures
bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as
parameters
44
Design Decisions
bull Scoping rulesndash Static scoping vs dynamic scoping
bull Callercallee conventionsndash Parametersndash Who saves register values
bull Allocating space for local variables
45
Static (lexical) Scopingmain ( )
int a = 0 int b = 0
int b = 1
int a = 2 printf (ldquod dnrdquo a b)
int b = 3 printf (ldquod dnrdquo a b)
printf (ldquod dnrdquo a b)
printf (ldquod dnrdquo a b)
B0B1
B3B3
B2
Declaration Scopes
a=0 B0B1B3
b=0 B0
b=1 B1B2
a=2 B2
b=3 B3
a name refers to its (closest)
enclosing scope
known at compile time
46
Dynamic Scoping
bull Each identifier is associated with a global stack of bindings
bull When entering scope where identifier is declaredndash push declaration on identifier stack
bull When exiting scope where identifier is declaredndash pop identifier stack
bull Evaluating the identifier in any context binds to the current top of stack
bull Determined at runtime
47
Example
bull What value is returned from mainbull Static scopingbull Dynamic scoping
int x = 42
int f() return x int g() int x = 1 return f() int main() return g()
48
Why do we care
bull We need to generate code to access variables
bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code
generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables
49
Variable addresses for static scoping first attempt
int x = 42
int f() return x int g() int x = 1 return f() int main() return g()
identifier address
x (global) 0x42
x (inside g) 0x73
50
Variable addresses for static scoping first attempt
int a [11]
void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)
main() quicksort (1 9)
what is the address of the variable ldquoirdquo in
the procedure quicksort
51
Compile-Time Information on Variablesbull Namebull Typebull Scope
ndash when is it recognizedbull Duration
ndash Until when does its value existbull Size
ndash How many bytes are required at runtime bull Address
ndash Fixedndash Relativendash Dynamic
52
Activation Record (Stack Frames)bull separate space for each procedure invocation
bull managed at runtimendash code for managing it generated by the compiler
bull desired properties ndash efficient allocation and deallocation
bull procedures are called frequentlyndash variable size
bull different procedures may require different memory sizes
53
Semi-Abstract Register Machine
Global Variables
Stack
Heap
High addresses
Low addresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
ebp
esphellipre
gist
ers
Stac
k
54
A Logical Stack Frame (Simplified)Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Stack frame for function f(a1hellipaN)
55
Runtime Stack
bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of
stackbull How do we handle recursion
56
Activation Record (frame)parameter k
parameter 1
lexical pointer
return information
dynamic link
registers amp misc
local variablestemporaries
next frame would be here
hellip
administrativepart
high addresses
lowaddresses
framepointer
stackpointer
incoming parameters
stack grows down
57
Runtime Stackbull SP ndash stack pointer
ndash top of current framebull FP ndash frame pointer
ndash base of current framendash Sometimes called BP
(base pointer)
Current frame
hellip hellip
Previous frame
SP
FPstack grows down
58
Code Blocks
bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1
adminstrativex1
y1
x2
x3
hellip
59
L-Values of Local Variables
bull The offset in the stack is known at compile time
bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3
Store R3 offset(x)(FP)
60
Pentium Runtime Stack
Pentium stack registers
Pentium stack and callret instructions
Register Usage
ESP Stack pointer
EBP Base pointer
Instruction Usage
push pushahellip push on runtime stack
poppopahellip Base pointer
call transfer control to called routine
return transfer control back to caller
61
Accessing Stack Variables
bull Use offset from FP (ebp)bull Remember ndash
stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples
ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local
hellip hellip
SP
FPReturn address
Return address
Param nhellip
param1
Local 1hellip
Local n
Previous fp
Param nhellip
param1FP+8
FP-4
62
Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp
ESP
EBPReturn address
Return address
old ebx
Previous fp
nEBP+8
EBP-4 old ebp
old ebx
(stack in intermediate point)
(disclaimer real compiler can do better than that)
63
Call Sequences
bull The processor does not save the content of registers on procedure calls
bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some
registers
64
Call Sequences
call
caller
callee
return
caller
Caller push code
Callee push code
(prologue)
Callee pop code
(epilogue)
Caller pop code
Push caller-save registersPush actual parameters (in reverse order)
push return addressJump to call address
Push current base-pointerbp = sp
Push local variablesPush callee-save registers
Pop callee-save registersPop callee activation record
Pop old base-pointer
pop return addressJump to address
Pop parametersPop caller-save registers
65
Call Sequences ndash Foo(4221)
call
caller
callee
return
caller
push ecx
push $21
push $42
call _foo
push ebp
mov esp ebp
sub 8 esp
push ebx
pop ebx
mov ebp esp
pop ebp
ret
add $8 esp
pop ecx
Push caller-save registersPush actual parameters (in reverse order)
push return addressJump to call address
Push current base-pointerbp = sp
Push local variablesPush callee-save registers
Pop callee-save registersPop callee activation record
Pop old base-pointer
pop return addressJump to address
Pop caller-save registersPop parameters
66
ldquoTo Callee-save or to Caller-saverdquo
bull Callee-saved registers need only be saved when callee modifies their value
bull some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registersbull Callee-Save Registers
ndash Saved by the callee before modificationndash Values are automatically preserved across calls
bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls
bull Usually the architecture defines caller-save and callee-save registers
bull Separate compilationbull Interoperability between code produced by different
compilerslanguages bull But compiler writers decide when to use calllercallee
registers
68
Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls
global _foo
Add_Constant -K SP allocate space for foo
Store_Local R5 -14(FP) save R5
Load_Reg R5 R0 Add_Constant R5 1
JSR f1 JSR g1
Add_Constant R5 2 Load_Reg R5 R0
Load_Local -14(FP) R5 restore R5
Add_Constant K SP RTS deallocate
int foo(int a) int b=a+1 f1()g1(b)
return(b+2)
69
Caller-Save Registersbull Saved by the caller before calls when
neededbull Values are not automatically preserved
across calls
global _bar
Add_Constant -K SP allocate space for bar
Add_Constant R0 1
JSR f2
Load_Constant 2 R0 JSR g2
Load_Constant 8 R0 JSR g2
Add_Constant K SP deallocate space for bar
RTS
void bar (int y) int x=y+1f2(x)
g2(2) g2(8)
70
Parameter Passingbull 1960s
ndash In memory bull No recursion is allowed
bull 1970sndash In stack
bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved
bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)
71
Windows Exploit(s) Buffer Overflow
void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])
aout abracadabraSegmentation fault
Stack grows this way
Memory addresses
Previous frameReturn
addressSaved FPchar xbuf[2]
hellip
abracadabr
72
Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]
strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)
auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)
auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)
(source ldquohacking ndash the art of exploitation 2nd Edrdquo)
73
Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]
strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)
auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)
auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)
(source ldquohacking ndash the art of exploitation 2nd Edrdquo)
74
Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)
auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)
auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)
(source ldquohacking ndash the art of exploitation 2nd Edrdquo)
75
Buffer overflow
0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt
76
Nested Procedures
bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is
defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables
77
program p() int x procedure a() int y procedure b() hellip c() hellip
procedure c() int z procedure d()
y = x + z
hellip b() hellip d() hellip hellip a() hellip c() hellip a()
Example Nested Procedures
Possible call sequencep a a c b c d
what are the addresses of variables ldquoxrdquo ldquoyrdquo and
ldquozrdquo in procedure d
78
Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)
variables from ldquoardquo which instance of ldquoardquo is it
bull how do you find the right activation record at runtime a
b
P
c c
d
a
Possible call sequencep a a c b c d
79
Nested Procedures

• goal: find the closest routine in the stack from a given nesting level
• if we reached the same routine in a sequence of calls
  - routine of level k uses variables of the same nesting level: it uses its own variables
  - if it uses variables of nesting level j < k, then it must be the last routine called at level j
• If a procedure is last at level j on the stack, then it must be an ancestor of the current routine

Possible call sequence: p → a → a → c → b → c → d

[Figure: runtime stack with frames for P, a, a, c, b, c, d]
80
Nested Procedures

• problem: a routine may need to access variables of another routine that contains it statically
• solution: lexical pointer (aka access link) in the activation record
• the lexical pointer points to the last activation record of the nesting level above it
  - in our example, the lexical pointer of d points to the activation record of c
• lexical pointers are created at runtime
• the number of links to be traversed is known at compile time
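
The access-link scheme above can be sketched in C. This is only an illustration under stated assumptions: the `Frame` struct, its single `local` slot, and the function names are hypothetical and not part of the lecture. Frames are built by hand for the call sequence p, a, a, c, b, c, d, and `d` evaluates `y = x + z` by following a number of links that depends only on nesting levels.

```c
#include <stddef.h>

/* Hypothetical activation record: one local slot plus an access link. */
struct Frame {
    struct Frame *access_link; /* frame of the statically enclosing routine */
    int level;                 /* nesting depth: p=0, a=1, b and c=2, d=3   */
    int local;                 /* the routine's single local (x, y, or z)   */
};

/* Follow (from->level - target_level) links. For a given variable
 * reference this hop count is a constant known at compile time. */
static struct Frame *frame_at_level(struct Frame *from, int target_level) {
    struct Frame *f = from;
    for (int hops = from->level - target_level; hops > 0; --hops)
        f = f->access_link;
    return f;
}

/* Body of d: y = x + z, with x in p, y in a, and z in c. */
static void run_d(struct Frame *d) {
    struct Frame *c = frame_at_level(d, 2);
    struct Frame *a = frame_at_level(d, 1);
    struct Frame *p = frame_at_level(d, 0);
    a->local = p->local + c->local;
}

/* Builds the frames for p, a, a, c, b, c, d, runs d, and returns y of the
 * first (which == 0) or the second, most recent (which != 0) activation of a. */
int run_example(int x, int z, int which) {
    struct Frame p  = { NULL, 0, x };
    struct Frame a1 = { &p,   1, -1 };  /* y not yet assigned           */
    struct Frame a2 = { &p,   1, -1 };  /* a called from a: link is p   */
    struct Frame c1 = { &a2,  2, 0 };   /* first c, its z is never read */
    struct Frame b  = { &a2,  2, 0 };   /* b is declared in a           */
    struct Frame c2 = { &a2,  2, z };   /* c called from b: link is a2  */
    struct Frame d  = { &c2,  3, 0 };   /* d is declared in c           */
    (void)c1;
    (void)b;
    run_d(&d);
    return which ? a2.local : a1.local;
}
```

With x = 10 and z = 5, the second activation of a ends up with y = 15 while the first activation's y is left unchanged, which is the answer to "which instance of a is it?" for this call sequence.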
81
Lexical Pointers

[Figure: runtime stack with frames a (y), a (y), c (z), b, c (z), d, each with its lexical pointer drawn to the frame of its enclosing routine]

Possible call sequence: p → a → a → c → b → c → d

    program p() {
      int x;
      procedure a() {
        int y;
        procedure b() { c() };
        procedure c() {
          int z;
          procedure d() {
            y = x + z
          };
          … b() … d() …
        }
        … a() … c() …
      }
      a()
    }
82
What is a Compiler
32
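Optimization points

[Figure: pipeline source code → Front end → IR → Code generator → target code, annotated with optimization points; a "today" marker highlights the part covered in this lecture]

• User: profile program, change algorithm
• Compiler: intraprocedural IR, interprocedural IR, IR optimizations
• Compiler: register allocation, instruction selection, peephole transformations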
33
Interprocedural IR

• Compile-time generation of code for procedure invocations
• Activation Records (aka Stack Frames)
34
Supporting Procedures

• New computing environment
  - at least temporary memory for local variables
• Passing information into the new environment
  - Parameters
• Transfer of control to/from procedure
• Handling return values
35
Abstract Register Machine (High Level View)

[Figure: a CPU with general-purpose (data) registers Register 00, Register 01, …, Register xx and control registers Register PC, …; memory holds Code and Data, drawn between low and high addresses]
36
Abstract Register Machine (High Level View)

[Figure: a CPU with general-purpose (data) registers Register 00, Register 01, …, Register xx and control registers Register PC, …; main memory is divided into Code, Global Variables, Heap, and Stack regions, drawn between low and high addresses]
37
Abstract Activation Record Stack

[Figure: a stack of activation records …, Proc_k, Proc_k+1, Proc_k+2, …; the highlighted entry is the stack frame for procedure Proc_k+1(a1, …, aN)]
38
Abstract Stack Frame

[Figure: the stack frame for procedure Proc_k+1(a1, …, aN), drawn between the frames of Proc_k and Proc_k+2]

    Parameters (actual arguments):   Param N, Param N-1, …, Param 1
    Locals and temporaries:          _t0, …, _tk, x, …, y
39
Handling Procedures

• Store local variables/temporaries in a stack
• A function call instruction pushes arguments to the stack and jumps to the function label
  A statement x=f(a1,…,an); looks like:
      Push a1; … Push an;
      Call f;
      Pop x;   // copy returned value
• Returning a value is done by pushing it to the stack (return x;)
      Push x;
• Return control to caller (and roll up stack)
      Return;
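
The Push/Call/Pop protocol above can be mimicked with an explicit stack in C. This is a minimal sketch, not the lecture's IR: the callee `f(a1, a2) = a1 + 2*a2`, the helper names, and the choice that the callee removes its own arguments are all assumptions made for illustration.

```c
#include <assert.h>

#define STACK_MAX 64

static int stack[STACK_MAX];
static int sp = 0;               /* number of values currently on the stack */

static void push(int v) { assert(sp < STACK_MAX); stack[sp++] = v; }
static int  pop(void)   { assert(sp > 0); return stack[--sp]; }

/* Hypothetical callee f(a1, a2) = a1 + 2*a2. It pops its arguments
 * (last pushed first) and then pushes the result: "Push x; Return;". */
static void call_f(void) {
    int a2 = pop();
    int a1 = pop();
    push(a1 + 2 * a2);
}

/* x = f(a1, a2);  becomes  Push a1; Push a2; Call f; Pop x; */
int eval_call(int a1, int a2) {
    push(a1);
    push(a2);
    call_f();
    return pop();                /* copy returned value */
}

/* Lets a caller check that the stack was rolled back after the call. */
int stack_depth(void) { return sp; }
```

Calling `eval_call(3, 4)` yields 11 and leaves the stack empty, which is the "roll up stack" step in the last bullet.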
40
Abstract Register Machine

[Figure: a CPU with general-purpose (data) registers Register 00, Register 01, …, Register xx and control registers Register PC, …; main memory is divided into Code, Global Variables, Heap, and Stack regions, drawn between low and high addresses]
41
Semi-Abstract Register Machine

[Figure: a CPU with general-purpose (data) registers Register 00, Register 01, …, Register xx, control registers Register PC, …, and stack registers ebp, esp, …; main memory is divided into Global Variables, Stack, and Heap regions, drawn between high and low addresses]
42
Intro: Functions Example

    int SimpleFn(int z) {
        int x, y;
        x = x * y * z;
        return x;
    }

    void main() {
        int w;
        w = SimpleFn(137);
    }

    _SimpleFn:
        _t0 = x * y;
        _t1 = _t0 * z;
        x = _t1;
        Push x;
        Return;

    main:
        _t0 = 137;
        Push _t0;
        Call _SimpleFn;
        Pop w;
43
What Can We Do with Procedures?

• Declarations & Definitions
• Call & Return
• Jumping out of procedures
• Passing & Returning procedures as parameters
44
Design Decisions

• Scoping rules
  - Static scoping vs. dynamic scoping
• Caller/callee conventions
  - Parameters
  - Who saves register values?
• Allocating space for local variables
45
Static (lexical) Scoping

    main ( )
    {                                       /* B0 */
        int a = 0;
        int b = 0;
        {                                   /* B1 */
            int b = 1;
            {                               /* B2 */
                int a = 2;
                printf ("%d %d\n", a, b);
            }
            {                               /* B3 */
                int b = 3;
                printf ("%d %d\n", a, b);
            }
            printf ("%d %d\n", a, b);
        }
        printf ("%d %d\n", a, b);
    }

    Declaration   Scopes
    a=0           B0, B1, B3
    b=0           B0
    b=1           B1, B2
    a=2           B2
    b=3           B3

A name refers to its (closest) enclosing scope, which is known at compile time.
46
Dynamic Scoping

• Each identifier is associated with a global stack of bindings
• When entering a scope where the identifier is declared
  - push the declaration on the identifier stack
• When exiting a scope where the identifier is declared
  - pop the identifier stack
• Evaluating the identifier in any context binds to the current top of stack
• Determined at runtime
47
Example

• What value is returned from main?
• Static scoping?
• Dynamic scoping?

    int x = 42;

    int f() { return x; }
    int g() { int x = 1; return f(); }
    int main() { return g(); }
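
One way to explore the question is to simulate both rules in C. C itself is statically scoped, so the dynamic-scoping half below models the per-identifier binding stack explicitly; the helper names (`enter_x`, `exit_x`, `lookup_x`) and the `_static`/`_dynamic` suffixes are assumptions for this sketch, not part of the lecture.

```c
#include <assert.h>

#define MAX_BINDINGS 16

/* Binding stack for the single identifier x (dynamic scoping). */
static int x_bindings[MAX_BINDINGS];
static int x_depth = 0;

static void enter_x(int value) {            /* scope declaring x is entered */
    assert(x_depth < MAX_BINDINGS);
    x_bindings[x_depth++] = value;
}
static void exit_x(void) {                  /* scope declaring x is exited */
    assert(x_depth > 0);
    --x_depth;
}
static int lookup_x(void) {                 /* x binds to the top of stack */
    assert(x_depth > 0);
    return x_bindings[x_depth - 1];
}

/* Dynamic scoping: f sees whichever x was declared most recently. */
static int f_dynamic(void) { return lookup_x(); }
static int g_dynamic(void) {
    enter_x(1);                             /* int x = 1; */
    int result = f_dynamic();
    exit_x();
    return result;
}
int main_dynamic(void) {
    enter_x(42);                            /* global int x = 42; */
    int result = g_dynamic();
    exit_x();
    return result;
}

/* Static scoping: ordinary C, where f always refers to the global x. */
static int x = 42;
static int f_static(void) { return x; }
static int g_static(void) { int x = 1; (void)x; return f_static(); }
int main_static(void) { return g_static(); }
```

Under these assumptions `main_static()` returns 42 and `main_dynamic()` returns 1, because in the dynamic version the binding pushed by g is on top of the stack when f evaluates x.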
48
Why do we care?

• We need to generate code to access variables
• Static scoping
  - Identifier binding is known at compile time
  - Address of the variable is known at compile time
  - Assigning addresses to variables is part of code generation
  - No runtime errors of "access to undefined variable"
  - Can check types of variables
49
Variable addresses for static scoping: first attempt

    int x = 42;

    int f() { return x; }
    int g() { int x = 1; return f(); }
    int main() { return g(); }

    identifier      address
    x (global)      0x42
    x (inside g)    0x73
50
Variable addresses for static scoping: first attempt

    int a [11];

    void quicksort(int m, int n) {
        int i;
        if (n > m) {
            i = partition(m, n);
            quicksort (m, i-1);
            quicksort (i+1, n);
        }
    }

    main() {
        quicksort (1, 9);
    }

What is the address of the variable "i" in the procedure quicksort?
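
A small C experiment suggests why a single fixed address cannot work for "i": every live invocation of a recursive procedure needs its own copy. The sketch below uses a hypothetical function `recurse` instead of quicksort and records where each activation's local lives; the names and the depth limit are assumptions for illustration.

```c
#include <stdint.h>

#define MAX_DEPTH 8

static uintptr_t addr_of_i[MAX_DEPTH];

/* Records where this activation's local i lives, then recurses.
 * i is volatile and used after the call, so every activation's i
 * must stay alive while the deeper activations run. */
static void recurse(int depth) {
    volatile int i = depth;
    addr_of_i[depth] = (uintptr_t)&i;
    if (depth + 1 < MAX_DEPTH)
        recurse(depth + 1);
    i = i + 1;
}

/* Returns 1 if every activation's i had a different address. */
int all_addresses_distinct(void) {
    recurse(0);
    for (int a = 0; a < MAX_DEPTH; ++a)
        for (int b = a + 1; b < MAX_DEPTH; ++b)
            if (addr_of_i[a] == addr_of_i[b])
                return 0;
    return 1;
}
```

Because all eight activations are alive at the same time, their locals are distinct objects and must occupy distinct addresses, so the compiler can only fix an address relative to each activation, not an absolute one.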
51
Compile-Time Information on Variables

• Name
• Type
• Scope
  - when is it recognized
• Duration
  - until when does its value exist
• Size
  - how many bytes are required at runtime
• Address
  - Fixed
  - Relative
  - Dynamic
52
Activation Record (Stack Frames)

• separate space for each procedure invocation
• managed at runtime
  - code for managing it is generated by the compiler
• desired properties
  - efficient allocation and deallocation
    • procedures are called frequently
  - variable size
    • different procedures may require different memory sizes
53
Semi-Abstract Register Machine

[Figure: a CPU with general-purpose (data) registers Register 00, Register 01, …, Register xx, control registers Register PC, …, and stack registers ebp, esp, …; main memory is divided into Global Variables, Stack, and Heap regions, drawn between high and low addresses]
54
A Logical Stack Frame (Simplified)

[Figure: stack frame for function f(a1, …, aN)]

    Parameters (actual arguments):   Param N, Param N-1, …, Param 1
    Locals and temporaries:          _t0, …, _tk, x, …, y
55
Runtime Stack

• Stack of activation records
• Call = push new activation record
• Return = pop activation record
• Only one "active" activation record: top of stack
• How do we handle recursion?
56
Activation Record (frame)

    high addresses
        parameter k                  }
        …                            } incoming parameters
        parameter 1                  }
        lexical pointer              }
        return information           } administrative part
        dynamic link                 }
        registers & misc             }
        local variables
        temporaries
        next frame would be here
        …
    low addresses

The stack grows down. The frame pointer and the stack pointer mark the base and the top of the current frame.
57
Runtime Stack

• SP: stack pointer
  - top of current frame
• FP: frame pointer
  - base of current frame
  - sometimes called BP (base pointer)

[Figure: the stack grows down; the previous frame sits above the current frame, FP marks the base of the current frame and SP its top]
58
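Code Blocks

• Programming languages provide code blocks

    void foo()
    {
        int x = 8, y = 9;       // 1
        { int x = y * y; }      // 2
        { int x = y * 7; }      // 3
        x = y + 1;
    }

[Figure: frame layout holding the administrative part followed by x1, y1, x2, x3, …, one slot per declaration]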
59
L-Values of Local Variables

• The offset in the stack is known at compile time
• L-val(x) = FP + offset(x)
• x = 5 is translated to
      Load_Constant 5, R3
      Store R3, offset(x)(FP)
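
The split between compile-time offsets and the runtime frame pointer can be sketched in C. Everything here is an assumption made for illustration: the `Local` symbol-table entry, the rule that locals are laid out downward from FP in declaration order, and the byte array standing in for the stack (alignment is ignored).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol-table entry for a local variable. */
struct Local {
    const char *name;
    size_t size;     /* bytes required at runtime                      */
    long offset;     /* displacement from FP, filled in at compile time */
};

/* "Compile time": give each local a fixed offset below FP.
 * Returns the number of bytes the frame needs for locals. */
size_t assign_offsets(struct Local *locals, size_t count) {
    size_t used = 0;
    for (size_t k = 0; k < count; ++k) {
        used += locals[k].size;
        locals[k].offset = -(long)used;
    }
    return used;
}

/* "Run time": L-val(x) = FP + offset(x). */
unsigned char *lvalue(unsigned char *fp, const struct Local *x) {
    return fp + x->offset;
}

/* Executes x = value on a simulated stack and reads x back. */
int store_and_load(int value) {
    struct Local locals[] = {
        { "x", sizeof(int), 0 },
        { "y", sizeof(int), 0 },
    };
    unsigned char memory[64];
    unsigned char *fp = memory + 32;     /* pretend frame pointer */
    int result;

    assign_offsets(locals, 2);
    /* Store R3, offset(x)(FP) */
    memcpy(lvalue(fp, &locals[0]), &value, sizeof value);
    memcpy(&result, lvalue(fp, &locals[0]), sizeof result);
    return result;
}
```

Only `fp` changes from one invocation to the next; the offsets computed by `assign_offsets` are the same for every activation of the function.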
60
Pentium Runtime Stack

Pentium stack registers

    Register   Usage
    ESP        Stack pointer
    EBP        Base pointer

Pentium stack and call/ret instructions

    Instruction       Usage
    push, pusha, …    push on runtime stack
    pop, popa, …      pop from runtime stack
    call              transfer control to called routine
    ret               transfer control back to caller
61
Accessing Stack Variables

• Use offset from FP (%ebp)
• Remember: the stack grows downwards
• Above FP = parameters
• Below FP = locals
• Examples
  - %ebp + 4 = return address
  - %ebp + 8 = first parameter
  - %ebp - 4 = first local

[Figure: stack layout from high to low addresses: Param n, …, param 1 (at FP+8), Return address, Previous fp (at FP), Local 1 (at FP-4), …, Local n (at SP)]
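
The offsets in the examples can be checked against a simulated stack. The sketch below is an assumption-laden model, not real x86: memory is an array of 4-byte slots, `esp` and `ebp` are slot indices, the return address is a dummy value, and the callee `add_one` is hypothetical.

```c
#include <stdint.h>

#define WORDS 32

static int32_t mem[WORDS];   /* simulated stack, one slot per 4 bytes      */
static int esp = WORDS;      /* slot index of the top of stack, grows down */
static int ebp = WORDS;      /* slot index of the current frame base       */

static void push(int32_t v) { mem[--esp] = v; }
static int32_t pop(void)    { return mem[esp++]; }

/* Address of the word at byte displacement disp from ebp, e.g. 8(%ebp). */
static int32_t *at_ebp(int disp) { return &mem[ebp + disp / 4]; }

/* Callee: int add_one(int n) { int local = n + 1; return local; } */
static int32_t callee(void) {
    push(ebp);                       /* pushl %ebp                          */
    ebp = esp;                       /* movl %esp,%ebp                      */
    esp -= 1;                        /* room for one local                  */
    *at_ebp(-4) = *at_ebp(8) + 1;    /* first local = first parameter + 1   */
    int32_t result = *at_ebp(-4);
    esp = ebp;                       /* movl %ebp,%esp                      */
    ebp = pop();                     /* popl %ebp                           */
    return result;
}

int32_t call_add_one(int32_t n) {
    push(n);                         /* actual parameter                    */
    push(0x1234);                    /* stand-in for the return address     */
    int32_t r = callee();
    (void)pop();                     /* ret pops the return address         */
    (void)pop();                     /* caller pops the parameter           */
    return r;
}

/* Lets a caller check that both pointers were restored. */
int stack_is_balanced(void) { return esp == WORDS && ebp == WORDS; }
```

In this model the saved frame pointer sits at 0(%ebp), the dummy return address at 4(%ebp), the parameter at 8(%ebp), and the local at -4(%ebp), matching the examples above.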
62
Factorial - fact(int n)

    fact:
        pushl %ebp              # save ebp
        movl  %esp,%ebp         # ebp = esp
        pushl %ebx              # save ebx
        movl  8(%ebp),%ebx      # ebx = n
        cmpl  $1,%ebx           # n <= 1 ?
        jle   .lresult          # then done
        leal  -1(%ebx),%eax     # eax = n-1
        pushl %eax              #
        call  fact              # fact(n-1)
        imull %ebx,%eax         # eax = retv*n
        jmp   .lreturn          #
    .lresult:
        movl  $1,%eax           # retv
    .lreturn:
        movl  -4(%ebp),%ebx     # restore ebx
        movl  %ebp,%esp         # restore esp
        popl  %ebp              # restore ebp
        ret                     # return to caller

[Figure: stack at an intermediate point: n at EBP+8, Return address, old ebp (Previous fp) at EBP, old ebx at EBP-4, with ESP at the top of the stack]

(disclaimer: a real compiler can do better than that)
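
For orientation, the assembly corresponds to roughly the following C source. This is a reconstruction from the instruction comments, not code taken from the slides; the base case `n <= 1` follows the `cmpl $1` / `jle` pair.

```c
/* Recursive factorial; n <= 1 is the base case. */
int fact(int n) {
    if (n <= 1)
        return 1;               /* .lresult: retv = 1          */
    return n * fact(n - 1);     /* imull: eax = fact(n-1) * n  */
}
```

For example, `fact(5)` evaluates to 120, and each of the five nested calls gets its own frame holding n, the return address, the saved ebp, and the saved ebx.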
63
Call Sequences

• The processor does not save the content of registers on procedure calls
• So who will?
  - Caller saves and restores registers
  - Callee saves and restores registers
  - But can also have both save/restore some registers
64
Call Sequences

    caller (caller push code):
        Push caller-save registers
        Push actual parameters (in reverse order)
    call:
        push return address
        Jump to call address
    callee (callee push code, prologue):
        Push current base-pointer
        bp = sp
        Push local variables
        Push callee-save registers
    callee (callee pop code, epilogue):
        Pop callee-save registers
        Pop callee activation record
        Pop old base-pointer
    return:
        pop return address
        Jump to address
    caller (caller pop code):
        Pop parameters
        Pop caller-save registers
65
Call Sequences - Foo(42,21)

    caller:   push %ecx            Push caller-save registers
              push $21             Push actual parameters (in reverse order)
              push $42
    call:     call _foo            push return address, jump to call address
    callee:   push %ebp            Push current base-pointer
              mov  %esp, %ebp      bp = sp
              sub  $8, %esp        Push local variables
              push %ebx            Push callee-save registers
              pop  %ebx            Pop callee-save registers
              mov  %ebp, %esp      Pop callee activation record
              pop  %ebp            Pop old base-pointer
    return:   ret                  pop return address, jump to address
    caller:   add  $8, %esp        Pop parameters
              pop  %ecx            Pop caller-save registers
66
"To Callee-save or to Caller-save?"

• Callee-saved registers need only be saved when the callee modifies their value
• Some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registers

• Callee-Save Registers
  - Saved by the callee before modification
  - Values are automatically preserved across calls
• Caller-Save Registers
  - Saved (if needed) by the caller before calls
  - Values are not automatically preserved across calls
• Usually the architecture defines caller-save and callee-save registers
• Separate compilation
• Interoperability between code produced by different compilers/languages
• But compiler writers decide when to use caller/callee registers
68
Callee-Save Registers

• Saved by the callee before modification
• Usually at procedure prolog
• Restored at procedure epilog
• Hardware support may be available
• Values are automatically preserved across calls

    int foo(int a) {
        int b = a + 1;
        f1();
        g1(b);
        return (b + 2);
    }

    .global _foo
        Add_Constant -K, SP        // allocate space for foo
        Store_Local R5, -14(FP)    // save R5
        Load_Reg R5, R0
        Add_Constant R5, 1
        JSR f1
        JSR g1
        Add_Constant R5, 2
        Load_Reg R5, R0
        Load_Local -14(FP), R5     // restore R5
        Add_Constant K, SP         // deallocate
        RTS
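
The convention can be imitated in C with a simulated register file. This is only a sketch under assumptions: the `reg` array, the two stand-in callees, and the function names are hypothetical, and R5 is treated as callee-save and R0 as caller-save purely to mirror the foo example.

```c
#define NUM_REGS 8

static int reg[NUM_REGS];      /* simulated register file */

/* A callee that uses R5 as scratch but obeys the callee-save rule. */
static void polite_callee(void) {
    int saved_r5 = reg[5];     /* prologue: save R5     */
    reg[5] = 999;              /* body: clobber freely  */
    reg[5] = saved_r5;         /* epilogue: restore R5  */
}

/* A callee that overwrites caller-save R0 without restoring it. */
static void clobbering_callee(void) {
    reg[0] = -1;
}

/* foo keeps b in callee-save R5, so b survives both calls. */
int foo_sim(int a) {
    int saved_r5 = reg[5];     /* foo is itself a callee: save R5 */
    reg[5] = a + 1;            /* b = a + 1                       */
    clobbering_callee();       /* f1()                            */
    polite_callee();           /* g1(b)                           */
    int result = reg[5] + 2;   /* b + 2                           */
    reg[5] = saved_r5;         /* restore R5 before returning     */
    return result;
}

/* Sets R5, calls foo_sim, and reports R5 as seen by foo's caller. */
int r5_after_foo(int before, int a) {
    reg[5] = before;
    (void)foo_sim(a);
    return reg[5];
}
```

From the point of view of foo's caller, R5 holds the same value before and after the call, even though foo and its callees all used it.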
69
Caller-Save Registers

• Saved by the caller before calls when needed
• Values are not automatically preserved across calls

    void bar (int y) {
        int x = y + 1;
        f2(x);
        g2(2);
        g2(8);
    }

    .global _bar
        Add_Constant -K, SP      // allocate space for bar
        Add_Constant R0, 1
        JSR f2
        Load_Constant 2, R0
        JSR g2
        Load_Constant 8, R0
        JSR g2
        Add_Constant K, SP       // deallocate space for bar
        RTS
70
Parameter Passing

• 1960s
  - In memory
    • No recursion is allowed
• 1970s
  - In stack
• 1980s
  - In registers
  - First k parameters are passed in registers (k=4 or k=6)
  - Where is time saved?
• Most procedures are leaf procedures
• Interprocedural register allocation
• Many of the registers may be dead before another invocation
• Register windows are allocated in some architectures per call (e.g., Sun SPARC)
71
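Windows Exploit(s): Buffer Overflow

    void foo (char *x) {
        char buf[2];
        strcpy(buf, x);
    }

    int main (int argc, char *argv[]) {
        foo(argv[1]);
    }

    ./a.out abracadabra
    Segmentation fault

[Figure: stack layout with arrows labeled "Stack grows this way" and "Memory addresses"; entries: Previous frame, Return address, Saved FP, char* x, buf[2], …; the input "abracadabr…" is drawn written over the frame, starting at buf]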
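
One common way to avoid this overflow is to check the length before copying. The sketch below is not from the lecture; `copy_checked` and `foo_checked` are hypothetical names, and rejecting over-long input (rather than truncating it) is a design choice made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Copies src into dst only if it fits, including the terminating NUL.
 * Returns 0 on success and -1 if src is too long (dst is left empty). */
int copy_checked(char *dst, size_t dst_size, const char *src) {
    size_t len = strlen(src);
    if (dst_size == 0)
        return -1;
    if (len >= dst_size) {
        dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, len + 1);
    return 0;
}

/* Bounds-checked variant of the slide's foo(). */
int foo_checked(const char *x) {
    char buf[2];
    return copy_checked(buf, sizeof buf, x);
}
```

With this variant, `foo_checked("abracadabra")` reports failure instead of writing past `buf`, so the saved frame pointer and return address are left intact.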
72
Buffer overflow

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int check_authentication(char *password) {
        int auth_flag = 0;
        char password_buffer[16];

        strcpy(password_buffer, password);
        if (strcmp(password_buffer, "brillig") == 0)
            auth_flag = 1;
        if (strcmp(password_buffer, "outgrabe") == 0)
            auth_flag = 1;
        return auth_flag;
    }

    int main(int argc, char *argv[]) {
        if (argc < 2) {
            printf("Usage: %s <password>\n", argv[0]);
            exit(0);
        }
        if (check_authentication(argv[1])) {
            printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
            printf(" Access Granted.\n");
            printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        } else {
            printf("\nAccess Denied.\n");
        }
    }

(source: "Hacking: The Art of Exploitation, 2nd Ed.")
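
Why a long password can flip the flag may be easier to see on a simulated frame. The sketch below makes a strong assumption that real compilers need not honor: the flag is placed directly above the 16-byte buffer. The frame is a plain byte array, so the simulation itself never writes out of bounds; `SimFrame` and `auth_flag_after_copy` are hypothetical names.

```c
#include <string.h>

#define BUF_SIZE 16

/* Simulated locals of check_authentication: bytes 0..15 are
 * password_buffer, byte 16 stands in for auth_flag. */
struct SimFrame {
    unsigned char bytes[BUF_SIZE + 1];
};
enum { FLAG_OFFSET = BUF_SIZE };

/* Mimics an unchecked strcpy into the buffer and reports whether
 * the flag byte ended up nonzero. */
int auth_flag_after_copy(const char *password) {
    struct SimFrame frame;
    size_t n = strlen(password) + 1;              /* bytes strcpy would write */

    memset(frame.bytes, 0, sizeof frame.bytes);   /* auth_flag = 0            */
    if (n > sizeof frame.bytes)
        n = sizeof frame.bytes;                   /* keep the simulation in bounds */
    memcpy(frame.bytes, password, n);
    return frame.bytes[FLAG_OFFSET] != 0;
}
```

A short password leaves the flag at zero, while any input of 17 or more non-NUL characters makes it nonzero under this layout, without the input ever matching "brillig" or "outgrabe".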
34
Supporting Procedures
• New computing environment
  - at least temporary memory for local variables
• Passing information into the new environment
  - Parameters
• Transfer of control to/from procedure
• Handling return values
35
Abstract Register Machine (High Level View)
[Figure: a CPU with general-purpose (data) registers Register 00, Register 01, …, Register xx and control registers such as Register PC, next to a memory holding Code and Data, drawn between low and high addresses]
36
Abstract Register Machine (High Level View)
[Figure: CPU (general-purpose (data) registers Register 00, Register 01, …, Register xx; control registers such as Register PC) and Main Memory divided into Code, Global Variables, Stack and Heap, drawn between low and high addresses]
37
Abstract Activation Record Stack
[Figure: stack of frames …, Proc k, Proc k+1, Proc k+2, …; the highlighted entry is the stack frame for procedure Proc k+1(a1, …, aN)]
38
Abstract Stack Frame
Stack frame for procedure Proc k+1(a1, …, aN), shown between the frames of Proc k and Proc k+2:
  Parameters (actual arguments): Param N, Param N-1, …, Param 1
  Locals and temporaries: _t0, …, _tk, x, …, y
39
Handling Procedures
• Store local variables/temporaries in a stack
• A function call instruction pushes arguments to the stack and jumps to the function label
  A statement x = f(a1, …, an); looks like
    Push a1; … Push an;
    Call f;
    Pop x;   // copy returned value
• Returning a value is done by pushing it to the stack (return x;)
    Push x;
• Return control to caller (and roll up stack)
    Return;
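The push/call/pop protocol above can be sketched with an explicit stack in C. The names `push`, `pop`, `call_f` and `caller_side`, the choice f(a1, a2) = a1 - a2, and the convention that the callee consumes its arguments before pushing the result are assumptions made for this illustration; the slide leaves the clean-up of the arguments unspecified.

```c
#include <assert.h>

#define STACK_MAX 64

static int stack[STACK_MAX];
static int sp = 0;                  /* index of the next free slot */

static void push(int v) { assert(sp < STACK_MAX); stack[sp++] = v; }
static int  pop(void)   { assert(sp > 0); return stack[--sp]; }

/* Callee for f(a1, a2) = a1 - a2. It takes its two arguments off the stack
   and leaves the return value in their place ("Push x; Return;"). */
static void call_f(void) {
    int a2 = pop();                 /* pushed last, so popped first */
    int a1 = pop();
    push(a1 - a2);
}

/* Caller side of the statement x = f(10, 3). */
static int caller_side(void) {
    push(10);                       /* Push a1 */
    push(3);                        /* Push a2 */
    call_f();                       /* Call f  */
    return pop();                   /* Pop x: copy returned value */
}
```

After `caller_side()` returns 7 the stack is empty again, which is the "roll up" the last bullet refers to.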
40
Abstract Register Machine
[Figure: CPU (general-purpose (data) registers Register 00, Register 01, …, Register xx; control registers such as Register PC) and Main Memory divided into Code, Global Variables, Stack and Heap, drawn between low and high addresses]
41
Semi-Abstract Register Machine
[Figure: CPU with general-purpose (data) registers Register 00, Register 01, …, Register xx, control registers such as Register PC, and stack registers ebp, esp, …; Main Memory divided into Global Variables, Stack and Heap, drawn between high and low addresses]
42
Intro: Functions Example
Source:
  int SimpleFn(int z) {
    int x, y;
    x = x * y * z;
    return x;
  }
  void main() {
    int w;
    w = SimpleFn(137);
  }
Three-address code:
  _SimpleFn:
    _t0 = x * y;
    _t1 = _t0 * z;
    x = _t1;
    Push x;
    Return;
  main:
    _t0 = 137;
    Push _t0;
    Call _SimpleFn;
    Pop w;
43
What Can We Do with Procedures?
• Declarations & Definitions
• Call & Return
• Jumping out of procedures
• Passing & Returning procedures as parameters
44
Design Decisions
• Scoping rules
  - Static scoping vs. dynamic scoping
• Caller/callee conventions
  - Parameters
  - Who saves register values?
• Allocating space for local variables
45
Static (lexical) Scoping
  main ( )
  {                                  /* B0 */
    int a = 0;
    int b = 0;
    {                                /* B1 */
      int b = 1;
      {                              /* B2 */
        int a = 2;
        printf ("%d %d\n", a, b);
      }
      {                              /* B3 */
        int b = 3;
        printf ("%d %d\n", a, b);
      }
      printf ("%d %d\n", a, b);
    }
    printf ("%d %d\n", a, b);
  }

  Declaration   Scopes
  a=0           B0, B1, B3
  b=0           B0
  b=1           B1, B2
  a=2           B2
  b=3           B3

A name refers to its (closest) enclosing scope, which is known at compile time.
46
Dynamic Scoping
• Each identifier is associated with a global stack of bindings
• When entering a scope where the identifier is declared
  - push declaration on identifier stack
• When exiting a scope where the identifier is declared
  - pop identifier stack
• Evaluating the identifier in any context binds to the current top of stack
• Determined at runtime
47
Example
• What value is returned from main?
• Static scoping?
• Dynamic scoping?

  int x = 42;

  int f() { return x; }
  int g() { int x = 1; return f(); }
  int main() { return g(); }
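The two answers to the example can be checked with a short C sketch. C itself is statically scoped, so the dynamic variant is simulated with the per-identifier binding stack described on the previous slide; the names `x_bindings`, `enter_scope_declaring_x`, `exit_scope_declaring_x` and the `_static`/`_dynamic` suffixes are hypothetical and exist only for this illustration.

```c
#include <assert.h>

#define MAX_BINDINGS 16

/* Static scoping: plain C. f always sees the global x. */
static int x = 42;
static int f_static(void) { return x; }
static int g_static(void) { int x = 1; (void)x; return f_static(); }

/* Dynamic scoping, simulated: identifier x owns a stack of bindings. */
static int x_bindings[MAX_BINDINGS] = { 42 };  /* global binding at the bottom */
static int x_top = 0;                          /* index of the current binding */

static void enter_scope_declaring_x(int value) {
    assert(x_top + 1 < MAX_BINDINGS);
    x_bindings[++x_top] = value;               /* push declaration */
}

static void exit_scope_declaring_x(void) {
    assert(x_top > 0);
    x_top--;                                   /* pop declaration */
}

/* Evaluating x binds to whatever is on top of the stack right now. */
static int f_dynamic(void) { return x_bindings[x_top]; }

static int g_dynamic(void) {
    enter_scope_declaring_x(1);                /* int x = 1; */
    int result = f_dynamic();
    exit_scope_declaring_x();
    return result;
}
```

Under static scoping `g_static()` returns 42, because the x inside f is bound to the global at compile time. Under the simulated dynamic scoping `g_dynamic()` returns 1, because g's declaration is on top of the binding stack when f runs.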
48
Why do we care?
• We need to generate code to access variables
• Static scoping
  - Identifier binding is known at compile time
  - Address of the variable is known at compile time
  - Assigning addresses to variables is part of code generation
  - No runtime errors of "access to undefined variable"
  - Can check types of variables
49
Variable addresses for static scoping: first attempt
  int x = 42;

  int f() { return x; }
  int g() { int x = 1; return f(); }
  int main() { return g(); }

  identifier     address
  x (global)     0x42
  x (inside g)   0x73
50
Variable addresses for static scoping: first attempt
  int a[11];

  void quicksort(int m, int n) {
    int i;
    if (n > m) {
      i = partition(m, n);
      quicksort(m, i-1);
      quicksort(i+1, n);
    }
  }

  main() {
    …
    quicksort(1, 9);
  }

What is the address of the variable "i" in the procedure quicksort?
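The question has no single answer, which a small C experiment makes visible: every live activation of a recursive procedure owns a separate copy of its local. The function `record`, the array `seen` and the bound `MAX_DEPTH` are hypothetical names for this sketch, and the local `i` merely stands in for quicksort's `i`.

```c
#include <assert.h>

#define MAX_DEPTH 8

static const int *seen[MAX_DEPTH];  /* address of i in each live activation */

/* Recurses `depth` levels deep. At the deepest level, while all activations
   are still alive, counts how many distinct addresses of i were recorded.
   Returns -1 if some activation's i no longer holds its own value. */
static int record(int level, int depth) {
    assert(depth >= 1 && depth <= MAX_DEPTH);
    int i = level;                   /* this activation's private i */
    seen[level] = &i;
    if (level + 1 < depth)
        return record(level + 1, depth);

    int distinct = 0;
    for (int a = 0; a < depth; a++) {
        int unique = 1;
        for (int b = 0; b < a; b++)
            if (seen[a] == seen[b]) unique = 0;
        distinct += unique;
        if (*seen[a] != a) return -1;
    }
    return distinct;
}
```

`record(0, 4)` reports four distinct addresses, one per activation, so a single fixed address for `i` cannot work once recursion is allowed.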
51
Compile-Time Information on Variables
• Name
• Type
• Scope
  - when is it recognized
• Duration
  - Until when does its value exist
• Size
  - How many bytes are required at runtime
• Address
  - Fixed
  - Relative
  - Dynamic
52
Activation Record (Stack Frames)
• separate space for each procedure invocation
• managed at runtime
  - code for managing it generated by the compiler
• desired properties
  - efficient allocation and deallocation
    • procedures are called frequently
  - variable size
    • different procedures may require different memory sizes
53
Semi-Abstract Register Machine
[Figure: CPU with general-purpose (data) registers Register 00, Register 01, …, Register xx, control registers such as Register PC, and stack registers ebp, esp, …; Main Memory divided into Global Variables, Stack and Heap, drawn between high and low addresses]
54
A Logical Stack Frame (Simplified)
Stack frame for function f(a1, …, aN):
  Parameters (actual arguments): Param N, Param N-1, …, Param 1
  Locals and temporaries: _t0, …, _tk, x, …, y
55
Runtime Stack
• Stack of activation records
• Call = push new activation record
• Return = pop activation record
• Only one "active" activation record: top of stack
• How do we handle recursion?
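Recursion falls out of the push/pop discipline, as the following C sketch suggests. The `Record` struct, the helpers `call` and `ret`, and the counter `max_depth` are assumptions made for illustration; a real activation record holds much more than one parameter and one result slot.

```c
#include <assert.h>

#define MAX_FRAMES 32

/* Hypothetical activation record: one parameter and a return-value slot. */
typedef struct { int n; int result; } Record;

static Record frames[MAX_FRAMES];
static int top = -1;                 /* index of the active record */
static int max_depth = 0;            /* deepest stack seen so far  */

static void call(int n) {            /* Call = push new activation record */
    assert(top + 1 < MAX_FRAMES);
    frames[++top].n = n;
    if (top + 1 > max_depth) max_depth = top + 1;
}

static int ret(void) {               /* Return = pop activation record */
    return frames[top--].result;
}

static int fact(int n) {
    call(n);
    Record *me = &frames[top];       /* only the top record is active */
    me->result = (me->n <= 1) ? 1 : me->n * fact(me->n - 1);
    return ret();
}
```

`fact(5)` returns 120 while the stack reaches a depth of five records, one per pending invocation, and is empty again afterwards.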
56
Activation Record (frame)
Frame contents, from high addresses to low addresses (stack grows down):
  parameter k
  …
  parameter 1
  lexical pointer
  return information
  dynamic link
  registers & misc
  local variables, temporaries
  (next frame would be here)
Figure labels: incoming parameters, administrative part, frame pointer, stack pointer
57
Runtime Stack
• SP: stack pointer
  - top of current frame
• FP: frame pointer
  - base of current frame
  - Sometimes called BP (base pointer)
[Figure: previous frame above the current frame; FP marks the base and SP the top of the current frame; stack grows down]
58
Code Blocks
• Programming languages provide code blocks
  void foo()
  {
    int x = 8, y = 9;      // 1
    { int x = y * y; }     // 2
    { int x = y * 7; }     // 3
    x = y + 1;
  }
Frame layout: administrative, x1, y1, x2, x3, …
59
L-Values of Local Variables
• The offset in the stack is known at compile time
• L-val(x) = FP + offset(x)
• x = 5 translates to
    Load_Constant 5, R3
    Store R3, offset(x)(FP)
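The address computation L-val(x) = FP + offset(x) can be sketched in C over a byte array that stands in for the stack. The `Local` struct, the offsets -4 and -8, the frame base at byte 32 and the helper names are assumptions chosen for this illustration only.

```c
#include <stdint.h>
#include <string.h>

#define FRAME_BYTES 64

/* Hypothetical compile-time symbol entry: a name and an offset from FP. */
typedef struct { const char *name; int offset; } Local;

static unsigned char memory[FRAME_BYTES];   /* stand-in for stack memory */

/* L-val(x) = FP + offset(x) */
static unsigned char *lval(unsigned char *fp, const Local *x) {
    return fp + x->offset;
}

/* Store R3, offset(x)(FP) */
static void store_int(unsigned char *fp, const Local *x, int32_t value) {
    memcpy(lval(fp, x), &value, sizeof value);
}

static int32_t load_int(unsigned char *fp, const Local *x) {
    int32_t value;
    memcpy(&value, lval(fp, x), sizeof value);
    return value;
}

static int demo(void) {
    unsigned char *fp = memory + 32;  /* assumed frame base               */
    Local x = { "x", -4 };            /* offsets are fixed at compile time */
    Local y = { "y", -8 };
    store_int(fp, &x, 5);             /* x = 5 */
    store_int(fp, &y, 7);             /* y = 7 */
    return load_int(fp, &x) * 10 + load_int(fp, &y);
}
```

`demo()` returns 57: both variables are reached purely through the frame pointer plus a constant, so the generated code does not need to know where the frame itself ended up.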
60
Pentium Runtime Stack

Pentium stack registers
  Register   Usage
  ESP        Stack pointer
  EBP        Base pointer

Pentium stack and call/ret instructions
  Instruction       Usage
  push, pusha, …    push on runtime stack
  pop, popa, …      pop from runtime stack
  call              transfer control to called routine
  ret               transfer control back to caller
61
Accessing Stack Variables
• Use offset from FP (%ebp)
• Remember: stack grows downwards
• Above FP = parameters
• Below FP = locals
• Examples
  - %ebp + 4 = return address
  - %ebp + 8 = first parameter
  - %ebp - 4 = first local
[Figure: stack from high to low addresses: Param n, …, param1 (at FP+8), Return address, Previous fp (at FP), Local 1 (at FP-4), …, Local n (at SP)]
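The three example offsets can be checked against a simulated 32-bit stack. This is a sketch under assumptions: `mem`, `esp`, `ebp`, `push32` and `load` model the machine state, and the values 42, 0xBEEF, 0x1000 and 7 are made up for the demonstration.

```c
#include <stdint.h>

#define WORDS 32

static uint32_t mem[WORDS];        /* simulated stack memory, 4-byte words */
static uint32_t esp = WORDS * 4;   /* byte address; stack grows downwards  */
static uint32_t ebp = 0;

static void push32(uint32_t v) { esp -= 4; mem[esp / 4] = v; }
static uint32_t load(uint32_t addr) { return mem[addr / 4]; }

/* Builds the frame of a call f(42) whose callee has one local, then reads
   everything back through ebp-relative addresses. Returns 1 on success. */
static int frame_demo(void) {
    uint32_t caller_ebp = 0x1000;  /* made-up frame pointer of the caller */
    ebp = caller_ebp;

    push32(42);                    /* caller: push the parameter          */
    push32(0xBEEF);                /* call: push a made-up return address */
    push32(ebp);                   /* callee: push %ebp                   */
    ebp = esp;                     /*         mov %esp, %ebp              */
    push32(7);                     /* first local lives at ebp - 4        */

    return load(ebp + 8) == 42          /* first parameter */
        && load(ebp + 4) == 0xBEEF      /* return address  */
        && load(ebp)     == caller_ebp  /* previous fp     */
        && load(ebp - 4) == 7;          /* first local     */
}
```

The offsets +8 and +4 follow from the two words pushed between the parameter and the new frame base: the return address and the saved frame pointer.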
62
Factorial - fact(int n)
  fact:
      pushl %ebp              # save ebp
      movl  %esp,%ebp         # ebp = esp
      pushl %ebx              # save ebx
      movl  8(%ebp),%ebx      # ebx = n
      cmpl  $1,%ebx           # n <= 1 ?
      jle   .lresult          # then done
      leal  -1(%ebx),%eax     # eax = n-1
      pushl %eax              #
      call  fact              # fact(n-1)
      imull %ebx,%eax         # eax = retv*n
      jmp   .lreturn          #
  .lresult:
      movl  $1,%eax           # retv
  .lreturn:
      movl  -4(%ebp),%ebx     # restore ebx
      movl  %ebp,%esp         # restore esp
      popl  %ebp              # restore ebp
      ret                     # return to caller
[Figure: stack at an intermediate point, from high to low addresses: n (at EBP+8), return address, old ebp (at EBP), old ebx (at EBP-4), with ESP at the top of the stack]
(disclaimer: a real compiler can do better than that)
63
Call Sequences
• The processor does not save the content of registers on procedure calls
• So who will?
  - Caller saves and restores registers
  - Callee saves and restores registers
  - But can also have both save/restore some registers
64
Call Sequences
  Caller push code (caller):
    Push caller-save registers
    Push actual parameters (in reverse order)
  call:
    push return address
    Jump to call address
  Callee push code (prologue):
    Push current base-pointer
    bp = sp
    Push local variables
    Push callee-save registers
  Callee pop code (epilogue):
    Pop callee-save registers
    Pop callee activation record
    Pop old base-pointer
  return:
    pop return address
    Jump to address
  Caller pop code (caller):
    Pop parameters
    Pop caller-save registers
65
Call Sequences - Foo(42,21)
  caller:
    push %ecx          Push caller-save registers
    push $21           Push actual parameters (in reverse order)
    push $42
  call:
    call _foo          push return address, jump to call address
  callee (prologue):
    push %ebp          Push current base-pointer
    mov %esp, %ebp     bp = sp
    sub $8, %esp       Push local variables
    push %ebx          Push callee-save registers
  callee (epilogue):
    pop %ebx           Pop callee-save registers
    mov %ebp, %esp     Pop callee activation record
    pop %ebp           Pop old base-pointer
  return:
    ret                pop return address, jump to address
  caller:
    add $8, %esp       Pop parameters
    pop %ecx           Pop caller-save registers
66
"To Callee-save or to Caller-save?"
• Callee-saved registers need only be saved when the callee modifies their value
• Some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registers
• Callee-Save Registers
  - Saved by the callee before modification
  - Values are automatically preserved across calls
• Caller-Save Registers
  - Saved (if needed) by the caller before calls
  - Values are not automatically preserved across calls
• Usually the architecture defines caller-save and callee-save registers
    • Separate compilation
    • Interoperability between code produced by different compilers/languages
• But compiler writers decide when to use caller/callee registers
68
Callee-Save Registers
• Saved by the callee before modification
• Usually at procedure prolog
• Restored at procedure epilog
• Hardware support may be available
• Values are automatically preserved across calls

Source:
  int foo(int a) {
    int b = a + 1;
    f1();
    g1(b);
    return (b + 2);
  }

Generated code:
  .global _foo
  Add_Constant -K, SP        // allocate space for foo
  Store_Local R5, -14(FP)    // save R5
  Load_Reg R5, R0; Add_Constant R5, 1
  JSR f1; JSR g1;
  Add_Constant R5, 2; Load_Reg R5, R0
  Load_Local -14(FP), R5     // restore R5
  Add_Constant K, SP; RTS    // deallocate
69
Caller-Save Registers
• Saved by the caller before calls when needed
• Values are not automatically preserved across calls

Source:
  void bar(int y) {
    int x = y + 1;
    f2(x);
    g2(2);
    g2(8);
  }

Generated code:
  .global _bar
  Add_Constant -K, SP        // allocate space for bar
  Add_Constant R0, 1
  JSR f2
  Load_Constant 2, R0; JSR g2;
  Load_Constant 8, R0; JSR g2
  Add_Constant K, SP         // deallocate space for bar
  RTS
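The division of labour between the two conventions can be sketched in C with a simulated register file. Treating `R0` as caller-save and `R5` as callee-save mirrors the two listings, but the functions `callee` and `call_with_saves` and the scratch values 111 and 222 are hypothetical.

```c
/* Simulated registers. Assumed convention for this sketch:
   R0 is caller-save, R5 is callee-save. */
static int R0, R5;

/* A callee that uses both registers as scratch space. It must restore R5
   (callee-save) but is free to clobber R0 (caller-save). */
static void callee(void) {
    int saved_r5 = R5;    /* prologue: save the callee-save register */
    R5 = 111;
    R0 = 222;             /* clobbered: the caller cannot rely on R0 */
    R5 = saved_r5;        /* epilogue: restore it                    */
}

/* A caller that keeps live values in both registers across the call. */
static int call_with_saves(void) {
    R0 = 1;
    R5 = 2;
    int saved_r0 = R0;    /* caller saves R0 because it is live across the call */
    callee();
    R0 = saved_r0;        /* caller restores R0 */
    return R0 * 10 + R5;  /* both values survived */
}
```

`call_with_saves()` returns 12. If R0 were not live after the call, as in `bar` above, the caller could skip its save and restore entirely, which is the saving that caller-save registers offer.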
70
Parameter Passing
• 1960s
  - In memory
    • No recursion is allowed
• 1970s
  - In stack
• 1980s
  - In registers
  - First k parameters are passed in registers (k=4 or k=6)
  - Where is time saved?
    • Most procedures are leaf procedures
    • Interprocedural register allocation
    • Many of the registers may be dead before another invocation
    • Register windows are allocated in some architectures per call (e.g., Sun SPARC)
71
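Windows Exploit(s): Buffer Overflow
  void foo (char *x) {
    char buf[2];
    strcpy(buf, x);
  }
  int main (int argc, char *argv[]) {
    foo(argv[1]);
  }

  ./a.out abracadabra
  Segmentation fault
[Figure: stack layout along increasing memory addresses: …, buf[2], char* x, Saved FP, Return address, Previous frame; the stack grows the opposite way, and the input "abracadabr…" written into buf overruns the entries above it]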
72
Buffer overflow
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  int check_authentication(char *password) {
    int auth_flag = 0;
    char password_buffer[16];

    strcpy(password_buffer, password);   /* no bounds check: the overflow */
    if(strcmp(password_buffer, "brillig") == 0)
      auth_flag = 1;
    if(strcmp(password_buffer, "outgrabe") == 0)
      auth_flag = 1;
    return auth_flag;
  }

  int main(int argc, char *argv[]) {
    if(argc < 2) {
      printf("Usage: %s <password>\n", argv[0]);
      exit(0);
    }
    if(check_authentication(argv[1])) {
      printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
      printf("      Access Granted.\n");
      printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    } else {
      printf("\nAccess Denied.\n");
    }
  }
(source: "Hacking: The Art of Exploitation, 2nd Ed.")
74
Buffer overflow
Same program with the two local declarations swapped (main() is unchanged from slide 72):
  int check_authentication(char *password) {
    char password_buffer[16];
    int auth_flag = 0;

    strcpy(password_buffer, password);
    if(strcmp(password_buffer, "brillig") == 0)
      auth_flag = 1;
    if(strcmp(password_buffer, "outgrabe") == 0)
      auth_flag = 1;
    return auth_flag;
  }
(source: "Hacking: The Art of Exploitation, 2nd Ed.")
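For contrast, here is one way the copy could be bounded. This variant is not from the slides or the cited book; the function name `check_authentication_bounded` and the policy of rejecting over-long input outright are assumptions of this sketch.

```c
#include <string.h>

/* Variant of check_authentication that never writes past the buffer:
   over-long inputs are rejected before any copy takes place. */
static int check_authentication_bounded(const char *password) {
    char password_buffer[16];

    if (strlen(password) >= sizeof password_buffer)
        return 0;                        /* too long to match either password */
    strcpy(password_buffer, password);   /* known to fit, including the '\0'  */

    return strcmp(password_buffer, "brillig") == 0
        || strcmp(password_buffer, "outgrabe") == 0;
}
```

With the length check in place, an input longer than 15 characters can no longer reach `auth_flag`, the saved frame pointer or the return address, whichever order the locals are declared in.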
75
Buffer overflow
  0x08048529 <+69>:   movl $0x8048647,(%esp)
  0x08048530 <+76>:   call 0x8048394 <puts@plt>
  0x08048535 <+81>:   movl $0x8048664,(%esp)
  0x0804853c <+88>:   call 0x8048394 <puts@plt>
  0x08048541 <+93>:   movl $0x804867a,(%esp)
  0x08048548 <+100>:  call 0x8048394 <puts@plt>
  0x0804854d <+105>:  jmp 0x804855b <main+119>
  0x0804854f <+107>:  movl $0x8048696,(%esp)
  0x08048556 <+114>:  call 0x8048394 <puts@plt>
76
Nested Procedures
• For example: Pascal
• Any routine can have sub-routines
• Any sub-routine can access anything that is defined in its containing scope or inside the sub-routine itself
  - "non-local" variables
36
Heap
Abstract Register Machine(High Level View)
Global Variables
Stack
Lowaddresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
Code
Highaddresses
37
Abstract Activation Record Stack
Stack frame for procedure
Prock+1(a1hellipaN)
Prock
Prock+2
hellip
hellip
Prock+1
38
Abstract Stack Frame
Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Prock
Prock+2
hellip
hellip
Stack frame for procedure
Prock+1(a1hellipaN)
39
Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to
stack and jumps to the function labelA statement x=f(a1hellipan) looks like
Push a1 hellip Push anCall fPop x copy returned value
bull Returning a value is done by pushing it to the stack (return x)
Push xbull Return control to caller (and roll up stack)
Return
40
Heap
Abstract Register Machine
Global Variables
Stack
Lowaddresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
Code
Highaddresses
41
Semi-Abstract Register Machine
Global Variables
Stack
Heap
High addresses
Low addresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
ebp
esphellipre
gist
ers
Stac
k
42
Intro Functions Exampleint SimpleFn(int z)
int x yx = x y zreturn x
void main() int ww = SimpleFunction(137)
_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn
main_t0 = 137Push _t0Call _SimpleFnPop w
43
What Can We Do with Procedures
bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as
parameters
44
Design Decisions
bull Scoping rulesndash Static scoping vs dynamic scoping
bull Callercallee conventionsndash Parametersndash Who saves register values
bull Allocating space for local variables
45
Static (lexical) Scoping
    main ( )
    {                                  /* block B0 */
      int a = 0 ;
      int b = 0 ;
      {                                /* block B1 */
        int b = 1 ;
        {                              /* block B2 */
          int a = 2 ;
          printf ("%d %d\n", a, b) ;
        }
        {                              /* block B3 */
          int b = 3 ;
          printf ("%d %d\n", a, b) ;
        }
        printf ("%d %d\n", a, b) ;
      }
      printf ("%d %d\n", a, b) ;
    }

    Declaration    Scopes
    a=0            B0, B1, B3
    b=0            B0
    b=1            B1, B2
    a=2            B2
    b=3            B3

A name refers to its (closest) enclosing scope, which is known at compile time.
46
Dynamic Scoping
• Each identifier is associated with a global stack of bindings
• When entering a scope where the identifier is declared
  – push the declaration on the identifier's stack
• When exiting a scope where the identifier is declared
  – pop the identifier's stack
• Evaluating the identifier in any context binds to the current top of its stack
• Determined at runtime
47
Example
• What value is returned from main?
• Static scoping?
• Dynamic scoping?

    int x = 42;
    int f() { return x; }
    int g() { int x = 1; return f(); }
    int main() { return g(); }
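One way to check the answers is to run both disciplines side by side. C itself is statically scoped, so the sketch below simulates dynamic scoping with an explicit stack of bindings for x, following the push-on-entry / pop-on-exit rule. All helper names (x_bindings, enter_scope_declaring_x, main_static, main_dynamic, …) are made up for this illustration.

```c
#include <assert.h>

/* Static scoping: plain C. The x in f_static is bound to the global
   at compile time, whatever g_static declares locally. */
static int x = 42;
static int f_static(void) { return x; }
static int g_static(void) { int x = 1; (void)x; return f_static(); }

/* Dynamic scoping, simulated: one stack of bindings for identifier x. */
enum { MAX_BINDINGS = 16 };
static int x_bindings[MAX_BINDINGS] = { 42 };  /* global binding at the bottom */
static int x_top = 0;                          /* index of the current binding */

static void enter_scope_declaring_x(int value) {
    assert(x_top + 1 < MAX_BINDINGS);
    x_bindings[++x_top] = value;               /* push declaration */
}

static void exit_scope_declaring_x(void) {
    assert(x_top > 0);
    --x_top;                                   /* pop declaration */
}

/* f reads whatever binding of x is on top when it runs. */
static int f_dynamic(void) { return x_bindings[x_top]; }

static int g_dynamic(void) {
    enter_scope_declaring_x(1);                /* int x = 1; */
    int result = f_dynamic();
    exit_scope_declaring_x();
    return result;
}

int main_static(void)  { return g_static(); }   /* 42 */
int main_dynamic(void) { return g_dynamic(); }  /* 1  */
```

Under static scoping f always sees the global x, so main returns 42; under dynamic scoping f sees the binding pushed by g, so main returns 1.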
48
Why do we care?
• We need to generate code to access variables
• Static scoping
  – Identifier binding is known at compile time
  – “Address” of the variable is known at compile time
  – Assigning addresses to variables is part of code generation
  – No runtime errors of “access to undefined variable”
  – Can check types of variables
49
Variable addresses for static scoping: first attempt
    int x = 42;
    int f() { return x; }
    int g() { int x = 1; return f(); }
    int main() { return g(); }

    identifier      address
    x (global)      0x42
    x (inside g)    0x73
50
Variable addresses for static scoping: first attempt
    int a[11];
    void quicksort(int m, int n) {
      int i;
      if (n > m) {
        i = partition(m, n);
        quicksort(m, i-1);
        quicksort(i+1, n);
      }
    }
    main() {
      quicksort(1, 9);
    }

What is the address of the variable “i” in the procedure quicksort?
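The question has no single answer, because every live invocation of a recursive procedure needs its own copy of i. The sketch below makes that observable; it uses a simple recursive function in place of quicksort, and the names recurse, address_of_i, and addresses_are_distinct are hypothetical. The addresses themselves vary between runs and platforms, so only their distinctness is checked.

```c
#include <assert.h>
#include <stdint.h>

enum { MAX_DEPTH = 8 };
static uintptr_t address_of_i[MAX_DEPTH];   /* one recorded address per level */

/* Recurse down to `depth` levels; each invocation has its own local i. */
static int recurse(int level, int depth) {
    volatile int i = level;                 /* volatile: keep i in memory */
    address_of_i[level] = (uintptr_t)&i;
    int deeper = (level + 1 < depth) ? recurse(level + 1, depth) : 0;
    return deeper + i;                      /* i is read after the call,
                                               so it stays alive across it */
}

/* Returns 1 if every live invocation saw a different address for i. */
int addresses_are_distinct(int depth) {
    assert(depth >= 1 && depth <= MAX_DEPTH);
    (void)recurse(0, depth);
    for (int a = 0; a < depth; a++) {
        for (int b = a + 1; b < depth; b++) {
            if (address_of_i[a] == address_of_i[b]) {
                return 0;
            }
        }
    }
    return 1;
}
```

A fixed compile-time address for i therefore cannot work; what can be fixed at compile time is the position of i relative to the start of its invocation's own storage.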
51
Compile-Time Information on Variables
• Name
• Type
• Scope
  – When is it recognized?
• Duration
  – Until when does its value exist?
• Size
  – How many bytes are required at runtime?
• Address
  – Fixed
  – Relative
  – Dynamic
52
Activation Record (Stack Frames)
• Separate space for each procedure invocation
• Managed at runtime
  – Code for managing it is generated by the compiler
• Desired properties
  – Efficient allocation and deallocation
    • Procedures are called frequently
  – Variable size
    • Different procedures may require different memory sizes
53
Semi-Abstract Register Machine
[Figure: the register machine with main memory holding Global Variables, the Stack, and the Heap, drawn between high and low addresses. Besides the general-purpose (data) registers Register 00, Register 01, …, Register xx and the control registers (Register PC, …), the CPU has stack registers ebp, esp, ….]
54
A Logical Stack Frame (Simplified)
[Figure: the stack frame for function f(a1, …, aN). It holds the parameters (actual arguments) Param N, Param N-1, …, Param 1, followed by the locals and temporaries _t0, …, _tk, x, …, y.]
55
Runtime Stack
• Stack of activation records
• Call = push new activation record
• Return = pop activation record
• Only one “active” activation record – top of stack
• How do we handle recursion?
56
Activation Record (frame)
[Figure: layout of one activation record. The stack grows down, so entries are listed from high addresses to low addresses.]
    high addresses
      parameter k
      …
      parameter 1
      lexical pointer
      return information
      dynamic link
      registers & misc
      local variables
      temporaries
      (next frame would be here)
    low addresses
Labels in the figure: incoming parameters, administrative part, frame pointer, stack pointer.
57
Runtime Stack
• SP – stack pointer
  – Top of current frame
• FP – frame pointer
  – Base of current frame
  – Sometimes called BP (base pointer)
[Figure: the previous frame followed by the current frame; FP marks the base of the current frame and SP its top. The stack grows down.]
58
Code Blocks
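• Programming languages provide code blocks
    void foo() {
      int x = 8, y = 9;       // 1
      { int x = y * y; }      // 2
      { int x = y * 7; }      // 3
      x = y + 1;
    }
[Figure: the frame of foo: the administrative part, then x1, y1, x2, x3, …, where the index names the block (1, 2, or 3) that declares the variable.]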
59
L-Values of Local Variables
• The offset in the stack is known at compile time
• L-val(x) = FP + offset(x)
• x = 5 is translated to
      Load_Constant 5, R3
      Store R3, offset(x)(FP)
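The rule L-val(x) = FP + offset(x) can be exercised on a simulated memory. This is a sketch under assumptions: memory is a small byte array, the offset chosen for x is arbitrary, and the helper names (l_value, store_local, load_local, demo) are not from the slides.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { MEMORY_SIZE = 256 };
static uint8_t memory[MEMORY_SIZE];      /* simulated main memory */

/* Offset fixed "at compile time" (a hypothetical value for local x). */
enum { OFFSET_X = -4 };

/* L-val(v) = FP + offset(v) */
static int l_value(int fp, int offset) {
    int address = fp + offset;
    assert(address >= 0 && address + (int)sizeof(int32_t) <= MEMORY_SIZE);
    return address;
}

/* Store R, offset(FP) */
void store_local(int fp, int offset, int32_t value) {
    memcpy(&memory[l_value(fp, offset)], &value, sizeof value);
}

/* Load offset(FP), R */
int32_t load_local(int fp, int offset) {
    int32_t value;
    memcpy(&value, &memory[l_value(fp, offset)], sizeof value);
    return value;
}

/* The same offset used with two frame pointers touches two cells,
   one per activation. */
int32_t demo(void) {
    store_local(128, OFFSET_X, 5);       /* x = 5 in the frame based at 128 */
    store_local(64,  OFFSET_X, 7);       /* x = 7 in the frame based at 64  */
    return load_local(128, OFFSET_X);    /* still 5 */
}
```

The generated code never changes between invocations; only the value in FP does, which is what lets one instruction sequence serve every activation of a recursive procedure.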
60
Pentium Runtime Stack
Pentium stack registers
    Register    Usage
    ESP         Stack pointer
    EBP         Base pointer

Pentium stack and call/ret instructions
    Instruction       Usage
    push, pusha, …    push on runtime stack
    pop, popa, …      pop from runtime stack
    call              transfer control to called routine
    ret               transfer control back to caller
61
Accessing Stack Variables
• Use offset from FP (%ebp)
• Remember – stack grows downwards
• Above FP = parameters
• Below FP = locals
• Examples
  – %ebp + 4 = return address
  – %ebp + 8 = first parameter
  – %ebp – 4 = first local
[Figure: two consecutive frames with the same layout. From high to low addresses, each frame holds Param n, …, param 1 (at FP+8), the return address, the previous FP (where FP points), and Local 1 (at FP-4), …, Local n. SP points at the lowest cell of the current frame.]
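The three example offsets can be checked by building one frame by hand in a simulated 32-bit stack. This is only a sketch: the word-array memory, the register variables esp and ebp, and the helpers push32, build_frame, and the accessor functions are assumptions for illustration, and the procedure is taken to have one parameter and one local.

```c
#include <assert.h>
#include <stdint.h>

enum { WORDS = 64, WORD_BYTES = 4 };
static uint32_t memory_words[WORDS];        /* simulated stack memory   */
static uint32_t esp = WORDS * WORD_BYTES;   /* byte address; empty stack */
static uint32_t ebp = 0;

/* Map a word-aligned byte address to its cell. */
static uint32_t *cell(uint32_t byte_address) {
    assert(byte_address % WORD_BYTES == 0);
    assert(byte_address < WORDS * WORD_BYTES);
    return &memory_words[byte_address / WORD_BYTES];
}

static void push32(uint32_t value) {
    assert(esp >= WORD_BYTES);
    esp -= WORD_BYTES;                      /* stack grows downwards */
    *cell(esp) = value;
}

/* Build the frame of a call to a one-parameter, one-local procedure. */
void build_frame(uint32_t param1, uint32_t return_address, uint32_t local1) {
    push32(param1);             /* caller: push the argument        */
    push32(return_address);     /* call:   push the return address  */
    push32(ebp);                /* callee: save the previous FP     */
    ebp = esp;                  /*         FP marks the new frame   */
    push32(local1);             /*         first local, below FP    */
}

uint32_t saved_fp(void)            { return *cell(ebp); }
uint32_t return_address_slot(void) { return *cell(ebp + 4); }
uint32_t first_parameter(void)     { return *cell(ebp + 8); }
uint32_t first_local(void)         { return *cell(ebp - 4); }
```

For example, after build_frame(137, 0x1000, 5), first_parameter() yields 137, return_address_slot() yields 0x1000, and first_local() yields 5.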
62
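Factorial – fact(int n)
    fact:
        pushl %ebp              # save ebp
        movl  %esp,%ebp         # ebp=esp
        pushl %ebx              # save ebx
        movl  8(%ebp),%ebx      # ebx = n
        cmpl  $1,%ebx           # n <= 1 ?
        jle   .lresult          # then done
        leal  -1(%ebx),%eax     # eax = n-1
        pushl %eax
        call  fact              # fact(n-1)
        imull %ebx,%eax         # eax=retv*n
        jmp   .lreturn
    .lresult:
        movl  $1,%eax           # retv
    .lreturn:
        movl  -4(%ebp),%ebx     # restore ebx
        movl  %ebp,%esp         # restore esp
        popl  %ebp              # restore ebp
        ret                     # return to caller
[Figure (stack at an intermediate point): from high to low addresses, the current frame holds n (at EBP+8), the return address, old ebp (where EBP points), and old ebx (at EBP-4, where ESP points). The caller's frame above it ends with its return address, previous fp, and old ebx.]
(Disclaimer: a real compiler can do better than that.)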
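For comparison, here is a C function with the behavior of the listing above. It is a reading of the assembly (compare n with 1, return 1 in that case, otherwise multiply the recursive result by n), not source code taken from the slides.

```c
/* fact(n): 1 when n <= 1, otherwise fact(n - 1) * n. */
int fact(int n) {
    if (n <= 1) {
        return 1;               /* the lresult branch: retv = 1 */
    }
    return fact(n - 1) * n;     /* eax = retv * n               */
}
```

The value of n has to survive the recursive call, which is why the assembly keeps it in ebx and saves and restores the old ebx around the body.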
63
Call Sequences
• The processor does not save the content of registers on procedure calls
• So who will?
  – Caller saves and restores registers
  – Callee saves and restores registers
  – But can also have both save/restore some registers
64
Call Sequences
• Caller push code (caller, before the call)
  – Push caller-save registers
  – Push actual parameters (in reverse order)
• call
  – Push return address
  – Jump to call address
• Callee push code (prologue)
  – Push current base-pointer
  – bp = sp
  – Push local variables
  – Push callee-save registers
• Callee pop code (epilogue)
  – Pop callee-save registers
  – Pop callee activation record
  – Pop old base-pointer
• return
  – Pop return address
  – Jump to address
• Caller pop code (caller, after the return)
  – Pop parameters
  – Pop caller-save registers
65
Call Sequences – Foo(42,21)
• Caller push code (caller, before the call)
      push %ecx           # push caller-save registers
      push $21            # push actual parameters (in reverse order)
      push $42
• call
      call _foo           # push return address, jump to call address
• Callee push code (prologue)
      push %ebp           # push current base-pointer
      mov  %esp, %ebp     # bp = sp
      sub  $8, %esp       # push local variables
      push %ebx           # push callee-save registers
• Callee pop code (epilogue)
      pop  %ebx           # pop callee-save registers
      mov  %ebp, %esp     # pop callee activation record
      pop  %ebp           # pop old base-pointer
• return
      ret                 # pop return address, jump to address
• Caller pop code (caller, after the return)
      add  $8, %esp       # pop parameters
      pop  %ecx           # pop caller-save registers
66
“To Callee-save or to Caller-save?”
• Callee-saved registers need only be saved when the callee modifies their value
• Some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registers
• Callee-Save Registers
  – Saved by the callee before modification
  – Values are automatically preserved across calls
• Caller-Save Registers
  – Saved (if needed) by the caller before calls
  – Values are not automatically preserved across calls
• Usually the architecture defines caller-save and callee-save registers
  – Separate compilation
  – Interoperability between code produced by different compilers/languages
• But compiler writers decide when to use caller/callee registers
68
Callee-Save Registers
• Saved by the callee before modification
• Usually at procedure prolog
• Restored at procedure epilog
• Hardware support may be available
• Values are automatically preserved across calls

    int foo(int a) {
      int b = a + 1;
      f1();
      g1(b);
      return (b + 2);
    }

    .global _foo
      Add_Constant -K, SP        // allocate space for foo
      Store_Local R5, -14(FP)    // save R5
      Load_Reg R5, R0
      Add_Constant R5, 1
      JSR f1
      JSR g1
      Add_Constant R5, 2
      Load_Reg R5, R0
      Load_Local -14(FP), R5     // restore R5
      Add_Constant K, SP         // deallocate
      RTS
69
Caller-Save Registers
• Saved by the caller before calls when needed
• Values are not automatically preserved across calls

    void bar(int y) {
      int x = y + 1;
      f2(x);
      g2(2);
      g2(8);
    }

    .global _bar
      Add_Constant -K, SP        // allocate space for bar
      Add_Constant R0, 1
      JSR f2
      Load_Constant 2, R0
      JSR g2
      Load_Constant 8, R0
      JSR g2
      Add_Constant K, SP         // deallocate space for bar
      RTS
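The difference between the two conventions can be seen on a simulated register file. The sketch below is illustrative only: treating R0 as caller-save and R5 as callee-save mirrors the two examples but is an assumption, and the function names are hypothetical.

```c
#include <assert.h>

/* Simulated register file. R0 is treated as caller-save and R5 as
   callee-save; this split is an assumption made for the sketch. */
enum { R0 = 0, R5 = 5, REGISTER_COUNT = 8 };
static int reg[REGISTER_COUNT];

/* A callee that follows the callee-save convention for R5. */
static void callee(void) {
    int saved_r5 = reg[R5];     /* prologue: save R5 before modifying it */
    reg[R5] = 1000;             /* body: uses R5 as scratch ...          */
    reg[R0] = reg[R5] + 1;      /* ... and clobbers R0 freely            */
    reg[R5] = saved_r5;         /* epilogue: restore R5                  */
}

/* A live value kept in callee-save R5 survives with no caller effort. */
int keep_in_callee_save(int value) {
    reg[R5] = value;
    callee();
    return reg[R5];             /* preserved by the callee */
}

/* A live value kept in caller-save R0 must be spilled by the caller. */
int keep_in_caller_save(int value) {
    reg[R0] = value;
    int spilled = reg[R0];      /* caller saves R0 before the call   */
    callee();
    reg[R0] = spilled;          /* caller restores R0 after the call */
    return reg[R0];
}

/* What happens if the caller forgets to save a caller-save register. */
int forget_to_save(int value) {
    reg[R0] = value;
    callee();
    return reg[R0];             /* clobbered by the callee */
}
```

In bar above, nothing needs saving because the value in R0 is dead after each call; a caller pays for a caller-save register only when its value is still needed afterwards.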
70
Parameter Passing
• 1960s
  – In memory
    • No recursion is allowed
• 1970s
  – In stack
• 1980s
  – In registers
  – First k parameters are passed in registers (k=4 or k=6)
  – Where is time saved?
    • Most procedures are leaf procedures
    • Interprocedural register allocation
    • Many of the registers may be dead before another invocation
    • Register windows are allocated in some architectures per call (e.g., Sun SPARC)
71
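Windows Exploit(s): Buffer Overflow
    void foo (char *x) {
      char buf[2];
      strcpy(buf, x);
    }
    int main (int argc, char *argv[]) {
      foo(argv[1]);
    }

    ./a.out abracadabra
    Segmentation fault
[Figure: the stack around foo's frame, listing the previous frame, the return address, the saved FP, char* x, and buf[2]; arrows mark the direction in which the stack grows and the direction of memory addresses. The copied string “abracadabra” does not fit in buf[2] and overwrites neighboring cells of the frame.]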
72
Buffer overflow
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int check_authentication(char *password) {
      int auth_flag = 0;
      char password_buffer[16];

      strcpy(password_buffer, password);
      if(strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
      if(strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;
      return auth_flag;
    }
    int main(int argc, char *argv[]) {
      if(argc < 2) {
        printf("Usage: %s <password>\n", argv[0]);
        exit(0);
      }
      if(check_authentication(argv[1])) {
        printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        printf(" Access Granted.\n");
        printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
      } else {
        printf("\nAccess Denied.\n");
      }
    }
(source: “Hacking – The Art of Exploitation, 2nd Ed.”)
74
Buffer overflow
    int check_authentication(char *password) {
      char password_buffer[16];
      int auth_flag = 0;

      strcpy(password_buffer, password);
      if(strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
      if(strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;
      return auth_flag;
    }
    int main(int argc, char *argv[]) {
      if(argc < 2) {
        printf("Usage: %s <password>\n", argv[0]);
        exit(0);
      }
      if(check_authentication(argv[1])) {
        printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        printf(" Access Granted.\n");
        printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
      } else {
        printf("\nAccess Denied.\n");
      }
    }
(source: “Hacking – The Art of Exploitation, 2nd Ed.”)
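Reordering the declarations only changes what an overflow lands on; the unchecked strcpy is the actual problem. A possible repair, sketched here under the assumption that over-long passwords may simply be rejected, checks the length before copying. The function name check_authentication_bounded is made up for this example and does not come from the cited book.

```c
#include <string.h>

/* Variant of check_authentication that rejects over-long input
   instead of copying it blindly into the 16-byte buffer. */
int check_authentication_bounded(const char *password) {
    char password_buffer[16];
    int auth_flag = 0;

    /* Refuse anything that would not fit, including the terminating NUL. */
    if (strlen(password) >= sizeof password_buffer) {
        return 0;
    }
    strcpy(password_buffer, password);   /* safe: length checked above */

    if (strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
    if (strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;
    return auth_flag;
}
```

With the check in place, a long argument can no longer reach auth_flag, the saved frame pointer, or the return address, whatever order the compiler lays out the locals in.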
75
Buffer overflow
    0x08048529 <+69>:   movl $0x8048647,(%esp)
    0x08048530 <+76>:   call 0x8048394 <puts@plt>
    0x08048535 <+81>:   movl $0x8048664,(%esp)
    0x0804853c <+88>:   call 0x8048394 <puts@plt>
    0x08048541 <+93>:   movl $0x804867a,(%esp)
    0x08048548 <+100>:  call 0x8048394 <puts@plt>
    0x0804854d <+105>:  jmp 0x804855b <main+119>
    0x0804854f <+107>:  movl $0x8048696,(%esp)
    0x08048556 <+114>:  call 0x8048394 <puts@plt>
76
Nested Procedures
• For example – Pascal
• Any routine can have sub-routines
• Any sub-routine can access anything that is defined in its containing scope or inside the sub-routine itself
  – “non-local” variables
77
Example: Nested Procedures
    program p() {
      int x;
      procedure a() {
        int y;
        procedure b() { … c() … };
        procedure c() {
          int z;
          procedure d() {
            y = x + z
          };
          … b() … d() …
        }
        … a() … c() …
      }
      a()
    }
Possible call sequence: p → a → a → c → b → c → d
What are the addresses of variables “x”, “y” and “z” in procedure d?
78
Nested Procedures
• A routine can call a sibling or an ancestor
• When “c” uses (non-local) variables from “a”, which instance of “a” is it?
• How do you find the right activation record at runtime?
[Figure: the activations p, a, a, c, b, c, d of the call sequence.]
Possible call sequence: p → a → a → c → b → c → d
79
Nested Procedures
• Goal: find the closest routine in the stack from a given nesting level
• If we reached the same routine in a sequence of calls
  – when a routine of level k uses variables of the same nesting level, it uses its own variables
  – if it uses variables of nesting level j < k, then it must be the last routine called at level j
• If a procedure is last at level j on the stack, then it must be an ancestor of the current routine
Possible call sequence: p → a → a → c → b → c → d
[Figure: the activations p, a, a, c, b, c, d of the call sequence.]
80
Nested Procedures
• Problem: a routine may need to access variables of another routine that contains it statically
• Solution: lexical pointer (a.k.a. access link) in the activation record
• The lexical pointer points to the last activation record of the nesting level above it
  – in our example, the lexical pointer of d points to the activation record of c
• Lexical pointers are created at runtime
• The number of links to be traversed is known at compile time
81
Lexical Pointers
[Figure: the stack of activation records a, a, c, b, c, d for the call sequence, with the y of each a and the z of each c marked.]
Possible call sequence: p → a → a → c → b → c → d
    program p() {
      int x;
      procedure a() {
        int y;
        procedure b() { c() };
        procedure c() {
          int z;
          procedure d() {
            y = x + z
          };
          … b() … d() …
        }
        … a() … c() …
      }
      a()
    }
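The lookup through lexical pointers can be sketched in C by building the activation records of the call sequence p → a → a → c → b → c → d by hand. Everything here is illustrative: the struct frame layout, the single var field per frame, and the helper enclosing are assumptions, not the layout a real compiler would emit.

```c
#include <assert.h>
#include <stddef.h>

/* One activation record, reduced to what the example needs. */
struct frame {
    const char *routine;      /* name, for readability only               */
    int level;                /* static nesting depth: p=0, a=1, b=c=2, d=3 */
    int var;                  /* the single local: x in p, y in a, z in c */
    struct frame *lexical;    /* access link: frame of enclosing routine  */
    struct frame *dynamic;    /* dynamic link: frame of the caller        */
};

/* Follow `hops` access links. hops = level(user) - level(declaration),
   a number known at compile time. */
static struct frame *enclosing(struct frame *current, int hops) {
    while (hops-- > 0) {
        assert(current != NULL);
        current = current->lexical;
    }
    return current;
}

/* Build the stack for p -> a -> a -> c -> b -> c -> d and run d's body. */
int run_example(int x, int z) {
    struct frame p  = { "p", 0, x, NULL, NULL };
    struct frame a1 = { "a", 1, 0, &p,  &p  };
    struct frame a2 = { "a", 1, 0, &p,  &a1 };   /* a called by a */
    struct frame c1 = { "c", 2, 0, &a2, &a2 };   /* c called by a */
    struct frame b  = { "b", 2, 0, &a2, &c1 };   /* b called by c */
    struct frame c2 = { "c", 2, z, &a2, &b  };   /* c called by b */
    struct frame d  = { "d", 3, 0, &c2, &c2 };   /* d called by c */

    /* y = x + z, executed in d (level 3) */
    int zv = enclosing(&d, 3 - 2)->var;          /* z lives in c, level 2 */
    int xv = enclosing(&d, 3 - 0)->var;          /* x lives in p, level 0 */
    enclosing(&d, 3 - 1)->var = xv + zv;         /* y lives in a, level 1 */

    /* The most recent a received the result; the older a did not. */
    assert(a1.var == 0);
    return a2.var;
}
```

Note that b's caller is c but b's lexical pointer goes to a, the routine that declares it: the dynamic link and the lexical pointer coincide only when a routine is called directly by the routine that encloses it.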
82
What is a Compiler
38
Abstract Stack Frame
Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Prock
Prock+2
hellip
hellip
Stack frame for procedure
Prock+1(a1hellipaN)
39
Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to
stack and jumps to the function labelA statement x=f(a1hellipan) looks like
Push a1 hellip Push anCall fPop x copy returned value
bull Returning a value is done by pushing it to the stack (return x)
Push xbull Return control to caller (and roll up stack)
Return
40
Heap
Abstract Register Machine
Global Variables
Stack
Lowaddresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
Code
Highaddresses
41
Semi-Abstract Register Machine
Global Variables
Stack
Heap
High addresses
Low addresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
ebp
esphellipre
gist
ers
Stac
k
42
Intro Functions Exampleint SimpleFn(int z)
int x yx = x y zreturn x
void main() int ww = SimpleFunction(137)
_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn
main_t0 = 137Push _t0Call _SimpleFnPop w
43
What Can We Do with Procedures
bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as
parameters
44
Design Decisions
bull Scoping rulesndash Static scoping vs dynamic scoping
bull Callercallee conventionsndash Parametersndash Who saves register values
bull Allocating space for local variables
45
Static (lexical) Scopingmain ( )
int a = 0 int b = 0
int b = 1
int a = 2 printf (ldquod dnrdquo a b)
int b = 3 printf (ldquod dnrdquo a b)
printf (ldquod dnrdquo a b)
printf (ldquod dnrdquo a b)
B0B1
B3B3
B2
Declaration Scopes
a=0 B0B1B3
b=0 B0
b=1 B1B2
a=2 B2
b=3 B3
a name refers to its (closest)
enclosing scope
known at compile time
46
Dynamic Scoping
bull Each identifier is associated with a global stack of bindings
bull When entering scope where identifier is declaredndash push declaration on identifier stack
bull When exiting scope where identifier is declaredndash pop identifier stack
bull Evaluating the identifier in any context binds to the current top of stack
bull Determined at runtime
47
Example
bull What value is returned from mainbull Static scopingbull Dynamic scoping
int x = 42
int f() return x int g() int x = 1 return f() int main() return g()
48
Why do we care
bull We need to generate code to access variables
bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code
generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables
49
Variable addresses for static scoping first attempt
int x = 42
int f() return x int g() int x = 1 return f() int main() return g()
identifier address
x (global) 0x42
x (inside g) 0x73
50
Variable addresses for static scoping first attempt
int a [11]
void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)
main() quicksort (1 9)
what is the address of the variable ldquoirdquo in
the procedure quicksort
51
Compile-Time Information on Variablesbull Namebull Typebull Scope
ndash when is it recognizedbull Duration
ndash Until when does its value existbull Size
ndash How many bytes are required at runtime bull Address
ndash Fixedndash Relativendash Dynamic
52
Activation Record (Stack Frames)bull separate space for each procedure invocation
bull managed at runtimendash code for managing it generated by the compiler
bull desired properties ndash efficient allocation and deallocation
bull procedures are called frequentlyndash variable size
bull different procedures may require different memory sizes
53
Semi-Abstract Register Machine
Global Variables
Stack
Heap
High addresses
Low addresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
ebp
esphellipre
gist
ers
Stac
k
54
A Logical Stack Frame (Simplified)Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Stack frame for function f(a1hellipaN)
55
Runtime Stack
bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of
stackbull How do we handle recursion
56
Activation Record (frame)
[Figure: frame layout, listed from high addresses to low addresses; the stack grows down.]
• Incoming parameters: parameter k, …, parameter 1
• Administrative part: lexical pointer, return information, dynamic link, registers & misc
• Local variables
• Temporaries
• (next frame would be here)
The figure also marks the frame pointer and the stack pointer.
57
Runtime Stack
• SP – stack pointer
  – Top of current frame
• FP – frame pointer
  – Base of current frame
  – Sometimes called BP (base pointer)
[Figure: previous frame above the current frame; FP marks the base and SP the top of the current frame; the stack grows down.]
58
Code Blocks

• Programming languages provide code blocks

    void foo() {
        int x = 8, y = 9;      // 1
        { int x = y * y; }     // 2
        { int x = y * 7; }     // 3
        x = y + 1;
    }

Frame layout: administrative, x1, y1, x2, x3, …
59
L-Values of Local Variables

• The offset in the stack is known at compile time
• L-val(x) = FP + offset(x)
• x = 5 is translated to:
    Load_Constant 5, R3
    Store R3, offset(x)(FP)
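The rule L-val(x) = FP + offset(x) can be tried out on a simulated stack. The sketch below is an illustration, not the lecture's code: the byte array `stack_mem`, the index `fp`, and the offsets `OFFSET_X` and `OFFSET_Y` are assumptions chosen for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 256

static uint8_t stack_mem[STACK_BYTES];   /* simulated stack memory */
static size_t fp = 128;                  /* simulated frame pointer (byte index) */

/* Hypothetical compile-time offsets of two locals of one procedure. */
enum { OFFSET_X = -4, OFFSET_Y = -8 };

/* L-val(v) = FP + offset(v) */
static size_t lval(int offset) {
    return (size_t)((long)fp + offset);
}

/* Store R, offset(FP) */
static void store_local(int offset, int32_t value) {
    memcpy(&stack_mem[lval(offset)], &value, sizeof value);
}

/* Load offset(FP), R */
static int32_t load_local(int offset) {
    int32_t value;
    memcpy(&value, &stack_mem[lval(offset)], sizeof value);
    return value;
}
```

The offset stays the same for every invocation; only `fp` changes. Storing through `OFFSET_X` with `fp = 128` and again with `fp = 64` writes to two different locations, which is how each activation record gets its own copy of `x`.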
60
Pentium Runtime Stack

Pentium stack registers
Register    Usage
ESP         Stack pointer
EBP         Base pointer

Pentium stack and call/ret instructions
Instruction     Usage
push, pusha, …  push on runtime stack
pop, popa, …    pop from runtime stack
call            transfer control to called routine
ret             transfer control back to caller
61
Accessing Stack Variables

• Use offset from FP (%ebp)
• Remember: the stack grows downwards
• Above FP = parameters
• Below FP = locals
• Examples
  – %ebp + 4 = return address
  – %ebp + 8 = first parameter
  – %ebp – 4 = first local
[Figure: stack listed from high to low addresses: Param n, …, param1 (FP+8), return address, previous fp (FP), Local 1 (FP-4), …, Local n (SP).]
62
Factorial – fact(int n)

    fact:
        pushl %ebp              # save ebp
        movl  %esp, %ebp        # ebp = esp
        pushl %ebx              # save ebx
        movl  8(%ebp), %ebx     # ebx = n
        cmpl  $1, %ebx          # n <= 1 ?
        jle   .lresult          # then done
        leal  -1(%ebx), %eax    # eax = n-1
        pushl %eax
        call  fact              # fact(n-1)
        imull %ebx, %eax        # eax = retv * n
        jmp   .lreturn
    .lresult:
        movl  $1, %eax          # retv
    .lreturn:
        movl  -4(%ebp), %ebx    # restore ebx
        movl  %ebp, %esp        # restore esp
        popl  %ebp              # restore ebp
        ret                     # return to caller

[Figure: stack at an intermediate point, from high to low addresses: n (EBP+8), return address, old ebp (EBP), old ebx (EBP-4, ESP).]
(Disclaimer: a real compiler can do better than that.)
63
Call Sequences

• The processor does not save the content of registers on procedure calls
• So who will?
  – Caller saves and restores registers
  – Callee saves and restores registers
  – But can also have both save/restore some registers
64
Call Sequences

Caller push code (caller)
  – Push caller-save registers
  – Push actual parameters (in reverse order)
call
  – Push return address
  – Jump to call address
Callee push code (prologue) (callee)
  – Push current base-pointer
  – bp = sp
  – Push local variables
  – Push callee-save registers
Callee pop code (epilogue) (callee)
  – Pop callee-save registers
  – Pop callee activation record
  – Pop old base-pointer
return
  – Pop return address
  – Jump to address
Caller pop code (caller)
  – Pop parameters
  – Pop caller-save registers
65
Call Sequences – foo(42, 21)

    Caller push code
        push %ecx           # push caller-save registers
        push $21            # push actual parameters (in reverse order)
        push $42
    call
        call _foo           # push return address, jump to call address
    Callee push code (prologue)
        push %ebp           # push current base-pointer
        mov  %esp, %ebp     # bp = sp
        sub  $8, %esp       # push local variables
        push %ebx           # push callee-save registers
    Callee pop code (epilogue)
        pop  %ebx           # pop callee-save registers
        mov  %ebp, %esp     # pop callee activation record
        pop  %ebp           # pop old base-pointer
    return
        ret                 # pop return address, jump to address
    Caller pop code
        add  $8, %esp       # pop parameters
        pop  %ecx           # pop caller-save registers
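The call sequence for foo(42, 21) can be replayed on a simulated stack. This is a sketch under stated assumptions, not the lecture's code: the stack is an array of word-sized slots (so the first parameter sits at bp+2 rather than at ebp+8 bytes), the return address is a dummy value, and the body of the callee (it returns a - b) is invented for the example.

```c
#include <assert.h>

#define STACK_WORDS 64

static int mem[STACK_WORDS];    /* simulated stack, one int per slot */
static int sp = STACK_WORDS;    /* index of the top element; the stack grows down */
static int bp = STACK_WORDS;    /* base (frame) pointer */

static void push(int v) { assert(sp > 0); mem[--sp] = v; }
static int  pop(void)   { assert(sp < STACK_WORDS); return mem[sp++]; }

/* Callee side: hypothetical foo(a, b) that returns a - b. */
static int foo_body(void) {
    push(bp);                      /* prologue: push current base-pointer */
    bp = sp;                       /*           bp = sp                   */
    sp -= 2;                       /*           reserve two local slots   */
    /* mem[bp] is the old bp and mem[bp+1] the return address,
       so the parameters are at bp+2 and bp+3. */
    mem[bp - 1] = mem[bp + 2];     /* local1 = a */
    mem[bp - 2] = mem[bp + 3];     /* local2 = b */
    int result = mem[bp - 1] - mem[bp - 2];
    sp = bp;                       /* epilogue: pop the activation record */
    bp = pop();                    /*           pop old base-pointer      */
    return result;                 /* stands in for the result register   */
}

/* Caller side of the sequence. */
static int call_foo(int a, int b) {
    push(b);                       /* actual parameters in reverse order  */
    push(a);
    push(-1);                      /* dummy return address                */
    int result = foo_body();
    (void)pop();                   /* what `ret` does: pop return address */
    sp += 2;                       /* caller pops the two parameters      */
    return result;
}
```

After `call_foo(42, 21)` returns, both `sp` and `bp` are back at their initial values. That balance is the invariant the caller and callee halves of the sequence maintain together.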
66
"To Callee-save or to Caller-save?"
• Callee-saved registers need only be saved when the callee modifies their value
• Some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registers
• Callee-Save Registers
  – Saved by the callee before modification
  – Values are automatically preserved across calls
• Caller-Save Registers
  – Saved (if needed) by the caller before calls
  – Values are not automatically preserved across calls
• Usually the architecture defines caller-save and callee-save registers
• Separate compilation
• Interoperability between code produced by different compilers/languages
• But compiler writers decide when to use caller/callee registers
68
Callee-Save Registers
• Saved by the callee before modification
• Usually at procedure prolog
• Restored at procedure epilog
• Hardware support may be available
• Values are automatically preserved across calls

    int foo(int a) {
        int b = a + 1;
        f1();
        g1(b);
        return (b + 2);
    }

    .global _foo
    Add_Constant -K, SP        // allocate space for foo
    Store_Local R5, -14(FP)    // save R5
    Load_Reg R5, R0
    Add_Constant R5, 1
    JSR f1
    JSR g1
    Add_Constant R5, 2
    Load_Reg R5, R0
    Load_Local -14(FP), R5     // restore R5
    Add_Constant K, SP         // deallocate
    RTS
69
Caller-Save Registers
• Saved by the caller before calls when needed
• Values are not automatically preserved across calls

    void bar(int y) {
        int x = y + 1;
        f2(x);
        g2(2);
        g2(8);
    }

    .global _bar
    Add_Constant -K, SP    // allocate space for bar
    Add_Constant R0, 1
    JSR f2
    Load_Constant 2, R0
    JSR g2
    Load_Constant 8, R0
    JSR g2
    Add_Constant K, SP     // deallocate space for bar
    RTS
70
Parameter Passing
• 1960s
  – In memory
    • No recursion is allowed
• 1970s
  – In stack
• 1980s
  – In registers
  – First k parameters are passed in registers (k=4 or k=6)
  – Where is time saved?
    • Most procedures are leaf procedures
    • Interprocedural register allocation
    • Many of the registers may be dead before another invocation
    • Register windows are allocated in some architectures per call (e.g., Sun SPARC)
71
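Windows Exploit(s): Buffer Overflow

    void foo(char *x) {
        char buf[2];
        strcpy(buf, x);
    }
    int main(int argc, char *argv[]) {
        foo(argv[1]);
    }

    a.out abracadabra
    Segmentation fault

[Figure: stack layout in slide order: previous frame, return address, saved FP, char *x, buf[2], …; the characters of "abracadabra" written into buf[2] spill over the neighbouring slots. Arrows mark the direction of stack growth and of memory addresses.]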
72
Buffer overflow

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int check_authentication(char *password) {
        int auth_flag = 0;
        char password_buffer[16];

        strcpy(password_buffer, password);
        if (strcmp(password_buffer, "brillig") == 0)
            auth_flag = 1;
        if (strcmp(password_buffer, "outgrabe") == 0)
            auth_flag = 1;
        return auth_flag;
    }

    int main(int argc, char *argv[]) {
        if (argc < 2) {
            printf("Usage: %s <password>\n", argv[0]);
            exit(0);
        }
        if (check_authentication(argv[1])) {
            printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
            printf("      Access Granted.\n");
            printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        } else {
            printf("\nAccess Denied.\n");
        }
    }

(source: "Hacking – The Art of Exploitation, 2nd Ed.")
74
Buffer overflow
The same program with the two local declarations swapped; main() is unchanged.

    int check_authentication(char *password) {
        char password_buffer[16];
        int auth_flag = 0;

        strcpy(password_buffer, password);
        if (strcmp(password_buffer, "brillig") == 0)
            auth_flag = 1;
        if (strcmp(password_buffer, "outgrabe") == 0)
            auth_flag = 1;
        return auth_flag;
    }

(source: "Hacking – The Art of Exploitation, 2nd Ed.")
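For contrast, here is a minimal sketch of one common way to avoid the overflow: check the input length before copying. It is not taken from the slides or from the cited book; the function name `check_authentication_bounded` and the constant `PASSWORD_MAX` are assumptions made for this example.

```c
#include <assert.h>
#include <string.h>

#define PASSWORD_MAX 16

/* Returns 1 if `password` is accepted and 0 otherwise.
   Input that does not fit the buffer is rejected instead of copied. */
static int check_authentication_bounded(const char *password) {
    char password_buffer[PASSWORD_MAX];

    if (password == NULL || strlen(password) >= sizeof password_buffer)
        return 0;                        /* too long: never copied */
    strcpy(password_buffer, password);   /* fits, length checked above */

    return strcmp(password_buffer, "brillig") == 0 ||
           strcmp(password_buffer, "outgrabe") == 0;
}
```

Because no write ever goes past `password_buffer`, the result no longer depends on where the compiler places the locals in the frame, which is exactly what the two declaration orders on the slides were probing.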
75
Buffer overflow

    0x08048529 <+69>:   movl $0x8048647,(%esp)
    0x08048530 <+76>:   call 0x8048394 <puts@plt>
    0x08048535 <+81>:   movl $0x8048664,(%esp)
    0x0804853c <+88>:   call 0x8048394 <puts@plt>
    0x08048541 <+93>:   movl $0x804867a,(%esp)
    0x08048548 <+100>:  call 0x8048394 <puts@plt>
    0x0804854d <+105>:  jmp 0x804855b <main+119>
    0x0804854f <+107>:  movl $0x8048696,(%esp)
    0x08048556 <+114>:  call 0x8048394 <puts@plt>
76
Nested Procedures

• For example: Pascal
• Any routine can have sub-routines
• Any sub-routine can access anything that is defined in its containing scope or inside the sub-routine itself
  – "non-local" variables
77
Example: Nested Procedures

    program p() {
        int x;
        procedure a() {
            int y;
            procedure b() { … c() … };
            procedure c() {
                int z;
                procedure d() {
                    y = x + z
                };
                … b() … d() …
            }
            … a() … c() …
        }
        a()
    }

Possible call sequence: p → a → a → c → b → c → d
What are the addresses of variables "x", "y" and "z" in procedure d?
78
Nested Procedures
• Can call a sibling, ancestor
• When "c" uses (non-local) variables from "a", which instance of "a" is it?
• How do you find the right activation record at runtime?
Possible call sequence: p → a → a → c → b → c → d
79
Nested Procedures
• Goal: find the closest routine in the stack from a given nesting level
• If we reached the same routine in a sequence of calls
  – If a routine of level k uses variables of the same nesting level, it uses its own variables
  – If it uses variables of nesting level j < k, then it must be the last routine called at level j
• If a procedure is last at level j on the stack, then it must be an ancestor of the current routine
Possible call sequence: p → a → a → c → b → c → d
80
Nested Procedures
• Problem: a routine may need to access variables of another routine that contains it statically
• Solution: lexical pointer (a.k.a. access link) in the activation record
• Lexical pointer points to the last activation record of the nesting level above it
  – In our example, the lexical pointer of d points to the activation record of c
• Lexical pointers are created at runtime
• The number of links to be traversed is known at compile time
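The lookup described above can be sketched in C for the running example (call sequence p → a → a → c → b → c → d). This is an illustration only: the `Frame` struct, its single `local` field per routine, and the helper names are assumptions, and a real frame holds much more.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified activation record: one local variable plus the lexical pointer. */
typedef struct Frame {
    const char   *routine;   /* routine name, for illustration only        */
    int           level;     /* static nesting depth: p = 0, a = 1, ...    */
    int           local;     /* x in p, y in a, z in c                     */
    struct Frame *lexical;   /* frame of the statically enclosing routine  */
} Frame;

/* Follow `hops` lexical pointers. The compiler knows `hops` statically:
   level of the using routine minus level of the declaring routine. */
static Frame *enclosing(Frame *f, int hops) {
    while (hops-- > 0) {
        assert(f != NULL);
        f = f->lexical;
    }
    return f;
}

/* Body of d: y = x + z */
static void run_d(Frame *d) {
    Frame *c = enclosing(d, 1);   /* z is declared in c: 1 hop  */
    Frame *a = enclosing(d, 2);   /* y is declared in a: 2 hops */
    Frame *p = enclosing(d, 3);   /* x is declared in p: 3 hops */
    a->local = p->local + c->local;
}
```

With frames linked as the rules above prescribe (both instances of a point to p; b and both instances of c point to the second a; d points to the second c), `run_d` updates `y` in the second activation of a and leaves the first one untouched.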
81
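Lexical Pointers

    program p() {
        int x;
        procedure a() {
            int y;
            procedure b() { c() };
            procedure c() {
                int z;
                procedure d() {
                    y = x + z
                };
                … b() … d() …
            }
            … a() … c() …
        }
        a()
    }

Possible call sequence: p → a → a → c → b → c → d
[Figure: the stack of activation records a, a, c, b, c, d for this call sequence, with a local y in each frame of a and a local z in each frame of c, and the lexical pointers between the frames.]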
82
What is a Compiler
39
Handling Procedures
• Store local variables/temporaries in a stack
• A function call instruction pushes arguments to the stack and jumps to the function label
  A statement x = f(a1, …, an); looks like
    Push a1; … Push an;
    Call f;
    Pop x;   // copy returned value
• Returning a value is done by pushing it to the stack (return x;)
    Push x;
• Return control to caller (and roll up stack)
    Return;
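The Push/Call/Pop protocol can be mimicked with a small value stack. The sketch below is a simplification and not the lecture's machine: the callee here removes its own arguments before pushing the result, which is only one possible reading of "roll up stack", and the callee body (a1 + a2) is invented for the example.

```c
#include <assert.h>

#define STACK_SIZE 32

static int stack[STACK_SIZE];
static int top = 0;                      /* number of values on the stack */

static void Push(int v) { assert(top < STACK_SIZE); stack[top++] = v; }
static int  Pop(void)   { assert(top > 0); return stack[--top]; }

/* Hypothetical callee f(a1, a2) = a1 + a2. */
static void f(void) {
    int a2 = Pop();                      /* roll up the arguments        */
    int a1 = Pop();
    Push(a1 + a2);                       /* Push x;  (return x)          */
}                                        /* Return;                      */

/* x = f(a1, a2); */
static int call_f(int a1, int a2) {
    Push(a1);                            /* Push a1;                     */
    Push(a2);                            /* Push a2;                     */
    f();                                 /* Call f;                      */
    return Pop();                        /* Pop x;  copy returned value  */
}
```

After the call the stack is empty again, so a statement such as `x = f(3, 4)` leaves no residue behind for the next statement.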
40
Abstract Register Machine
[Figure: a CPU and main memory. CPU: general-purpose (data) registers Register 00, Register 01, …, Register xx; control registers Register PC, …. Main memory, between low addresses and high addresses: code, global variables, stack, heap.]
41
Semi-Abstract Register Machine
[Figure: a CPU and main memory. CPU: general-purpose (data) registers Register 00, Register 01, …, Register xx; control registers Register PC, …; stack registers ebp, esp, …. Main memory, between high addresses and low addresses: global variables, stack, heap.]
42
Intro: Functions Example

    int SimpleFn(int z) {
        int x, y;
        x = x * y * z;
        return x;
    }
    void main() {
        int w;
        w = SimpleFn(137);
    }

    _SimpleFn:
        _t0 = x * y;
        _t1 = _t0 * z;
        x = _t1;
        Push x;
        Return;
    main:
        _t0 = 137;
        Push _t0;
        Call _SimpleFn;
        Pop w;
43
What Can We Do with Procedures?

• Declarations & Definitions
• Call & Return
• Jumping out of procedures
• Passing & Returning procedures as parameters
44
Design Decisions

• Scoping rules
  – Static scoping vs. dynamic scoping
• Caller/callee conventions
  – Parameters
  – Who saves register values?
• Allocating space for local variables
45
Static (lexical) Scoping

    main() {                                  // B0
        int a = 0;
        int b = 0;
        {                                     // B1
            int b = 1;
            {                                 // B2
                int a = 2;
                printf("%d %d\n", a, b);
            }
            {                                 // B3
                int b = 3;
                printf("%d %d\n", a, b);
            }
            printf("%d %d\n", a, b);
        }
        printf("%d %d\n", a, b);
    }

Declaration    Scopes
a=0            B0, B1, B3
b=0            B0
b=1            B1, B2
a=2            B2
b=3            B3

A name refers to its (closest) enclosing scope; this is known at compile time.
46
Dynamic Scoping

• Each identifier is associated with a global stack of bindings
• When entering a scope where the identifier is declared
  – Push declaration on identifier stack
• When exiting a scope where the identifier is declared
  – Pop identifier stack
• Evaluating the identifier in any context binds to the current top of stack
• Determined at runtime
47
Example

• What value is returned from main?
• Static scoping?
• Dynamic scoping?

    int x = 42;
    int f() { return x; }
    int g() { int x = 1; return f(); }
    int main() { return g(); }
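Both answers can be checked with a short C sketch. C itself is statically scoped, so the dynamic variant below simulates the per-identifier binding stack by hand; the helper names (`enter_scope_x`, `exit_scope_x`, `lookup_x`) and the `_static`/`_dynamic` suffixes are assumptions made for this illustration.

```c
#include <assert.h>

#define MAX_BINDINGS 8

/* Dynamic scoping: the identifier x owns a global stack of bindings. */
static int x_bindings[MAX_BINDINGS];
static int x_top = -1;                   /* index of the current binding */

static void enter_scope_x(int value) {
    assert(x_top + 1 < MAX_BINDINGS);
    x_bindings[++x_top] = value;         /* push declaration */
}
static void exit_scope_x(void) {
    assert(x_top >= 0);
    --x_top;                             /* pop declaration  */
}
static int lookup_x(void) {
    assert(x_top >= 0);
    return x_bindings[x_top];            /* current top of stack */
}

static int f_dynamic(void) { return lookup_x(); }
static int g_dynamic(void) {
    enter_scope_x(1);                    /* int x = 1; */
    int result = f_dynamic();
    exit_scope_x();
    return result;
}
static int main_dynamic(void) {
    enter_scope_x(42);                   /* global int x = 42; */
    int result = g_dynamic();
    exit_scope_x();
    return result;
}

/* Static scoping: x inside f is bound to the global at compile time. */
static int x = 42;
static int f_static(void) { return x; }
static int g_static(void) { int x = 1; (void)x; return f_static(); }
```

Under static scoping `g_static()` yields 42, because the `x` in `f` always means the global. Under the simulated dynamic scoping `main_dynamic()` yields 1, because the binding pushed by `g` is on top of the stack when `f` runs.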
48
Why do we care?

• We need to generate code to access variables
• Static scoping
  – Identifier binding is known at compile time
  – Address of the variable is known at compile time
  – Assigning addresses to variables is part of code generation
  – No runtime errors of "access to undefined variable"
  – Can check types of variables
40
Heap
Abstract Register Machine
Global Variables
Stack
Lowaddresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
Code
Highaddresses
41
Semi-Abstract Register Machine
Global Variables
Stack
Heap
High addresses
Low addresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
ebp
esphellipre
gist
ers
Stac
k
42
Intro Functions Exampleint SimpleFn(int z)
int x yx = x y zreturn x
void main() int ww = SimpleFunction(137)
_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn
main_t0 = 137Push _t0Call _SimpleFnPop w
43
What Can We Do with Procedures
bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as
parameters
44
Design Decisions
bull Scoping rulesndash Static scoping vs dynamic scoping
bull Callercallee conventionsndash Parametersndash Who saves register values
bull Allocating space for local variables
45
Static (lexical) Scopingmain ( )
int a = 0 int b = 0
int b = 1
int a = 2 printf (ldquod dnrdquo a b)
int b = 3 printf (ldquod dnrdquo a b)
printf (ldquod dnrdquo a b)
printf (ldquod dnrdquo a b)
B0B1
B3B3
B2
Declaration Scopes
a=0 B0B1B3
b=0 B0
b=1 B1B2
a=2 B2
b=3 B3
a name refers to its (closest)
enclosing scope
known at compile time
46
Dynamic Scoping
bull Each identifier is associated with a global stack of bindings
bull When entering scope where identifier is declaredndash push declaration on identifier stack
bull When exiting scope where identifier is declaredndash pop identifier stack
bull Evaluating the identifier in any context binds to the current top of stack
bull Determined at runtime
47
Example
bull What value is returned from mainbull Static scopingbull Dynamic scoping
int x = 42
int f() return x int g() int x = 1 return f() int main() return g()
48
Why do we care
bull We need to generate code to access variables
bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code
generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables
49
Variable addresses for static scoping first attempt
int x = 42
int f() return x int g() int x = 1 return f() int main() return g()
identifier address
x (global) 0x42
x (inside g) 0x73
50
Variable addresses for static scoping first attempt
int a [11]
void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)
main() quicksort (1 9)
what is the address of the variable ldquoirdquo in
the procedure quicksort
51
Compile-Time Information on Variablesbull Namebull Typebull Scope
ndash when is it recognizedbull Duration
ndash Until when does its value existbull Size
ndash How many bytes are required at runtime bull Address
ndash Fixedndash Relativendash Dynamic
52
Activation Record (Stack Frames)bull separate space for each procedure invocation
bull managed at runtimendash code for managing it generated by the compiler
bull desired properties ndash efficient allocation and deallocation
bull procedures are called frequentlyndash variable size
bull different procedures may require different memory sizes
53
Semi-Abstract Register Machine
Global Variables
Stack
Heap
High addresses
Low addresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
ebp
esphellipre
gist
ers
Stac
k
54
A Logical Stack Frame (Simplified)Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Stack frame for function f(a1hellipaN)
55
Runtime Stack
bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of
stackbull How do we handle recursion
56
Activation Record (frame)parameter k
parameter 1
lexical pointer
return information
dynamic link
registers amp misc
local variablestemporaries
next frame would be here
hellip
administrativepart
high addresses
lowaddresses
framepointer
stackpointer
incoming parameters
stack grows down
57
Runtime Stackbull SP ndash stack pointer
ndash top of current framebull FP ndash frame pointer
ndash base of current framendash Sometimes called BP
(base pointer)
Current frame
hellip hellip
Previous frame
SP
FPstack grows down
58
Code Blocks
bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1
adminstrativex1
y1
x2
x3
hellip
59
L-Values of Local Variables
bull The offset in the stack is known at compile time
bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3
Store R3 offset(x)(FP)
60
Pentium Runtime Stack
Pentium stack registers
Pentium stack and callret instructions
Register Usage
ESP Stack pointer
EBP Base pointer
Instruction Usage
push pushahellip push on runtime stack
poppopahellip Base pointer
call transfer control to called routine
return transfer control back to caller
61
Accessing Stack Variables
bull Use offset from FP (ebp)bull Remember ndash
stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples
ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local
hellip hellip
SP
FPReturn address
Return address
Param nhellip
param1
Local 1hellip
Local n
Previous fp
Param nhellip
param1FP+8
FP-4
62
Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp
ESP
EBPReturn address
Return address
old ebx
Previous fp
nEBP+8
EBP-4 old ebp
old ebx
(stack in intermediate point)
(disclaimer real compiler can do better than that)
63
Call Sequences
bull The processor does not save the content of registers on procedure calls
bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some
registers
64
Call Sequences
call
caller
callee
return
caller
Caller push code
Callee push code
(prologue)
Callee pop code
(epilogue)
Caller pop code
Push caller-save registersPush actual parameters (in reverse order)
push return addressJump to call address
Push current base-pointerbp = sp
Push local variablesPush callee-save registers
Pop callee-save registersPop callee activation record
Pop old base-pointer
pop return addressJump to address
Pop parametersPop caller-save registers
65
Call Sequences ndash Foo(4221)
call
caller
callee
return
caller
push ecx
push $21
push $42
call _foo
push ebp
mov esp ebp
sub 8 esp
push ebx
pop ebx
mov ebp esp
pop ebp
ret
add $8 esp
pop ecx
Push caller-save registersPush actual parameters (in reverse order)
push return addressJump to call address
Push current base-pointerbp = sp
Push local variablesPush callee-save registers
Pop callee-save registersPop callee activation record
Pop old base-pointer
pop return addressJump to address
Pop caller-save registersPop parameters
66
ldquoTo Callee-save or to Caller-saverdquo
bull Callee-saved registers need only be saved when callee modifies their value
bull some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registersbull Callee-Save Registers
ndash Saved by the callee before modificationndash Values are automatically preserved across calls
bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls
bull Usually the architecture defines caller-save and callee-save registers
bull Separate compilationbull Interoperability between code produced by different
compilerslanguages bull But compiler writers decide when to use calllercallee
registers
68
Callee-Save Registers
• Saved by the callee before modification
• Usually at procedure prolog
• Restored at procedure epilog
• Hardware support may be available
• Values are automatically preserved across calls

int foo(int a) {
    int b = a + 1;
    f1();
    g1(b);
    return (b + 2);
}

.global _foo
    Add_Constant -K, SP        // allocate space for foo
    Store_Local R5, -14(FP)    // save R5
    Load_Reg R5, R0
    Add_Constant R5, 1
    JSR f1
    JSR g1
    Add_Constant R5, 2
    Load_Reg R5, R0
    Load_Local -14(FP), R5     // restore R5
    Add_Constant K, SP         // deallocate
    RTS
69
Caller-Save Registers
• Saved by the caller before calls when needed
• Values are not automatically preserved across calls

void bar(int y) {
    int x = y + 1;
    f2(x);
    g2(2);
    g2(8);
}

.global _bar
    Add_Constant -K, SP        // allocate space for bar
    Add_Constant R0, 1
    JSR f2
    Load_Constant 2, R0
    JSR g2
    Load_Constant 8, R0
    JSR g2
    Add_Constant K, SP         // deallocate space for bar
    RTS
70
Parameter Passing
• 1960s
  - In memory
    • No recursion is allowed
• 1970s
  - In stack
• 1980s
  - In registers
  - First k parameters are passed in registers (k = 4 or k = 6)
  - Where is time saved?
• Most procedures are leaf procedures
• Interprocedural register allocation
• Many of the registers may be dead before another invocation
• Register windows are allocated in some architectures per call (e.g., Sun SPARC)
71
Windows Exploit(s): Buffer Overflow

void foo(char *x) {
    char buf[2];
    strcpy(buf, x);
}

int main(int argc, char *argv[]) {
    foo(argv[1]);
}

./a.out abracadabra
Segmentation fault

Stack (figure; the stack grows toward lower memory addresses):
    Previous frame
    Return address
    Saved FP
    char *x
    buf[2]
    ...
The copied string "abracadabr..." starts in buf[2] and runs over the entries next to it.
72
Buffer overflow

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int check_authentication(char *password) {
    int auth_flag = 0;
    char password_buffer[16];

    strcpy(password_buffer, password);   /* no bounds check: the overflow */
    if (strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
    if (strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;
    return auth_flag;
}

int main(int argc, char *argv[]) {
    if (argc < 2) {
        printf("Usage: %s <password>\n", argv[0]);
        exit(0);
    }
    if (check_authentication(argv[1])) {
        printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        printf("      Access Granted\n");
        printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    } else {
        printf("\nAccess Denied\n");
    }
}

(source: "Hacking: The Art of Exploitation, 2nd Ed.")
74
Buffer overflow (declaration order swapped: password_buffer is now declared before auth_flag)

int check_authentication(char *password) {
    char password_buffer[16];
    int auth_flag = 0;

    strcpy(password_buffer, password);
    if (strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
    if (strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;
    return auth_flag;
}

int main(int argc, char *argv[]) {
    if (argc < 2) {
        printf("Usage: %s <password>\n", argv[0]);
        exit(0);
    }
    if (check_authentication(argv[1])) {
        printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        printf("      Access Granted\n");
        printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    } else {
        printf("\nAccess Denied\n");
    }
}

(source: "Hacking: The Art of Exploitation, 2nd Ed.")
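The overflow in both versions comes from strcpy copying the argument without regard to the 16-byte buffer. One possible fix, sketched here and not taken from the slides or the book, is to reject input that does not fit before copying. The name `check_authentication_bounded` and the constant `PASSWORD_BUFFER_SIZE` are hypothetical.

```c
#include <assert.h>
#include <string.h>

enum { PASSWORD_BUFFER_SIZE = 16 };

/* Bounded variant: input that does not fit in the buffer is rejected,
   so nothing is ever written past password_buffer. */
int check_authentication_bounded(const char *password) {
    char password_buffer[PASSWORD_BUFFER_SIZE];
    int auth_flag = 0;
    size_t len;

    assert(password != NULL);
    len = strlen(password);
    if (len >= sizeof password_buffer)
        return 0;                               /* too long: deny access */
    memcpy(password_buffer, password, len + 1); /* copy including the '\0' */

    if (strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
    if (strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;
    return auth_flag;
}
```

Rejecting over-long input (rather than silently truncating it) is a design choice of this sketch: a truncated password could otherwise compare equal to a valid one.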
75
Buffer overflow

0x08048529 <+69>:   movl $0x8048647,(%esp)
0x08048530 <+76>:   call 0x8048394 <puts@plt>
0x08048535 <+81>:   movl $0x8048664,(%esp)
0x0804853c <+88>:   call 0x8048394 <puts@plt>
0x08048541 <+93>:   movl $0x804867a,(%esp)
0x08048548 <+100>:  call 0x8048394 <puts@plt>
0x0804854d <+105>:  jmp  0x804855b <main+119>
0x0804854f <+107>:  movl $0x8048696,(%esp)
0x08048556 <+114>:  call 0x8048394 <puts@plt>
76
Nested Procedures

• For example: Pascal
• Any routine can have sub-routines
• Any sub-routine can access anything that is defined in its containing scope or inside the sub-routine itself
  - "non-local" variables
77
Example: Nested Procedures

program p() {
    int x;
    procedure a() {
        int y;
        procedure b() { ... c() ... };
        procedure c() {
            int z;
            procedure d() {
                y = x + z;
            };
            ... b() ... d() ...
        }
        ... a() ... c() ...
    }
    a()
}

Possible call sequence: p -> a -> a -> c -> b -> c -> d
What are the addresses of variables "x", "y" and "z" in procedure d?
78
Nested Procedures
• can call a sibling or an ancestor
• when "c" uses (non-local) variables from "a", which instance of "a" is it?
• how do you find the right activation record at runtime?

(figure: the activations p, a, a, c, b, c, d of the call sequence below)
Possible call sequence: p -> a -> a -> c -> b -> c -> d
79
Nested Procedures
• goal: find the closest routine in the stack from a given nesting level
• if we reached the same routine in a sequence of calls
  - routine of level k uses variables of the same nesting level: it uses its own variables
  - if it uses variables of nesting level j < k, then it must be the last routine called at level j
• If a procedure is last at level j on the stack, then it must be an ancestor of the current routine

(figure: the activations p, a, a, c, b, c, d of the call sequence below)
Possible call sequence: p -> a -> a -> c -> b -> c -> d
80
Nested Procedures
• problem: a routine may need to access variables of another routine that contains it statically
• solution: lexical pointer (a.k.a. access link) in the activation record
• lexical pointer points to the last activation record of the nesting level above it
  - in our example, the lexical pointer of d points to the activation record of c
• lexical pointers are created at runtime
• the number of links to be traversed is known at compile time
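The lookup described above can be sketched in C. The `Frame` struct, its field names, and the functions `frame_at_level` and `run_d` are hypothetical; each frame is assumed to hold a single local variable, with x in p (level 0), y in a (level 1), z in c (level 2), and d at level 3.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Frame {
    struct Frame *access_link;   /* lexical pointer: frame of the enclosing routine */
    struct Frame *dynamic_link;  /* frame of the caller */
    int level;                   /* static nesting level of the routine */
    int local;                   /* the routine's single local variable */
} Frame;

/* Follow (from->level - target_level) access links. For a given variable
   reference this hop count is a constant known at compile time. */
Frame *frame_at_level(Frame *from, int target_level) {
    assert(from != NULL && target_level >= 0 && target_level <= from->level);
    for (int hops = from->level - target_level; hops > 0; hops--) {
        from = from->access_link;
        assert(from != NULL);
    }
    return from;
}

/* Body of d: y = x + z */
void run_d(Frame *d) {
    Frame *p = frame_at_level(d, 0);   /* x lives in p: 3 hops */
    Frame *a = frame_at_level(d, 1);   /* y lives in a: 2 hops */
    Frame *c = frame_at_level(d, 2);   /* z lives in c: 1 hop  */
    a->local = p->local + c->local;
}
```

For the call sequence p -> a -> a -> c -> b -> c -> d, the access links of both c activations and of b all point to the second activation of a, so d updates the y of that activation and reads the z of the most recent c, regardless of how many frames lie in between on the stack.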
81
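Lexical Pointers

(figure: stack of activation records in call order: a (holds y), a (holds y), c (holds z), b, c (holds z), d, linked by lexical pointers)
Possible call sequence: p -> a -> a -> c -> b -> c -> d

program p() {
    int x;
    procedure a() {
        int y;
        procedure b() { c() };
        procedure c() {
            int z;
            procedure d() {
                y = x + z;
            };
            ... b() ... d() ...
        }
        ... a() ... c() ...
    }
    a()
}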
82
What is a Compiler
41
Semi-Abstract Register Machine
(figure) CPU: general-purpose (data) registers Register 00, Register 01, ..., Register xx; control registers: Register PC, ...; stack registers: ebp, esp, ...
(figure) Main memory, drawn between high addresses and low addresses: Global Variables, Stack, Heap
42
Intro: Functions Example

int SimpleFn(int z) {
    int x, y;
    x = x * y * z;
    return x;
}

void main() {
    int w;
    w = SimpleFn(137);
}

_SimpleFn:
    _t0 = x * y;
    _t1 = _t0 * z;
    x = _t1;
    Push x;
    Return;

main:
    _t0 = 137;
    Push _t0;
    Call _SimpleFn;
    Pop w;
43
What Can We Do with Procedures?

• Declarations & Definitions
• Call & Return
• Jumping out of procedures
• Passing & Returning procedures as parameters
44
Design Decisions

• Scoping rules
  - Static scoping vs. dynamic scoping
• Caller/callee conventions
  - Parameters
  - Who saves register values?
• Allocating space for local variables
45
Static (lexical) Scoping

main() {                                  /* block B0 */
    int a = 0;
    int b = 0;
    {                                     /* block B1 */
        int b = 1;
        {                                 /* block B2 */
            int a = 2;
            printf("%d %d\n", a, b);
        }
        {                                 /* block B3 */
            int b = 3;
            printf("%d %d\n", a, b);
        }
        printf("%d %d\n", a, b);
    }
    printf("%d %d\n", a, b);
}

Declaration    Scopes
a=0            B0, B1, B3
b=0            B0
b=1            B1, B2
a=2            B2
b=3            B3

A name refers to its (closest) enclosing scope, which is known at compile time.
46
Dynamic Scoping

• Each identifier is associated with a global stack of bindings
• When entering a scope where the identifier is declared
  - push declaration on identifier stack
• When exiting a scope where the identifier is declared
  - pop identifier stack
• Evaluating the identifier in any context binds to the current top of stack
• Determined at runtime
47
Example

• What value is returned from main?
• Static scoping?
• Dynamic scoping?

int x = 42;

int f() { return x; }
int g() { int x = 1; return f(); }
int main() { return g(); }
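The two answers can be checked with a small C sketch. C itself is statically scoped, so the first half is the example nearly verbatim; the second half emulates dynamic scoping with an explicit binding stack for the identifier x. All names (`main_static`, `main_dynamic`, `enter_x`, `exit_x`, `lookup_x`) and the fixed-size stack are assumptions made for illustration.

```c
#include <assert.h>

/* Static scoping (ordinary C): f always refers to the global x. */
static int x = 42;
static int f_static(void) { return x; }
static int g_static(void) { int x = 1; (void)x; return f_static(); }
int main_static(void) { return g_static(); }

/* Dynamic scoping, emulated: one stack of bindings for identifier x. */
enum { MAX_BINDINGS = 16 };
static int x_bindings[MAX_BINDINGS] = { 42 };  /* slot 0: the global binding */
static int x_top = 0;

static void enter_x(int value) {       /* entering a scope that declares x */
    assert(x_top + 1 < MAX_BINDINGS);
    x_bindings[++x_top] = value;
}

static void exit_x(void) {             /* exiting that scope */
    assert(x_top > 0);
    x_top--;
}

static int lookup_x(void) { return x_bindings[x_top]; }

static int f_dynamic(void) { return lookup_x(); }

static int g_dynamic(void) {
    enter_x(1);                        /* int x = 1 */
    int result = f_dynamic();          /* f sees the most recent binding */
    exit_x();
    return result;
}

int main_dynamic(void) { return g_dynamic(); }
```

Under static scoping the result is 42, because f's x is resolved where f is defined; under dynamic scoping it is 1, because g's binding is on top of the stack when f runs.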
48
Why do we care?

• We need to generate code to access variables
• Static scoping
  - Identifier binding is known at compile time
  - Address of the variable is known at compile time
  - Assigning addresses to variables is part of code generation
  - No runtime errors of "access to undefined variable"
  - Can check types of variables
49
Variable addresses for static scoping: first attempt

int x = 42;

int f() { return x; }
int g() { int x = 1; return f(); }
int main() { return g(); }

identifier      address
x (global)      0x42
x (inside g)    0x73
50
Variable addresses for static scoping: first attempt

int a[11];

void quicksort(int m, int n) {
    int i;
    if (n > m) {
        i = partition(m, n);
        quicksort(m, i-1);
        quicksort(i+1, n);
    }
}

main() {
    quicksort(1, 9);
}

What is the address of the variable "i" in the procedure quicksort?
51
Compile-Time Information on Variables
• Name
• Type
• Scope
  - when is it recognized?
• Duration
  - until when does its value exist?
• Size
  - how many bytes are required at runtime?
• Address
  - Fixed
  - Relative
  - Dynamic
52
Activation Record (Stack Frames)
• separate space for each procedure invocation
• managed at runtime
  - code for managing it is generated by the compiler
• desired properties
  - efficient allocation and deallocation
    • procedures are called frequently
  - variable size
    • different procedures may require different memory sizes
53
Semi-Abstract Register Machine
(figure) CPU: general-purpose (data) registers Register 00, Register 01, ..., Register xx; control registers: Register PC, ...; stack registers: ebp, esp, ...
(figure) Main memory, drawn between high addresses and low addresses: Global Variables, Stack, Heap
54
A Logical Stack Frame (Simplified)
Stack frame for function f(a1, ..., aN):

    Param N        \
    Param N-1       } Parameters (actual arguments)
    ...             }
    Param 1        /
    _t0            \
    ...             }
    _tk             } Locals and temporaries
    x               }
    ...             }
    y              /
55
Runtime Stack

• Stack of activation records
• Call = push new activation record
• Return = pop activation record
• Only one "active" activation record: top of stack
• How do we handle recursion?
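One way to see why a stack of activation records handles recursion is to run a recursive function by hand on an explicit stack: every invocation gets its own record, so its copy of n survives the nested calls. The sketch below does this for factorial. `ActivationRecord`, `MAX_DEPTH`, and `fact_with_records` are hypothetical names, and the argument is limited to 0..12 so the result fits in a 32-bit int.

```c
#include <assert.h>

enum { MAX_DEPTH = 13 };

typedef struct {
    int n;   /* this invocation's own copy of the parameter */
} ActivationRecord;

int fact_with_records(int n) {
    ActivationRecord records[MAX_DEPTH];
    int top = -1;          /* index of the active (topmost) record */
    int retv = 1;          /* return value handed from callee to caller */

    assert(n >= 0 && n <= 12);

    /* Calls: fact(n) calls fact(n-1) ... down to the base case.
       Each call pushes a new activation record. */
    for (int k = n; ; k--) {
        records[++top].n = k;
        if (k <= 1)
            break;
    }

    /* Returns: each return pops the active record; the caller's record,
       now on top again, still holds the caller's own n. */
    while (top >= 0) {
        if (records[top].n > 1)
            retv *= records[top].n;
        top--;
    }
    return retv;
}
```

For example, `fact_with_records(5)` pushes records for n = 5, 4, 3, 2, 1 and then multiplies on the way back up, yielding 120.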
56
Activation Record (frame)
(figure, high addresses at the top, low addresses at the bottom; the stack grows down)

    parameter k             \
    ...                      } incoming parameters
    parameter 1             /
    lexical pointer         \
    return information       } administrative part
    dynamic link             }
    registers & misc        /
    local variables
    temporaries
    next frame would be here
    ...

Figure labels: frame pointer, stack pointer
57
Runtime Stack
• SP: stack pointer
  - top of current frame
• FP: frame pointer
  - base of current frame
  - Sometimes called BP (base pointer)

(figure: the previous frame lies above the current frame; FP marks the base of the current frame and SP its top; the stack grows down)
58
Code Blocks

• Programming languages provide code blocks

void foo() {
    int x = 8, y = 9;       // 1
    { int x = y * y; }      // 2
    { int x = y * 7; }      // 3
    x = y + 1;
}

Frame layout (figure): administrative, x1, y1, x2, x3, ...
59
L-Values of Local Variables

• The offset in the stack is known at compile time
• L-val(x) = FP + offset(x)
• x = 5 is translated to:
    Load_Constant 5, R3
    Store R3, offset(x)(FP)
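The address computation L-val(x) = FP + offset(x) can be sketched in C, using a byte array as the stack. `LocalVar`, `lvalue`, `store_local`, and `load_local` are hypothetical names, and the offsets -4 and -8 in the usage note are assumed values for two 4-byte locals below the frame pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical symbol-table entry: the offset is fixed at compile time. */
typedef struct {
    const char *name;
    int offset;            /* in bytes, relative to FP (negative for locals) */
} LocalVar;

/* L-val(x) = FP + offset(x) */
static unsigned char *lvalue(unsigned char *fp, const LocalVar *v) {
    assert(fp != NULL && v != NULL);
    return fp + v->offset;
}

/* Store R3, offset(x)(FP) */
void store_local(unsigned char *fp, const LocalVar *v, int32_t value) {
    memcpy(lvalue(fp, v), &value, sizeof value);
}

/* The matching load from offset(x)(FP) */
int32_t load_local(unsigned char *fp, const LocalVar *v) {
    int32_t value;
    memcpy(&value, lvalue(fp, v), sizeof value);
    return value;
}
```

With a 64-byte array as the stack, `fp` pointing at byte 32, and x at offset -4, the call `store_local(fp, &x, 5)` writes bytes 28 to 31; only FP changes from one invocation to the next, while the offset stays the same.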
60
Pentium Runtime Stack

Pentium stack registers
Register    Usage
ESP         Stack pointer
EBP         Base pointer

Pentium stack and call/ret instructions
Instruction          Usage
push, pusha, ...     push on runtime stack
pop, popa, ...       pop from runtime stack
call                 transfer control to called routine
ret                  transfer control back to caller
61
Accessing Stack Variables

• Use offset from FP (%ebp)
• Remember: stack grows downwards
• Above FP = parameters
• Below FP = locals
• Examples
  - %ebp + 4 = return address
  - %ebp + 8 = first parameter
  - %ebp - 4 = first local

Stack (figure), from high to low addresses:
    Param n
    ...
    param1             FP+8
    Return address
    Previous fp        <- FP
    Local 1            FP-4
    ...
    Local n            <- SP
62
Factorial - fact(int n)

fact:
    pushl %ebp               # save ebp
    movl  %esp,%ebp          # ebp = esp
    pushl %ebx               # save ebx
    movl  8(%ebp),%ebx       # ebx = n
    cmpl  $1,%ebx            # n <= 1 ?
    jle   lresult            # then done
    leal  -1(%ebx),%eax      # eax = n-1
    pushl %eax
    call  fact               # fact(n-1)
    imull %ebx,%eax          # eax = retv * n
    jmp   lreturn
lresult:
    movl  $1,%eax            # retv
lreturn:
    movl  -4(%ebp),%ebx      # restore ebx
    movl  %ebp,%esp          # restore esp
    popl  %ebp               # restore ebp
    ret                      # return to caller (not shown on the slide)

Stack at an intermediate point (figure), from high to low addresses:
    n                  EBP+8
    Return address
    old ebp            <- EBP
    old ebx            EBP-4, <- ESP

(disclaimer: a real compiler can do better than that)
63
Call Sequences

• The processor does not save the content of registers on procedure calls
• So who will?
  - Caller saves and restores registers
  - Callee saves and restores registers
  - But can also have both save/restore some registers
What is a Compiler
42
Intro Functions Exampleint SimpleFn(int z)
int x yx = x y zreturn x
void main() int ww = SimpleFunction(137)
_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn
main_t0 = 137Push _t0Call _SimpleFnPop w
43
What Can We Do with Procedures
bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as
parameters
44
Design Decisions
bull Scoping rulesndash Static scoping vs dynamic scoping
bull Callercallee conventionsndash Parametersndash Who saves register values
bull Allocating space for local variables
45
Static (lexical) Scopingmain ( )
int a = 0 int b = 0
int b = 1
int a = 2 printf (ldquod dnrdquo a b)
int b = 3 printf (ldquod dnrdquo a b)
printf (ldquod dnrdquo a b)
printf (ldquod dnrdquo a b)
B0B1
B3B3
B2
Declaration Scopes
a=0 B0B1B3
b=0 B0
b=1 B1B2
a=2 B2
b=3 B3
a name refers to its (closest)
enclosing scope
known at compile time
46
Dynamic Scoping
bull Each identifier is associated with a global stack of bindings
bull When entering scope where identifier is declaredndash push declaration on identifier stack
bull When exiting scope where identifier is declaredndash pop identifier stack
bull Evaluating the identifier in any context binds to the current top of stack
bull Determined at runtime
47
Example
bull What value is returned from mainbull Static scopingbull Dynamic scoping
int x = 42
int f() return x int g() int x = 1 return f() int main() return g()
48
Why do we care
bull We need to generate code to access variables
bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code
generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables
49
Variable addresses for static scoping first attempt
int x = 42
int f() return x int g() int x = 1 return f() int main() return g()
identifier address
x (global) 0x42
x (inside g) 0x73
50
Variable addresses for static scoping first attempt
int a [11]
void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)
main() quicksort (1 9)
what is the address of the variable ldquoirdquo in
the procedure quicksort
51
Compile-Time Information on Variablesbull Namebull Typebull Scope
ndash when is it recognizedbull Duration
ndash Until when does its value existbull Size
ndash How many bytes are required at runtime bull Address
ndash Fixedndash Relativendash Dynamic
52
Activation Record (Stack Frames)bull separate space for each procedure invocation
bull managed at runtimendash code for managing it generated by the compiler
bull desired properties ndash efficient allocation and deallocation
bull procedures are called frequentlyndash variable size
bull different procedures may require different memory sizes
53
Semi-Abstract Register Machine
Global Variables
Stack
Heap
High addresses
Low addresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
ebp
esphellipre
gist
ers
Stac
k
54
A Logical Stack Frame (Simplified)Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Stack frame for function f(a1hellipaN)
55
Runtime Stack
bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of
stackbull How do we handle recursion
56
Activation Record (frame)parameter k
parameter 1
lexical pointer
return information
dynamic link
registers amp misc
local variablestemporaries
next frame would be here
hellip
administrativepart
high addresses
lowaddresses
framepointer
stackpointer
incoming parameters
stack grows down
57
Runtime Stackbull SP ndash stack pointer
ndash top of current framebull FP ndash frame pointer
ndash base of current framendash Sometimes called BP
(base pointer)
Current frame
hellip hellip
Previous frame
SP
FPstack grows down
58
Code Blocks
bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1
adminstrativex1
y1
x2
x3
hellip
59
L-Values of Local Variables
bull The offset in the stack is known at compile time
bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3
Store R3 offset(x)(FP)
60
Pentium Runtime Stack
Pentium stack registers
Pentium stack and callret instructions
Register Usage
ESP Stack pointer
EBP Base pointer
Instruction Usage
push pushahellip push on runtime stack
poppopahellip Base pointer
call transfer control to called routine
return transfer control back to caller
61
Accessing Stack Variables
bull Use offset from FP (ebp)bull Remember ndash
stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples
ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local
hellip hellip
SP
FPReturn address
Return address
Param nhellip
param1
Local 1hellip
Local n
Previous fp
Param nhellip
param1FP+8
FP-4
62
Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp
ESP
EBPReturn address
Return address
old ebx
Previous fp
nEBP+8
EBP-4 old ebp
old ebx
(stack in intermediate point)
(disclaimer real compiler can do better than that)
63
Call Sequences
bull The processor does not save the content of registers on procedure calls
bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some
registers
64
Call Sequences
call
caller
callee
return
caller
Caller push code
Callee push code
(prologue)
Callee pop code
(epilogue)
Caller pop code
Push caller-save registersPush actual parameters (in reverse order)
push return addressJump to call address
Push current base-pointerbp = sp
Push local variablesPush callee-save registers
Pop callee-save registersPop callee activation record
Pop old base-pointer
pop return addressJump to address
Pop parametersPop caller-save registers
65
Call Sequences ndash Foo(4221)
call
caller
callee
return
caller
push ecx
push $21
push $42
call _foo
push ebp
mov esp ebp
sub 8 esp
push ebx
pop ebx
mov ebp esp
pop ebp
ret
add $8 esp
pop ecx
Push caller-save registersPush actual parameters (in reverse order)
push return addressJump to call address
Push current base-pointerbp = sp
Push local variablesPush callee-save registers
Pop callee-save registersPop callee activation record
Pop old base-pointer
pop return addressJump to address
Pop caller-save registersPop parameters
66
ldquoTo Callee-save or to Caller-saverdquo
bull Callee-saved registers need only be saved when callee modifies their value
bull some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registersbull Callee-Save Registers
ndash Saved by the callee before modificationndash Values are automatically preserved across calls
bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls
bull Usually the architecture defines caller-save and callee-save registers
bull Separate compilationbull Interoperability between code produced by different
compilerslanguages bull But compiler writers decide when to use calllercallee
registers
68
Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls
global _foo
Add_Constant -K SP allocate space for foo
Store_Local R5 -14(FP) save R5
Load_Reg R5 R0 Add_Constant R5 1
JSR f1 JSR g1
Add_Constant R5 2 Load_Reg R5 R0
Load_Local -14(FP) R5 restore R5
Add_Constant K SP RTS deallocate
int foo(int a) int b=a+1 f1()g1(b)
return(b+2)
69
Caller-Save Registersbull Saved by the caller before calls when
neededbull Values are not automatically preserved
across calls
global _bar
Add_Constant -K SP allocate space for bar
Add_Constant R0 1
JSR f2
Load_Constant 2 R0 JSR g2
Load_Constant 8 R0 JSR g2
Add_Constant K SP deallocate space for bar
RTS
void bar (int y) int x=y+1f2(x)
g2(2) g2(8)
70
Parameter Passingbull 1960s
ndash In memory bull No recursion is allowed
bull 1970sndash In stack
bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved
bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)
71
Windows Exploit(s) Buffer Overflow
void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])
aout abracadabraSegmentation fault
Stack grows this way
Memory addresses
Previous frameReturn
addressSaved FPchar xbuf[2]
hellip
abracadabr
72
Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]
strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)
auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)
auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)
(source ldquohacking ndash the art of exploitation 2nd Edrdquo)
73
Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]
strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)
auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)
auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)
(source ldquohacking ndash the art of exploitation 2nd Edrdquo)
74
Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)
auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)
auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)
(source ldquohacking ndash the art of exploitation 2nd Edrdquo)
75
Buffer overflow
0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt
76
Nested Procedures
bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is
defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables
77
program p() int x procedure a() int y procedure b() hellip c() hellip
procedure c() int z procedure d()
y = x + z
hellip b() hellip d() hellip hellip a() hellip c() hellip a()
Example Nested Procedures
Possible call sequencep a a c b c d
what are the addresses of variables ldquoxrdquo ldquoyrdquo and
ldquozrdquo in procedure d
78
Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)
variables from ldquoardquo which instance of ldquoardquo is it
bull how do you find the right activation record at runtime a
b
P
c c
d
a
Possible call sequencep a a c b c d
79
Nested Proceduresbull goal find the closest routine in
the stack from a given nesting level
bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of
the same nesting level it uses its own variables
ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j
bull If a procedure is last at level j on the stack then it must be ancestor of the current routine
Possible call sequencep a a c b c d
a
b
P
c c
d
a
80
Nested Proceduresbull problem a routine may need to access variables of
another routine that contains it staticallybull solution lexical pointer (aka access link) in the
activation recordbull lexical pointer points to the last activation record of the
nesting level above itndash in our example lexical pointer of d points to activation records
of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time
81
Lexical Pointers
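[Figure: stack of activation records a, a, c, b, c, d; each frame of a holds y, each frame of c holds z, and each frame carries a lexical pointer to a frame of its enclosing routine]
Possible call sequence: p → a → a → c → b → c → d

    program p() {
      int x;
      procedure a() {
        int y;
        procedure b() { c() };
        procedure c() {
          int z;
          procedure d() {
            y = x + z
          };
          … b() … d() …
        }
        … a() … c() …
      }
      a()
    }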
43
What Can We Do with Procedures?
• Declarations & Definitions
• Call & Return
• Jumping out of procedures
• Passing & Returning procedures as parameters
44
Design Decisions
• Scoping rules
  – Static scoping vs. dynamic scoping
• Caller/callee conventions
  – Parameters
  – Who saves register values?
• Allocating space for local variables
45
Static (lexical) Scoping

    main ( )
    {                                   /* B0 */
      int a = 0;
      int b = 0;
      {                                 /* B1 */
        int b = 1;
        {                               /* B2 */
          int a = 2;
          printf ("%d %d\n", a, b);
        }
        {                               /* B3 */
          int b = 3;
          printf ("%d %d\n", a, b);
        }
        printf ("%d %d\n", a, b);
      }
      printf ("%d %d\n", a, b);
    }

Declaration   Scopes
a=0           B0, B1, B3
b=0           B0
b=1           B1, B2
a=2           B2
b=3           B3

A name refers to its (closest) enclosing scope; this is known at compile time.
46
Dynamic Scoping
• Each identifier is associated with a global stack of bindings
• When entering a scope where the identifier is declared
  – push declaration on identifier stack
• When exiting a scope where the identifier is declared
  – pop identifier stack
• Evaluating the identifier in any context binds to the current top of stack
• Determined at runtime
47
Example
• What value is returned from main?
• Static scoping?
• Dynamic scoping?

    int x = 42;

    int f() { return x; }
    int g() { int x = 1; return f(); }
    int main() { return g(); }
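The question above can be worked through with a small C sketch. C is itself statically scoped, so the dynamic-scoping variant is simulated with an explicit binding stack for x, as the Dynamic Scoping slide describes. All names here (`x_stack`, `enter_x`, `lookup_x`, the `_static` and `_dynamic` suffixes) are made up for the illustration.

```c
/* Binding stack for the identifier x, as used under dynamic scoping. */
#define MAX_BINDINGS 16
static int x_stack[MAX_BINDINGS];
static int x_top = -1;                      /* index of the current binding */

static void enter_x(int value) { x_stack[++x_top] = value; } /* scope declaring x entered */
static void exit_x(void)       { x_top--; }                   /* that scope exited */
static int  lookup_x(void)     { return x_stack[x_top]; }     /* current top of stack */

/* Dynamic scoping: f sees whichever x was declared most recently at run time. */
static int f_dynamic(void) { return lookup_x(); }
static int g_dynamic(void) {
    enter_x(1);                 /* int x = 1 */
    int result = f_dynamic();
    exit_x();
    return result;
}
static int main_dynamic(void) {
    enter_x(42);                /* global int x = 42 */
    int result = g_dynamic();
    exit_x();
    return result;
}

/* Static scoping: plain C. The x in f_static is bound to the global at compile time. */
static int x = 42;
static int f_static(void) { return x; }
static int g_static(void) { int x = 1; (void)x; return f_static(); }
static int main_static(void) { return g_static(); }
```

With static scoping the result is 42, because f refers to the global x regardless of who calls it. With the simulated dynamic scoping the result is 1, because the binding pushed by g is on top of the stack when f runs.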
48
Why do we care?
• We need to generate code to access variables
• Static scoping
  – Identifier binding is known at compile time
  – Address of the variable is known at compile time
  – Assigning addresses to variables is part of code generation
  – No runtime errors of "access to undefined variable"
  – Can check types of variables
49
Variable addresses for static scoping: first attempt

    int x = 42;

    int f() { return x; }
    int g() { int x = 1; return f(); }
    int main() { return g(); }

identifier      address
x (global)      0x42
x (inside g)    0x73
50
Variable addresses for static scoping: first attempt

    int a[11];

    void quicksort(int m, int n) {
      int i;
      if (n > m) {
        i = partition(m, n);
        quicksort(m, i-1);
        quicksort(i+1, n);
      }
    }

    main() {
      quicksort(1, 9);
    }

What is the address of the variable "i" in the procedure quicksort?
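One way to see why a single fixed address for "i" cannot work is to record the address of a local in every live invocation of a recursive function. The sketch below is only an illustration: `recurse` is a hypothetical stand-in for quicksort's recursion, and the depth of 8 is arbitrary.

```c
#include <stdint.h>

#define MAX_DEPTH 8
static uintptr_t addr_of_i[MAX_DEPTH];   /* address of i seen at each recursion depth */

/* Stand-in for quicksort's recursion: each live invocation has its own i. */
static void recurse(int depth) {
    volatile int i = depth;              /* volatile keeps i in memory */
    addr_of_i[depth] = (uintptr_t)&i;
    if (depth + 1 < MAX_DEPTH) {
        recurse(depth + 1);
    }
    (void)i;                             /* i is still live after the recursive call */
}

/* Returns 1 if the addresses recorded for i are pairwise distinct, 0 otherwise. */
static int all_distinct(void) {
    recurse(0);
    for (int a = 0; a < MAX_DEPTH; a++) {
        for (int b = a + 1; b < MAX_DEPTH; b++) {
            if (addr_of_i[a] == addr_of_i[b]) {
                return 0;
            }
        }
    }
    return 1;
}
```

Because all invocations are live at the same time, each i must occupy its own storage, which is what per-invocation activation records provide.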
51
Compile-Time Information on Variables
• Name
• Type
• Scope
  – when is it recognized
• Duration
  – Until when does its value exist
• Size
  – How many bytes are required at runtime
• Address
  – Fixed
  – Relative
  – Dynamic
52
Activation Record (Stack Frames)
• separate space for each procedure invocation
• managed at runtime
  – code for managing it generated by the compiler
• desired properties
  – efficient allocation and deallocation
    • procedures are called frequently
  – variable size
    • different procedures may require different memory sizes
53
Semi-Abstract Register Machine
[Figure: a CPU with general-purpose (data) registers (Register 00, Register 01, …, Register xx), control registers (Register PC, …) and stack registers (ebp, esp, …), next to main memory holding global variables, the stack and the heap, drawn between high and low addresses]
54
A Logical Stack Frame (Simplified)
Stack frame for function f(a1, …, aN):
  Parameters (actual arguments):  Param N, Param N-1, …, Param 1
  Locals and temporaries:         _t0, …, _tk, x, …, y
55
Runtime Stack
• Stack of activation records
• Call = push new activation record
• Return = pop activation record
• Only one "active" activation record – top of stack
• How do we handle recursion?
56
Activation Record (frame)
Frame layout, from high addresses to low addresses (stack grows down):
  parameter k
  …
  parameter 1
  lexical pointer
  return information
  dynamic link
  registers & misc
  local variables
  temporaries
  (next frame would be here)
Labels in the figure: incoming parameters, administrative part, frame pointer, stack pointer
57
Runtime Stack
• SP – stack pointer
  – top of current frame
• FP – frame pointer
  – base of current frame
  – Sometimes called BP (base pointer)
[Figure: previous frame followed by the current frame, with FP and SP marking the current frame; stack grows down]
58
Code Blocks
• Programming languages provide code blocks

    void foo()
    {
      int x = 8; y = 9;      // 1
      { int x = y * y; }     // 2
      { int x = y * 7; }     // 3
      x = y + 1;
    }

[Figure: frame layout: administrative part, x1, y1, x2, x3, …]
59
L-Values of Local Variables
• The offset in the stack is known at compile time
• L-val(x) = FP + offset(x)
• x = 5 ⇒
    Load_Constant 5, R3
    Store R3, offset(x)(FP)
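The rule L-val(x) = FP + offset(x) can be tried out on a simulated stack. In the sketch below the stack is an array of words, FP is an index into it, and `OFFSET_X` and `OFFSET_Y` are made-up offsets chosen for the illustration, not values from the lecture.

```c
#define STACK_WORDS 64
static int memory[STACK_WORDS];      /* simulated stack, one int per word */

/* Hypothetical compile-time offsets (in words, relative to FP) for two locals. */
enum { OFFSET_X = -1, OFFSET_Y = -2 };

/* L-val(v) = FP + offset(v): FP varies at run time, the offset is a constant. */
static int *lval(int fp, int offset) {
    return &memory[fp + offset];
}

/* Emulates "x = 5" in the frame whose frame pointer is fp. */
static void assign_x(int fp) {
    int r3 = 5;                      /* Load_Constant 5, R3 */
    *lval(fp, OFFSET_X) = r3;        /* Store R3, offset(x)(FP) */
}
```

The same generated code works for every invocation: calling `assign_x(40)` and `assign_x(20)` writes to two different words, because only FP differs between frames.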
60
Pentium Runtime Stack
Pentium stack registers

Register   Usage
ESP        Stack pointer
EBP        Base pointer

Pentium stack and call/ret instructions

Instruction       Usage
push, pusha, …    push on runtime stack
pop, popa, …      pop from runtime stack
call              transfer control to called routine
ret               transfer control back to caller
61
Accessing Stack Variables
• Use offset from FP (%ebp)
• Remember – stack grows downwards
• Above FP = parameters
• Below FP = locals
• Examples
  – %ebp + 4 = return address
  – %ebp + 8 = first parameter
  – %ebp – 4 = first local
[Figure: two stack frames, each laid out as Param n … Param 1, return address, previous FP, Local 1 … Local n; in the current frame Param 1 is at FP+8, the first local is at FP-4, and SP marks the top]
62
Factorial – fact(int n)

    fact:
        pushl %ebp              # save ebp
        movl  %esp,%ebp         # ebp = esp
        pushl %ebx              # save ebx
        movl  8(%ebp),%ebx      # ebx = n
        cmpl  $1,%ebx           # n <= 1 ?
        jle   .lresult          # then done
        leal  -1(%ebx),%eax     # eax = n-1
        pushl %eax              # push argument
        call  fact              # fact(n-1)
        imull %ebx,%eax         # eax = retv*n
        jmp   .lreturn
    .lresult:
        movl  $1,%eax           # retv = 1
    .lreturn:
        movl  -4(%ebp),%ebx     # restore ebx
        movl  %ebp,%esp         # restore esp
        popl  %ebp              # restore ebp
        ret                     # return to caller

[Figure: stack at an intermediate point: n at EBP+8, return address at EBP+4, old ebp at EBP, old ebx at EBP-4, where ESP points]
(disclaimer: a real compiler can do better than that)
63
Call Sequences
• The processor does not save the content of registers on procedure calls
• So who will?
  – Caller saves and restores registers
  – Callee saves and restores registers
  – But can also have both save/restore some registers
64
Call Sequences
caller   Caller push code             Push caller-save registers
                                      Push actual parameters (in reverse order)
call                                  Push return address
                                      Jump to call address
callee   Callee push code (prologue)  Push current base-pointer
                                      bp = sp
                                      Push local variables
                                      Push callee-save registers
         Callee pop code (epilogue)   Pop callee-save registers
                                      Pop callee activation record
                                      Pop old base-pointer
return                                Pop return address
                                      Jump to address
caller   Caller pop code              Pop parameters
                                      Pop caller-save registers
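The sequence above can be traced on a simulated stack. The following C sketch is an assumption-laden illustration: the stack is an array of words that grows toward lower indices, `sp` and `bp` are indices, the return address is a dummy value, and saving of caller-save and callee-save registers is left out for brevity.

```c
#define WORDS 32
static int stack[WORDS];
static int sp = WORDS;        /* stack grows down: push decrements sp */
static int bp = WORDS;

static void push(int v) { stack[--sp] = v; }
static int  pop(void)   { return stack[sp++]; }

/* Callee: computes param1 - param2, reading its parameters relative to bp. */
static int callee(void) {
    push(bp);                 /* prologue: push current base-pointer */
    bp = sp;                  /*           bp = sp */
    sp -= 1;                  /*           make room for one local */
    stack[bp - 1] = stack[bp + 2] - stack[bp + 3];  /* local = param1 - param2 */
    int result = stack[bp - 1];
    sp = bp;                  /* epilogue: pop callee activation record */
    bp = pop();               /*           pop old base-pointer */
    return result;
}

/* Caller: performs a call with arguments (42, 21). */
static int call_foo(void) {
    push(21);                 /* actual parameters, in reverse order */
    push(42);
    push(1234);               /* stand-in for the return address pushed by call */
    int result = callee();
    (void)pop();              /* return: pop return address */
    sp += 2;                  /* caller pops the parameters */
    return result;
}
```

Inside the callee, the word at `bp` holds the old base pointer, `bp + 1` the return address, `bp + 2` the first parameter and `bp + 3` the second, which mirrors the ebp + 4 and ebp + 8 byte offsets used for 32-bit words. After `call_foo` returns, `sp` and `bp` are back at their initial values.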
65
Call Sequences – Foo(42,21)

caller   push %ecx          Push caller-save registers
         push $21           Push actual parameters (in reverse order)
         push $42
call     call _foo          Push return address
                            Jump to call address
callee   push %ebp          Push current base-pointer
         mov %esp, %ebp     bp = sp
         sub $8, %esp       Push local variables
         push %ebx          Push callee-save registers
         pop %ebx           Pop callee-save registers
         mov %ebp, %esp     Pop callee activation record
         pop %ebp           Pop old base-pointer
return   ret                Pop return address
                            Jump to address
caller   add $8, %esp       Pop parameters
         pop %ecx           Pop caller-save registers
66
"To Callee-save or to Caller-save?"
• Callee-saved registers need only be saved when the callee modifies their value
• Some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registers
• Callee-Save Registers
  – Saved by the callee before modification
  – Values are automatically preserved across calls
• Caller-Save Registers
  – Saved (if needed) by the caller before calls
  – Values are not automatically preserved across calls
• Usually the architecture defines caller-save and callee-save registers
  • Separate compilation
  • Interoperability between code produced by different compilers/languages
• But compiler writers decide when to use caller/callee registers
68
Callee-Save Registers
• Saved by the callee before modification
• Usually at procedure prolog
• Restored at procedure epilog
• Hardware support may be available
• Values are automatically preserved across calls

    int foo(int a) {
      int b = a + 1;
      f1();
      g1(b);
      return (b + 2);
    }

    .global _foo
    Add_Constant -K, SP        // allocate space for foo
    Store_Local R5, -14(FP)    // save R5
    Load_Reg R5, R0; Add_Constant R5, 1
    JSR f1; JSR g1
    Add_Constant R5, 2; Load_Reg R5, R0
    Load_Local -14(FP), R5     // restore R5
    Add_Constant K, SP; RTS    // deallocate
69
Caller-Save Registers
• Saved by the caller before calls when needed
• Values are not automatically preserved across calls

    void bar(int y) {
      int x = y + 1;
      f2(x);
      g2(2);
      g2(8);
    }

    .global _bar
    Add_Constant -K, SP        // allocate space for bar
    Add_Constant R0, 1
    JSR f2
    Load_Constant 2, R0; JSR g2
    Load_Constant 8, R0; JSR g2
    Add_Constant K, SP         // deallocate space for bar
    RTS
70
Parameter Passing
• 1960s
  – In memory
    • No recursion is allowed
• 1970s
  – In stack
• 1980s
  – In registers
  – First k parameters are passed in registers (k=4 or k=6)
  – Where is time saved?
• Most procedures are leaf procedures
• Interprocedural register allocation
• Many of the registers may be dead before another invocation
• Register windows are allocated in some architectures per call (e.g., Sun SPARC)
71
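Windows Exploit(s): Buffer Overflow

    void foo (char *x) {
      char buf[2];
      strcpy(buf, x);
    }
    int main (int argc, char *argv[]) {
      foo(argv[1]);
    }

    ./a.out abracadabra
    Segmentation fault

[Figure: stack layout with arrows for the direction of stack growth and of memory addresses: previous frame, return address, saved FP, char* x, buf[2], …; the copied string "abracadabr…" spills out of buf[2] over the neighboring slots]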
72
Buffer overflow

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int check_authentication(char *password) {
      int auth_flag = 0;
      char password_buffer[16];

      strcpy(password_buffer, password);
      if(strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
      if(strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;
      return auth_flag;
    }

    int main(int argc, char *argv[]) {
      if(argc < 2) {
        printf("Usage: %s <password>\n", argv[0]);
        exit(0);
      }
      if(check_authentication(argv[1])) {
        printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        printf("      Access Granted.\n");
        printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
      } else {
        printf("\nAccess Denied.\n");
      }
    }

(source: "Hacking – The Art of Exploitation, 2nd Ed.")
74
Buffer overflow

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int check_authentication(char *password) {
      char password_buffer[16];
      int auth_flag = 0;

      strcpy(password_buffer, password);
      if(strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
      if(strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;
      return auth_flag;
    }

    int main(int argc, char *argv[]) {
      if(argc < 2) {
        printf("Usage: %s <password>\n", argv[0]);
        exit(0);
      }
      if(check_authentication(argv[1])) {
        printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        printf("      Access Granted.\n");
        printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
      } else {
        printf("\nAccess Denied.\n");
      }
    }

(source: "Hacking – The Art of Exploitation, 2nd Ed.")
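For contrast, here is a sketch of how the copy into `password_buffer` could be bounded so that no input can write past the 16-byte buffer. This variant is not from the slides or the cited book; the function name `check_authentication_bounded` and the choice to reject over-long inputs outright are assumptions made for the illustration.

```c
#include <string.h>

/* Returns 1 if password is one of the two accepted words, 0 otherwise.
   Inputs that do not fit in the buffer are rejected instead of being copied. */
static int check_authentication_bounded(const char *password) {
    char password_buffer[16];
    size_t len = strlen(password);

    if (len >= sizeof password_buffer) {
        return 0;                                /* too long: never copied */
    }
    memcpy(password_buffer, password, len + 1);  /* copy includes the terminating NUL */

    return strcmp(password_buffer, "brillig") == 0 ||
           strcmp(password_buffer, "outgrabe") == 0;
}
```

Because the length is checked before the copy, the order in which `auth_flag` and `password_buffer` are laid out in the frame no longer matters for this function.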
What is a Compiler
45
Static (lexical) Scopingmain ( )
int a = 0 int b = 0
int b = 1
int a = 2 printf (ldquod dnrdquo a b)
int b = 3 printf (ldquod dnrdquo a b)
printf (ldquod dnrdquo a b)
printf (ldquod dnrdquo a b)
B0B1
B3B3
B2
Declaration Scopes
a=0 B0B1B3
b=0 B0
b=1 B1B2
a=2 B2
b=3 B3
a name refers to its (closest)
enclosing scope
known at compile time
46
Dynamic Scoping
bull Each identifier is associated with a global stack of bindings
bull When entering scope where identifier is declaredndash push declaration on identifier stack
bull When exiting scope where identifier is declaredndash pop identifier stack
bull Evaluating the identifier in any context binds to the current top of stack
bull Determined at runtime
47
Example
bull What value is returned from mainbull Static scopingbull Dynamic scoping
int x = 42
int f() return x int g() int x = 1 return f() int main() return g()
48
Why do we care
bull We need to generate code to access variables
bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code
generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables
49
Variable addresses for static scoping first attempt
int x = 42
int f() return x int g() int x = 1 return f() int main() return g()
identifier address
x (global) 0x42
x (inside g) 0x73
50
Variable addresses for static scoping first attempt
int a [11]
void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)
main() quicksort (1 9)
what is the address of the variable ldquoirdquo in
the procedure quicksort
51
Compile-Time Information on Variablesbull Namebull Typebull Scope
ndash when is it recognizedbull Duration
ndash Until when does its value existbull Size
ndash How many bytes are required at runtime bull Address
ndash Fixedndash Relativendash Dynamic
52
Activation Record (Stack Frames)bull separate space for each procedure invocation
bull managed at runtimendash code for managing it generated by the compiler
bull desired properties ndash efficient allocation and deallocation
bull procedures are called frequentlyndash variable size
bull different procedures may require different memory sizes
53
Semi-Abstract Register Machine
Global Variables
Stack
Heap
High addresses
Low addresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
ebp
esphellipre
gist
ers
Stac
k
54
A Logical Stack Frame (Simplified)Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Stack frame for function f(a1hellipaN)
55
Runtime Stack
bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of
stackbull How do we handle recursion
56
Activation Record (frame)parameter k
parameter 1
lexical pointer
return information
dynamic link
registers amp misc
local variablestemporaries
next frame would be here
hellip
administrativepart
high addresses
lowaddresses
framepointer
stackpointer
incoming parameters
stack grows down
57
Runtime Stackbull SP ndash stack pointer
ndash top of current framebull FP ndash frame pointer
ndash base of current framendash Sometimes called BP
(base pointer)
Current frame
hellip hellip
Previous frame
SP
FPstack grows down
58
Code Blocks
bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1
adminstrativex1
y1
x2
x3
hellip
59
L-Values of Local Variables
bull The offset in the stack is known at compile time
bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3
Store R3 offset(x)(FP)
60
Pentium Runtime Stack
Pentium stack registers
Pentium stack and callret instructions
Register Usage
ESP Stack pointer
EBP Base pointer
Instruction Usage
push pushahellip push on runtime stack
poppopahellip Base pointer
call transfer control to called routine
return transfer control back to caller
61
Accessing Stack Variables
bull Use offset from FP (ebp)bull Remember ndash
stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples
ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local
hellip hellip
SP
FPReturn address
Return address
Param nhellip
param1
Local 1hellip
Local n
Previous fp
Param nhellip
param1FP+8
FP-4
62
Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp
ESP
EBPReturn address
Return address
old ebx
Previous fp
nEBP+8
EBP-4 old ebp
old ebx
(stack in intermediate point)
(disclaimer real compiler can do better than that)
63
Call Sequences
bull The processor does not save the content of registers on procedure calls
bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some
registers
64
Call Sequences
call
caller
callee
return
caller
Caller push code
Callee push code
(prologue)
Callee pop code
(epilogue)
Caller pop code
Push caller-save registersPush actual parameters (in reverse order)
push return addressJump to call address
Push current base-pointerbp = sp
Push local variablesPush callee-save registers
Pop callee-save registersPop callee activation record
Pop old base-pointer
pop return addressJump to address
Pop parametersPop caller-save registers
65
Call Sequences: Foo(42, 21)
Caller push code:
  push %ecx          Push caller-save registers
  push $21           Push actual parameters (in reverse order)
  push $42
call:
  call _foo          push return address, jump to call address
Callee push code (prologue):
  push %ebp          Push current base-pointer
  mov %esp, %ebp     bp = sp
  sub $8, %esp       Push local variables
  push %ebx          Push callee-save registers
Callee pop code (epilogue):
  pop %ebx           Pop callee-save registers
  mov %ebp, %esp     Pop callee activation record
  pop %ebp           Pop old base-pointer
return:
  ret                pop return address, jump to address
Caller pop code:
  add $8, %esp       Pop parameters
  pop %ecx           Pop caller-save registers
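The sequence can be traced on a toy stack machine. The sketch below is an illustration only: the Machine type, the word-addressed stack and the callee body (returning first minus second) are assumptions, since the slide does not show what foo computes, and the caller-save and callee-save pushes are left out to keep it short.

```c
#include <stdint.h>

enum { STACK_WORDS = 64 };

/* Toy machine: a word-addressed stack growing toward lower indices. */
typedef struct {
    int32_t mem[STACK_WORDS];
    int sp;   /* index of the top-of-stack word */
    int bp;   /* index of the saved-bp word of the current frame */
} Machine;

void machine_init(Machine *m) { m->sp = STACK_WORDS; m->bp = STACK_WORDS; }

static void push(Machine *m, int32_t v) { m->mem[--m->sp] = v; }
static int32_t pop(Machine *m) { return m->mem[m->sp++]; }

/* Callee: prologue, hypothetical body, epilogue. */
static int32_t callee(Machine *m) {
    push(m, m->bp);                  /* push current base pointer       */
    m->bp = m->sp;                   /* bp = sp                         */
    m->sp -= 2;                      /* reserve two words for locals    */
    int32_t a = m->mem[m->bp + 2];   /* first parameter  (bp+8 bytes)   */
    int32_t b = m->mem[m->bp + 3];   /* second parameter (bp+12 bytes)  */
    int32_t result = a - b;          /* assumed body, not from the slide */
    m->sp = m->bp;                   /* pop callee activation record    */
    m->bp = pop(m);                  /* pop old base pointer            */
    return result;
}

/* Caller side of foo(first, second). */
int32_t call_foo(Machine *m, int32_t first, int32_t second) {
    push(m, second);                 /* parameters in reverse order     */
    push(m, first);
    push(m, 0);                      /* stand-in for the return address */
    int32_t r = callee(m);
    (void)pop(m);                    /* "ret" pops the return address   */
    m->sp += 2;                      /* caller pops the two parameters  */
    return r;
}
```

After call_foo returns, sp and bp are back at their initial values, which is the invariant the push and pop halves of the sequence are meant to maintain.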
66
"To Callee-save or to Caller-save?"
• Callee-saved registers need only be saved when callee modifies their value
• Some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registers
• Callee-Save Registers
  - Saved by the callee before modification
  - Values are automatically preserved across calls
• Caller-Save Registers
  - Saved (if needed) by the caller before calls
  - Values are not automatically preserved across calls
• Usually the architecture defines caller-save and callee-save registers
  - Separate compilation
  - Interoperability between code produced by different compilers/languages
• But compiler writers decide when to use caller/callee registers
68
Callee-Save Registers
• Saved by the callee before modification
• Usually at procedure prolog
• Restored at procedure epilog
• Hardware support may be available
• Values are automatically preserved across calls
int foo(int a) {
  int b = a + 1;
  f1();
  g1(b);
  return (b + 2);
}
.global _foo
  Add_Constant -K, SP        // allocate space for foo
  Store_Local R5, -14(FP)    // save R5
  Load_Reg R5, R0; Add_Constant R5, 1
  JSR f1; JSR g1;
  Add_Constant R5, 2; Load_Reg R5, R0
  Load_Local -14(FP), R5     // restore R5
  Add_Constant K, SP; RTS    // deallocate
69
Caller-Save Registers
• Saved by the caller before calls when needed
• Values are not automatically preserved across calls
void bar(int y) {
  int x = y + 1;
  f2(x);
  g2(2);
  g2(8);
}
.global _bar
  Add_Constant -K, SP        // allocate space for bar
  Add_Constant R0, 1
  JSR f2
  Load_Constant 2, R0; JSR g2;
  Load_Constant 8, R0; JSR g2
  Add_Constant K, SP         // deallocate space for bar
  RTS
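The two disciplines can be contrasted on a simulated register file. This is a sketch under assumptions: the Regs type and all function names are hypothetical, and the choice of R5 as callee-save and R0 as caller-save simply mirrors the two listings above.

```c
enum { NREGS = 8 };

/* Simulated register file. */
typedef struct { int r[NREGS]; } Regs;

/* Callee-save discipline: the callee uses R5 as scratch, but saves it
   on entry and restores it on exit, so the caller never notices. */
void callee_preserving_r5(Regs *regs) {
    int saved_r5 = regs->r[5];   /* prologue: save R5     */
    regs->r[5] = 12345;          /* body: clobber R5      */
    regs->r[5] = saved_r5;       /* epilogue: restore R5  */
}

/* A callee that treats R0 as caller-save: it overwrites R0 and makes
   no attempt to restore it. */
void callee_clobbering_r0(Regs *regs) {
    regs->r[0] = -1;
}

/* Caller that still needs R0 after the call: it saves and restores R0
   itself, and returns the value R0 holds afterwards. */
int caller_keeps_r0(Regs *regs) {
    int saved_r0 = regs->r[0];   /* spill before the call */
    callee_clobbering_r0(regs);
    regs->r[0] = saved_r0;       /* reload after the call */
    return regs->r[0];
}
```

In bar above, no spill is needed because the value in R0 is dead after each call; the caller pays only when a caller-save register is live across the call.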
70
Parameter Passing
• 1960s
  - In memory
    • No recursion is allowed
• 1970s
  - In stack
• 1980s
  - In registers
  - First k parameters are passed in registers (k=4 or k=6)
  - Where is time saved?
    • Most procedures are leaf procedures
    • Interprocedural register allocation
    • Many of the registers may be dead before another invocation
    • Register windows are allocated in some architectures per call (e.g., Sun SPARC)
71
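Windows Exploit(s): Buffer Overflow
#include <string.h>
void foo(char *x) {
  char buf[2];
  strcpy(buf, x);
}
int main(int argc, char *argv[]) {
  foo(argv[1]);
}
./a.out abracadabra
Segmentation fault
Stack layout (memory addresses increase toward the top; the stack grows toward the bottom):
  Previous frame
  Return address
  Saved FP
  char* x
  buf[2]
  …
Copying "abracadabra" into buf[2] writes past the end of the buffer, toward the saved FP and the return address.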
72
Buffer overflow
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int check_authentication(char *password) {
  int auth_flag = 0;
  char password_buffer[16];
  strcpy(password_buffer, password);
  if (strcmp(password_buffer, "brillig") == 0)
    auth_flag = 1;
  if (strcmp(password_buffer, "outgrabe") == 0)
    auth_flag = 1;
  return auth_flag;
}
int main(int argc, char *argv[]) {
  if (argc < 2) {
    printf("Usage: %s <password>\n", argv[0]);
    exit(0);
  }
  if (check_authentication(argv[1])) {
    printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    printf(" Access Granted\n");
    printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
  } else {
    printf("\nAccess Denied\n");
  }
}
(source: "Hacking: The Art of Exploitation, 2nd Ed.")
74
Buffer overflow
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int check_authentication(char *password) {
  char password_buffer[16];
  int auth_flag = 0;
  strcpy(password_buffer, password);
  if (strcmp(password_buffer, "brillig") == 0)
    auth_flag = 1;
  if (strcmp(password_buffer, "outgrabe") == 0)
    auth_flag = 1;
  return auth_flag;
}
int main(int argc, char *argv[]) {
  if (argc < 2) {
    printf("Usage: %s <password>\n", argv[0]);
    exit(0);
  }
  if (check_authentication(argv[1])) {
    printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    printf(" Access Granted\n");
    printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
  } else {
    printf("\nAccess Denied\n");
  }
}
(source: "Hacking: The Art of Exploitation, 2nd Ed.")
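Reordering the two declarations changes which data sits next to the buffer, but it does not remove the unbounded strcpy. One possible way to avoid the problem, which is not taken from the slides or the book, is to skip the copy and compare the input in place. The function name check_authentication_nocopy is hypothetical.

```c
#include <string.h>

/* Returns 1 if password matches one of the two accepted strings.
   No copy into a fixed-size stack buffer takes place, so an
   over-long input cannot overwrite neighbouring stack data here. */
int check_authentication_nocopy(const char *password) {
    if (password == NULL)
        return 0;
    if (strcmp(password, "brillig") == 0)
        return 1;
    if (strcmp(password, "outgrabe") == 0)
        return 1;
    return 0;
}
```

An input longer than 16 characters is simply rejected by this variant, since it cannot be equal to either accepted string.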
75
Buffer overflow
0x08048529 <+69>:   movl $0x8048647,(%esp)
0x08048530 <+76>:   call 0x8048394 <puts@plt>
0x08048535 <+81>:   movl $0x8048664,(%esp)
0x0804853c <+88>:   call 0x8048394 <puts@plt>
0x08048541 <+93>:   movl $0x804867a,(%esp)
0x08048548 <+100>:  call 0x8048394 <puts@plt>
0x0804854d <+105>:  jmp 0x804855b <main+119>
0x0804854f <+107>:  movl $0x8048696,(%esp)
0x08048556 <+114>:  call 0x8048394 <puts@plt>
76
Nested Procedures
• For example: Pascal
• Any routine can have sub-routines
• Any sub-routine can access anything that is defined in its containing scope or inside the sub-routine itself
  - "non-local" variables
77
Example: Nested Procedures
program p() {
  int x;
  procedure a() {
    int y;
    procedure b() { … c() … };
    procedure c() {
      int z;
      procedure d() {
        y = x + z
      };
      … b() … d() …
    }
    … a() … c() …
  }
  a()
}
Possible call sequence: p → a → a → c → b → c → d
What are the addresses of variables "x", "y" and "z" in procedure d?
78
Nested Procedures
• can call a sibling, ancestor
• when "c" uses (non-local) variables from "a", which instance of "a" is it?
• how do you find the right activation record at runtime?
(diagram: the chain of activation records p, a, a, c, b, c, d)
Possible call sequence: p → a → a → c → b → c → d
79
Nested Procedures
• goal: find the closest routine in the stack from a given nesting level
• if we reached the same routine in a sequence of calls
  - routine of level k uses variables of the same nesting level: it uses its own variables
  - if it uses variables of nesting level j < k, then it must be the last routine called at level j
• If a procedure is last at level j on the stack, then it must be an ancestor of the current routine
Possible call sequence: p → a → a → c → b → c → d
(diagram: the chain of activation records p, a, a, c, b, c, d)
80
Nested Procedures
• problem: a routine may need to access variables of another routine that contains it statically
• solution: lexical pointer (a.k.a. access link) in the activation record
• lexical pointer points to the last activation record of the nesting level above it
  - in our example, lexical pointer of d points to activation records of c
• lexical pointers created at runtime
• number of links to be traversed is known at compile time
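A minimal sketch of how access links could be followed, written in C for illustration. The Frame type, the level numbering (p = 0, a = 1, b and c = 2, d = 3) and the function names are assumptions made for this sketch; the slides only state that the number of links to traverse is known at compile time.

```c
#include <stddef.h>

/* One activation record, reduced to the fields the sketch needs. */
typedef struct Frame {
    struct Frame *lexical;  /* access link: frame of the enclosing routine */
    int level;              /* static nesting level of the routine         */
    int var;                /* the routine's single local (x, y or z)      */
} Frame;

/* Follow (from->level - target_level) access links. For each variable
   reference this hop count is a constant the compiler can compute. */
Frame *frame_at_level(Frame *from, int target_level) {
    Frame *f = from;
    for (int hops = from->level - target_level; hops > 0; hops--)
        f = f->lexical;
    return f;
}

/* Body of d from the example, y = x + z:
   x lives in p (level 0), y in a (level 1), z in c (level 2). */
void run_d(Frame *d) {
    Frame *p = frame_at_level(d, 0);
    Frame *a = frame_at_level(d, 1);
    Frame *c = frame_at_level(d, 2);
    a->var = p->var + c->var;
}
```

For the call sequence p, a, a, c, b, c, d, the link of d leads to the second c, whose link leads to the second a, so the assignment updates y in the most recent activation of a and leaves the first one untouched.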
81
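Lexical Pointers
(diagram: activation records for a, a, c, b, c, d, with a local y in each frame of a and a local z in each frame of c, connected by lexical pointers)
Possible call sequence: p → a → a → c → b → c → d
program p() {
  int x;
  procedure a() {
    int y;
    procedure b() { c() };
    procedure c() {
      int z;
      procedure d() {
        y = x + z
      };
      … b() … d() …
    }
    … a() … c() …
  }
  a()
}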
82
What is a Compiler
46
Dynamic Scoping
• Each identifier is associated with a global stack of bindings
• When entering scope where identifier is declared
  - push declaration on identifier stack
• When exiting scope where identifier is declared
  - pop identifier stack
• Evaluating the identifier in any context binds to the current top of stack
• Determined at runtime
47
Example
• What value is returned from main?
• Static scoping?
• Dynamic scoping?
int x = 42;
int f() { return x; }
int g() { int x = 1; return f(); }
int main() { return g(); }
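Under static scoping the x in f is the global one, so main returns 42. The dynamic-scoping answer can be checked by simulating the per-identifier binding stack in C. The helper names below (enter_scope_declaring_x and so on) are hypothetical and only model the push/pop rules listed for dynamic scoping.

```c
enum { MAX_BINDINGS = 16 };

/* Global stack of bindings for the single identifier x. */
static int x_bindings[MAX_BINDINGS];
static int x_depth = 0;

static void enter_scope_declaring_x(int value) { x_bindings[x_depth++] = value; }
static void exit_scope_declaring_x(void)       { x_depth--; }
static int  lookup_x(void)                     { return x_bindings[x_depth - 1]; }

/* f reads x: under dynamic scoping this is whatever binding is on top. */
static int f(void) { return lookup_x(); }

/* g declares its own x = 1 and then calls f. */
static int g(void) {
    enter_scope_declaring_x(1);
    int result = f();
    exit_scope_declaring_x();
    return result;
}

/* Mirrors the example program: global x = 42, then main calls g. */
int run_dynamic(void) {
    enter_scope_declaring_x(42);   /* global declaration of x */
    int result = g();
    exit_scope_declaring_x();
    return result;
}
```

Here run_dynamic returns 1, because the binding pushed by g is on top of the stack when f looks up x.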
48
Why do we care?
• We need to generate code to access variables
• Static scoping
  - Identifier binding is known at compile time
  - Address of the variable is known at compile time
  - Assigning addresses to variables is part of code generation
  - No runtime errors of "access to undefined variable"
  - Can check types of variables
49
Variable addresses for static scoping: first attempt
int x = 42;
int f() { return x; }
int g() { int x = 1; return f(); }
int main() { return g(); }
  identifier      address
  x (global)      0x42
  x (inside g)    0x73
50
Variable addresses for static scoping: first attempt
int a[11];
void quicksort(int m, int n) {
  int i;
  if (n > m) {
    i = partition(m, n);
    quicksort(m, i-1);
    quicksort(i+1, n);
  }
}
main() {
  quicksort(1, 9);
}
What is the address of the variable "i" in the procedure quicksort?
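The question has no single answer, because every live invocation of a recursive procedure needs its own i. A small C sketch can make this visible; the function name locals_are_distinct is hypothetical and the code is only an illustration, not part of the lecture.

```c
/* Each invocation declares its own local i and compares its address
   with the address of i in the invocation that called it.
   Returns 1 if every nested invocation got a different address. */
int locals_are_distinct(int depth, const int *caller_i) {
    int i = depth;                     /* this invocation's own i */
    if (caller_i != 0 && caller_i == &i)
        return 0;                      /* would mean one shared address */
    if (depth == 0)
        return 1;
    return locals_are_distinct(depth - 1, &i);
}
```

Calling locals_are_distinct(5, 0) walks six nested invocations. Since the caller's i is still live while the callee runs, the two objects cannot share an address, so a single fixed address per variable is not enough.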
51
Compile-Time Information on Variables
• Name
• Type
• Scope
  - when is it recognized
• Duration
  - Until when does its value exist
• Size
  - How many bytes are required at runtime
• Address
  - Fixed
  - Relative
  - Dynamic
52
Activation Record (Stack Frames)
• separate space for each procedure invocation
• managed at runtime
  - code for managing it generated by the compiler
• desired properties
  - efficient allocation and deallocation
    • procedures are called frequently
  - variable size
    • different procedures may require different memory sizes
53
Semi-Abstract Register Machine
(diagram: a CPU next to main memory. CPU: general purpose (data) registers Register 00, Register 01, …, Register xx; control registers Register PC, …; stack registers ebp, esp, …. Main memory, between high addresses and low addresses: global variables, stack, heap.)
54
A Logical Stack Frame (Simplified)
Stack frame for function f(a1, …, aN):
  Parameters (actual arguments):
    Param N
    Param N-1
    …
    Param 1
  Locals and temporaries:
    _t0
    …
    _tk
    x
    …
    y
55
Runtime Stack
• Stack of activation records
• Call = push new activation record
• Return = pop activation record
• Only one "active" activation record: top of stack
• How do we handle recursion?
56
Activation Record (frame)
Layout from high addresses to low addresses (stack grows down):
  incoming parameters:
    parameter k
    …
    parameter 1
  administrative part:
    lexical pointer
    return information
    dynamic link
    registers & misc
  local variables
  temporaries
  (next frame would be here)
The frame pointer marks the base of the frame and the stack pointer marks its top.
57
Runtime Stack
• SP: stack pointer
  - top of current frame
• FP: frame pointer
  - base of current frame
  - Sometimes called BP (base pointer)
(diagram, stack grows down: the previous frame lies above the current frame; FP marks the base of the current frame and SP its top)
58
Code Blocks
• Programming languages provide code blocks
void foo() {
  int x = 8, y = 9;     // 1
  { int x = y * y; }    // 2
  { int x = y * 7; }    // 3
  x = y + 1;
}
Frame layout: administrative, x1, y1, x2, x3, …
59
L-Values of Local Variables
• The offset in the stack is known at compile time
• L-val(x) = FP + offset(x)
• x = 5 ⇒ Load_Constant 5, R3
          Store R3, offset(x)(FP)
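The store in the last bullet can be mimicked in C by treating a frame as raw bytes and addressing the local through FP + offset(x). The Frame type, the chosen offset of -4 and the helper names are assumptions for illustration only.

```c
#include <stdint.h>
#include <string.h>

enum { FRAME_BYTES = 64 };

/* A frame modelled as raw bytes; fp is the index of the frame base. */
typedef struct {
    unsigned char bytes[FRAME_BYTES];
    int fp;
} Frame;

/* Store: write value at address FP + offset. */
void store_local(Frame *f, int offset, int32_t value) {
    memcpy(&f->bytes[f->fp + offset], &value, sizeof value);
}

/* Load: read the value at address FP + offset. */
int32_t load_local(const Frame *f, int offset) {
    int32_t value;
    memcpy(&value, &f->bytes[f->fp + offset], sizeof value);
    return value;
}
```

With offset(x) fixed at compile time, x = 5 becomes store_local(&frame, OFFSET_X, 5); no name lookup happens at run time.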
What is a Compiler
48
Why do we care
bull We need to generate code to access variables
bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code
generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables
49
Variable addresses for static scoping first attempt
int x = 42
int f() return x int g() int x = 1 return f() int main() return g()
identifier address
x (global) 0x42
x (inside g) 0x73
50
Variable addresses for static scoping: first attempt
  int a[11];
  void quicksort(int m, int n) {
    int i;
    if (n > m) {
      i = partition(m, n);
      quicksort(m, i-1);
      quicksort(i+1, n);
    }
  }
  main() { quicksort(1, 9); }
What is the address of the variable “i” in the procedure quicksort?
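The question has no single answer, because every live invocation of a recursive procedure needs its own copy of its locals. A minimal C sketch that makes this observable (the names `probe`, `count_distinct_addresses` and `MAX_DEPTH` are hypothetical, not from the lecture):

```c
#include <stdint.h>

enum { MAX_DEPTH = 8 };

/* Records the address of the local `i` at each recursion depth.
   `volatile` and the write after the recursive call keep every `i`
   alive for the whole duration of the nested calls. */
static void probe(int depth, uintptr_t addrs[]) {
    volatile int i = depth;
    addrs[depth] = (uintptr_t)&i;
    if (depth + 1 < MAX_DEPTH)
        probe(depth + 1, addrs);
    i = i + 1;
}

/* Returns how many different addresses `i` had across the nested calls. */
int count_distinct_addresses(void) {
    uintptr_t addrs[MAX_DEPTH];
    probe(0, addrs);
    int distinct = 0;
    for (int a = 0; a < MAX_DEPTH; a++) {
        int seen = 0;
        for (int b = 0; b < a; b++)
            if (addrs[b] == addrs[a]) seen = 1;
        if (!seen) distinct++;
    }
    return distinct;
}
```

Since all eight instances of `i` are alive at the deepest call, each one occupies a different address, so a fixed compile-time address cannot work for locals of recursive procedures.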
51
Compile-Time Information on Variables
• Name
• Type
• Scope
  – when is it recognized?
• Duration
  – until when does its value exist?
• Size
  – how many bytes are required at runtime?
• Address
  – Fixed
  – Relative
  – Dynamic
52
Activation Records (Stack Frames)
• Separate space for each procedure invocation
• Managed at runtime
  – code for managing it is generated by the compiler
• Desired properties
  – efficient allocation and deallocation
    • procedures are called frequently
  – variable size
    • different procedures may require different memory sizes
53
Semi-Abstract Register Machine
(Figure: a CPU next to main memory. CPU: general-purpose (data) registers Register 00, Register 01, …, Register xx; control registers Register PC, …; stack registers ebp, esp, …. Main memory, drawn between high and low addresses: Global Variables, Stack, Heap.)
54
A Logical Stack Frame (Simplified)
Stack frame for function f(a1, …, aN):
  Param N
  Param N-1
  …
  Param 1      (parameters, i.e. the actual arguments)
  _t0
  …
  _tk
  x
  …
  y            (locals and temporaries)
55
Runtime Stack
• Stack of activation records
• Call = push new activation record
• Return = pop activation record
• Only one “active” activation record: the top of the stack
• How do we handle recursion?
56
Activation Record (frame)
Layout from high addresses to low addresses (the stack grows down):
  parameter k
  …
  parameter 1           (incoming parameters)
  lexical pointer
  return information
  dynamic link
  registers & misc      (administrative part)
  local variables
  temporaries
  next frame would be here
The frame pointer and the stack pointer mark the current frame.
57
Runtime Stack
• SP – stack pointer
  – top of current frame
• FP – frame pointer
  – base of current frame
  – sometimes called BP (base pointer)
(Figure: the previous frame sits above the current frame; FP marks the base of the current frame and SP its top; the stack grows down)
58
Code Blocks
• Programming languages provide code blocks
  void foo() {
    int x = 8, y = 9;        // 1
    { int x = y * y; }       // 2
    { int x = y * 7; }       // 3
    x = y + 1;
  }
Frame contents: administrative part, x1, y1, x2, x3, …
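A runnable variant of the block example, as a rough sketch: the function name `foo_blocks` and the returned value are additions made here so the effect of scoping can be checked.

```c
/* Each inner block declares its own x, which shadows the outer one
   only until the closing brace. */
int foo_blocks(void) {
    int x = 8, y = 9;               /* x1, y1 */
    { int x = y * y; (void)x; }     /* x2: visible only in this block */
    { int x = y * 7; (void)x; }     /* x3: visible only in this block */
    x = y + 1;                      /* refers to x1 again */
    return x;
}
```

Because x2 and x3 are never alive at the same time, a compiler is free to give them the same slot in the frame.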
59
L-Values of Local Variables
• The offset in the stack is known at compile time
• L-val(x) = FP + offset(x)
• x = 5 is translated to:
    Load_Constant 5, R3
    Store R3, offset(x)(FP)
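The rule L-val(x) = FP + offset(x) can be sketched in C over a simulated frame. The offsets, the 32-byte frame and the helper names (`lval`, `store`, `load`, `demo_frame`) are assumptions chosen for illustration.

```c
#include <stdint.h>
#include <string.h>

enum { FRAME_BYTES = 32 };

/* Hypothetical compile-time symbol table: offsets relative to FP. */
enum { OFFSET_X = -4, OFFSET_Y = -8 };

/* L-val(v) = FP + offset(v): the variable's address inside the frame. */
static unsigned char *lval(unsigned char *fp, int offset) {
    return fp + offset;
}

static void store(unsigned char *fp, int offset, int32_t value) {
    memcpy(lval(fp, offset), &value, sizeof value);
}

static int32_t load(unsigned char *fp, int offset) {
    int32_t value;
    memcpy(&value, lval(fp, offset), sizeof value);
    return value;
}

/* Simulates x = 5; y = x + 1 inside one frame and returns y. */
int32_t demo_frame(void) {
    unsigned char memory[FRAME_BYTES] = {0};
    unsigned char *fp = memory + FRAME_BYTES;    /* locals live below FP */
    store(fp, OFFSET_X, 5);                      /* Store R3, offset(x)(FP) */
    store(fp, OFFSET_Y, load(fp, OFFSET_X) + 1);
    return load(fp, OFFSET_Y);
}
```

Only FP changes between invocations; the offsets are fixed at compile time, which is why the same generated code works for every activation.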
60
Pentium Runtime Stack

Pentium stack registers:
  Register    Usage
  ESP         Stack pointer
  EBP         Base pointer

Pentium stack and call/ret instructions:
  Instruction       Usage
  push, pusha, …    push on runtime stack
  pop, popa, …      pop from runtime stack
  call              transfer control to called routine
  ret               transfer control back to caller
61
Accessing Stack Variables
• Use an offset from FP (%ebp)
• Remember: the stack grows downwards
• Above FP = parameters
• Below FP = locals
• Examples
  – %ebp + 4 = return address
  – %ebp + 8 = first parameter
  – %ebp – 4 = first local
Stack layout (high addresses first):
  …
  Param n
  …
  Param 1           ← FP+8
  Return address    ← FP+4
  Previous fp       ← FP
  Local 1           ← FP-4
  …
  Local n           ← SP
62
Factorial – fact(int n)
  fact:
    pushl %ebp             # save ebp
    movl  %esp,%ebp        # ebp = esp
    pushl %ebx             # save ebx
    movl  8(%ebp),%ebx     # ebx = n
    cmpl  $1,%ebx          # n <= 1 ?
    jle   .lresult         # then done
    leal  -1(%ebx),%eax    # eax = n-1
    pushl %eax
    call  fact             # fact(n-1)
    imull %ebx,%eax        # eax = retv * n
    jmp   .lreturn
  .lresult:
    movl  $1,%eax          # retv
  .lreturn:
    movl  -4(%ebp),%ebx    # restore ebx
    movl  %ebp,%esp        # restore esp
    popl  %ebp             # restore ebp
    ret                    # return to caller

Stack at an intermediate point (high addresses first):
  n                 ← EBP+8
  Return address    ← EBP+4
  old ebp           ← EBP (previous fp)
  old ebx           ← EBP-4, ESP
(disclaimer: a real compiler can do better than that)
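For reference, a C function that the assembly above plausibly corresponds to. The source is not shown on the slide, so this is a reconstruction from the instruction comments.

```c
/* Recursive factorial: the n <= 1 test mirrors the cmpl/jle pair,
   and the multiplication mirrors imull after the recursive call. */
int fact(int n) {
    if (n <= 1)
        return 1;
    return n * fact(n - 1);
}
```

Each recursive call pushes a new frame holding its own n, return address, saved ebp and saved ebx, which is what the stack picture shows.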
63
Call Sequences
• The processor does not save the content of registers on procedure calls
• So who will?
  – The caller saves and restores registers
  – The callee saves and restores registers
  – But both can also save/restore some registers
64
Call Sequences
call (caller → callee):
  Caller push code
    – Push caller-save registers
    – Push actual parameters (in reverse order)
    – Push return address, jump to call address
  Callee push code (prologue)
    – Push current base-pointer, bp = sp
    – Push local variables
    – Push callee-save registers
return (callee → caller):
  Callee pop code (epilogue)
    – Pop callee-save registers
    – Pop callee activation record
    – Pop old base-pointer
    – Pop return address, jump to address
  Caller pop code
    – Pop parameters
    – Pop caller-save registers
65
Call Sequences – Foo(42,21)
call (caller → callee):
  push %ecx          Push caller-save registers
  push $21           Push actual parameters (in reverse order)
  push $42
  call _foo          Push return address, jump to call address
  push %ebp          Push current base-pointer
  mov %esp, %ebp     bp = sp
  sub $8, %esp       Push local variables
  push %ebx          Push callee-save registers
return (callee → caller):
  pop %ebx           Pop callee-save registers
  mov %ebp, %esp     Pop callee activation record
  pop %ebp           Pop old base-pointer
  ret                Pop return address, jump to address
  add $8, %esp       Pop parameters
  pop %ecx           Pop caller-save registers
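The same sequence can be traced on a simulated machine. This is a sketch under assumptions: the `Machine` struct, the word-addressed stack, the body of `foo` (it adds its two parameters) and the initial register values are all invented for illustration.

```c
#include <stdint.h>

enum { STACK_WORDS = 64 };

/* Hypothetical machine: a word-addressed stack growing toward lower
   indices, plus sp, bp and three general registers. */
typedef struct {
    int32_t mem[STACK_WORDS];
    int sp, bp;
    int32_t eax, ebx, ecx;
} Machine;

static void push(Machine *m, int32_t v) { m->mem[--m->sp] = v; }
static int32_t pop(Machine *m) { return m->mem[m->sp++]; }

/* Callee: leaves param1 + param2 in eax. */
static void foo(Machine *m) {
    push(m, m->bp);                        /* push current base-pointer */
    m->bp = m->sp;                         /* bp = sp */
    m->sp -= 2;                            /* reserve two words of locals */
    push(m, m->ebx);                       /* push callee-save register */

    m->ecx = 0;                            /* caller-save: free to clobber */
    m->ebx = m->mem[m->bp + 2];            /* first parameter */
    m->eax = m->ebx + m->mem[m->bp + 3];   /* plus second parameter */

    m->ebx = pop(m);                       /* pop callee-save register */
    m->sp = m->bp;                         /* pop callee activation record */
    m->bp = pop(m);                        /* pop old base-pointer */
    (void)pop(m);                          /* ret: pop return address */
}

/* Caller side of foo(a, b). Returns the result, or -1 if the stack or
   the preserved registers were not restored. */
int32_t call_foo(int32_t a, int32_t b) {
    Machine m = { .sp = STACK_WORDS, .bp = STACK_WORDS, .ebx = 9, .ecx = 7 };
    push(&m, m.ecx);                       /* push caller-save register */
    push(&m, b);                           /* parameters in reverse order */
    push(&m, a);
    push(&m, 0);                           /* stands in for the return address */
    foo(&m);
    m.sp += 2;                             /* pop parameters */
    m.ecx = pop(&m);                       /* pop caller-save register */
    if (m.sp != STACK_WORDS || m.bp != STACK_WORDS || m.ebx != 9 || m.ecx != 7)
        return -1;
    return m.eax;
}
```

In word units the first parameter sits at bp + 2, matching ebp + 8 with 4-byte words. Although `foo` clobbers ecx and uses ebx, both hold their original values once the full sequence has run.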
66
“To Callee-save or to Caller-save?”
• Callee-save registers need only be saved when the callee modifies their value
• Some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registers
• Callee-save registers
  – Saved by the callee before modification
  – Values are automatically preserved across calls
• Caller-save registers
  – Saved (if needed) by the caller before calls
  – Values are not automatically preserved across calls
• Usually the architecture defines caller-save and callee-save registers
• Separate compilation
• Interoperability between code produced by different compilers/languages
• But compiler writers decide when to use caller/callee registers
68
Callee-Save Registers
• Saved by the callee before modification
• Usually at procedure prolog
• Restored at procedure epilog
• Hardware support may be available
• Values are automatically preserved across calls

  int foo(int a) {
    int b = a + 1;
    f1();
    g1(b);
    return (b + 2);
  }

  .global _foo
  Add_Constant -K, SP        // allocate space for foo
  Store_Local R5, -14(FP)    // save R5
  Load_Reg R5, R0
  Add_Constant R5, 1
  JSR f1
  JSR g1
  Add_Constant R5, 2
  Load_Reg R5, R0
  Load_Local -14(FP), R5     // restore R5
  Add_Constant K, SP         // deallocate
  RTS
69
Caller-Save Registers
• Saved by the caller before calls, when needed
• Values are not automatically preserved across calls

  void bar(int y) {
    int x = y + 1;
    f2(x);
    g2(2);
    g2(8);
  }

  .global _bar
  Add_Constant -K, SP        // allocate space for bar
  Add_Constant R0, 1
  JSR f2
  Load_Constant 2, R0
  JSR g2
  Load_Constant 8, R0
  JSR g2
  Add_Constant K, SP         // deallocate space for bar
  RTS
70
Parameter Passing
• 1960s
  – In memory
    • No recursion is allowed
• 1970s
  – In stack
• 1980s
  – In registers
  – First k parameters are passed in registers (k=4 or k=6)
  – Where is time saved?
• Most procedures are leaf procedures
• Interprocedural register allocation
• Many of the registers may be dead before another invocation
• Register windows are allocated in some architectures per call (e.g., Sun SPARC)
71
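Windows Exploit(s): Buffer Overflow
  void foo(char *x) {
    char buf[2];
    strcpy(buf, x);
  }
  int main(int argc, char *argv[]) {
    foo(argv[1]);
  }

  ./a.out abracadabra
  Segmentation fault

Stack layout (high memory addresses first; the stack grows toward low addresses):
  Previous frame
  Return address
  Saved FP
  char* x
  buf[2]
  …
(Figure: the string “abracadabra” is written starting at buf and runs upward over the saved FP and the return address)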
72
Buffer overflow
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  int check_authentication(char *password) {
    int auth_flag = 0;
    char password_buffer[16];

    strcpy(password_buffer, password);
    if(strcmp(password_buffer, "brillig") == 0)
      auth_flag = 1;
    if(strcmp(password_buffer, "outgrabe") == 0)
      auth_flag = 1;
    return auth_flag;
  }

  int main(int argc, char *argv[]) {
    if(argc < 2) {
      printf("Usage: %s <password>\n", argv[0]);
      exit(0);
    }
    if(check_authentication(argv[1])) {
      printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
      printf(" Access Granted.\n");
      printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    } else {
      printf("\nAccess Denied.\n");
    }
  }
(source: “Hacking: The Art of Exploitation, 2nd Ed.”)
74
Buffer overflow
  int check_authentication(char *password) {
    char password_buffer[16];
    int auth_flag = 0;

    strcpy(password_buffer, password);
    if(strcmp(password_buffer, "brillig") == 0)
      auth_flag = 1;
    if(strcmp(password_buffer, "outgrabe") == 0)
      auth_flag = 1;
    return auth_flag;
  }

  int main(int argc, char *argv[]) {
    if(argc < 2) {
      printf("Usage: %s <password>\n", argv[0]);
      exit(0);
    }
    if(check_authentication(argv[1])) {
      printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
      printf(" Access Granted.\n");
      printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    } else {
      printf("\nAccess Denied.\n");
    }
  }
(source: “Hacking: The Art of Exploitation, 2nd Ed.”)
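Why the position of auth_flag relative to password_buffer matters can be shown with a safe model of the frame. This is a sketch under assumptions: the real layout is decided by the compiler, and here the frame is modeled as a 17-byte array in which byte 16 stands in for auth_flag sitting directly above the buffer. `FakeFrame`, `unchecked_copy`, `checked_copy` and `auth_flag_after` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

enum { BUF_SIZE = 16, FRAME_SIZE = BUF_SIZE + 1 };

/* Model frame: bytes 0..15 are password_buffer, byte 16 is auth_flag. */
typedef struct { unsigned char bytes[FRAME_SIZE]; } FakeFrame;

/* Ignores the buffer bound the way strcpy does, but stays inside the
   model frame so this program itself has no undefined behavior. */
static void unchecked_copy(FakeFrame *f, const char *src) {
    size_t n = strlen(src) + 1;
    if (n > FRAME_SIZE) n = FRAME_SIZE;
    memcpy(f->bytes, src, n);
}

/* Bounded alternative: never writes past the 16-byte buffer. */
static void checked_copy(FakeFrame *f, const char *src) {
    size_t n = strlen(src);
    if (n > BUF_SIZE - 1) n = BUF_SIZE - 1;
    memcpy(f->bytes, src, n);
    f->bytes[n] = '\0';
}

/* Returns 1 if the modeled auth_flag is nonzero after the copy. */
int auth_flag_after(const char *password, int use_checked) {
    FakeFrame f;
    memset(&f, 0, sizeof f);              /* auth_flag = 0 */
    if (use_checked) checked_copy(&f, password);
    else             unchecked_copy(&f, password);
    return f.bytes[BUF_SIZE] != 0;        /* nonzero reads as "granted" */
}
```

In this model a 17-character input sets the flag even though it matches neither password, while the bounded copy leaves the flag untouched.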
75
Buffer overflow
  0x08048529 <+69>:   movl $0x8048647,(%esp)
  0x08048530 <+76>:   call 0x8048394 <puts@plt>
  0x08048535 <+81>:   movl $0x8048664,(%esp)
  0x0804853c <+88>:   call 0x8048394 <puts@plt>
  0x08048541 <+93>:   movl $0x804867a,(%esp)
  0x08048548 <+100>:  call 0x8048394 <puts@plt>
  0x0804854d <+105>:  jmp 0x804855b <main+119>
  0x0804854f <+107>:  movl $0x8048696,(%esp)
  0x08048556 <+114>:  call 0x8048394 <puts@plt>
76
Nested Procedures
• For example: Pascal
• Any routine can have sub-routines
• Any sub-routine can access anything that is defined in its containing scope or inside the sub-routine itself
  – “non-local” variables
77
Example: Nested Procedures
program p() {
  int x;
  procedure a() {
    int y;
    procedure b() { … c() … };
    procedure c() {
      int z;
      procedure d() {
        y = x + z
      };
      … b() … d() …
    }
    … a() … c() …
  }
  a()
}
Possible call sequence: p → a → a → c → b → c → d
What are the addresses of variables “x”, “y” and “z” in procedure d?
What is a Compiler
50
Variable addresses for static scoping first attempt
int a [11]
void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)
main() quicksort (1 9)
what is the address of the variable ldquoirdquo in
the procedure quicksort
51
Compile-Time Information on Variablesbull Namebull Typebull Scope
ndash when is it recognizedbull Duration
ndash Until when does its value existbull Size
ndash How many bytes are required at runtime bull Address
ndash Fixedndash Relativendash Dynamic
52
Activation Record (Stack Frames)bull separate space for each procedure invocation
bull managed at runtimendash code for managing it generated by the compiler
bull desired properties ndash efficient allocation and deallocation
bull procedures are called frequentlyndash variable size
bull different procedures may require different memory sizes
53
Semi-Abstract Register Machine
Global Variables
Stack
Heap
High addresses
Low addresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
ebp
esphellipre
gist
ers
Stac
k
54
A Logical Stack Frame (Simplified)Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Stack frame for function f(a1hellipaN)
55
Runtime Stack
bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of
stackbull How do we handle recursion
56
Activation Record (frame)parameter k
parameter 1
lexical pointer
return information
dynamic link
registers amp misc
local variablestemporaries
next frame would be here
hellip
administrativepart
high addresses
lowaddresses
framepointer
stackpointer
incoming parameters
stack grows down
57
Runtime Stackbull SP ndash stack pointer
ndash top of current framebull FP ndash frame pointer
ndash base of current framendash Sometimes called BP
(base pointer)
Current frame
hellip hellip
Previous frame
SP
FPstack grows down
58
Code Blocks
bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1
adminstrativex1
y1
x2
x3
hellip
59
L-Values of Local Variables
bull The offset in the stack is known at compile time
bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3
Store R3 offset(x)(FP)
60
Pentium Runtime Stack
Pentium stack registers
Pentium stack and callret instructions
Register Usage
ESP Stack pointer
EBP Base pointer
Instruction Usage
push pushahellip push on runtime stack
poppopahellip Base pointer
call transfer control to called routine
return transfer control back to caller
61
Accessing Stack Variables
bull Use offset from FP (ebp)bull Remember ndash
stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples
ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local
hellip hellip
SP
FPReturn address
Return address
Param nhellip
param1
Local 1hellip
Local n
Previous fp
Param nhellip
param1FP+8
FP-4
62
Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp
ESP
EBPReturn address
Return address
old ebx
Previous fp
nEBP+8
EBP-4 old ebp
old ebx
(stack in intermediate point)
(disclaimer real compiler can do better than that)
63
Call Sequences
bull The processor does not save the content of registers on procedure calls
bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some
registers
64
Call Sequences
call
caller
callee
return
caller
Caller push code
Callee push code
(prologue)
Callee pop code
(epilogue)
Caller pop code
Push caller-save registersPush actual parameters (in reverse order)
push return addressJump to call address
Push current base-pointerbp = sp
Push local variablesPush callee-save registers
Pop callee-save registersPop callee activation record
Pop old base-pointer
pop return addressJump to address
Pop parametersPop caller-save registers
65
Call Sequences ndash Foo(4221)
call
caller
callee
return
caller
push ecx
push $21
push $42
call _foo
push ebp
mov esp ebp
sub 8 esp
push ebx
pop ebx
mov ebp esp
pop ebp
ret
add $8 esp
pop ecx
Push caller-save registersPush actual parameters (in reverse order)
push return addressJump to call address
Push current base-pointerbp = sp
Push local variablesPush callee-save registers
Pop callee-save registersPop callee activation record
Pop old base-pointer
pop return addressJump to address
Pop caller-save registersPop parameters
66
ldquoTo Callee-save or to Caller-saverdquo
bull Callee-saved registers need only be saved when callee modifies their value
bull some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registersbull Callee-Save Registers
ndash Saved by the callee before modificationndash Values are automatically preserved across calls
bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls
bull Usually the architecture defines caller-save and callee-save registers
bull Separate compilationbull Interoperability between code produced by different
compilerslanguages bull But compiler writers decide when to use calllercallee
registers
68
Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls
global _foo
Add_Constant -K SP allocate space for foo
Store_Local R5 -14(FP) save R5
Load_Reg R5 R0 Add_Constant R5 1
JSR f1 JSR g1
Add_Constant R5 2 Load_Reg R5 R0
Load_Local -14(FP) R5 restore R5
Add_Constant K SP RTS deallocate
int foo(int a) int b=a+1 f1()g1(b)
return(b+2)
69
Caller-Save Registersbull Saved by the caller before calls when
neededbull Values are not automatically preserved
across calls
global _bar
Add_Constant -K SP allocate space for bar
Add_Constant R0 1
JSR f2
Load_Constant 2 R0 JSR g2
Load_Constant 8 R0 JSR g2
Add_Constant K SP deallocate space for bar
RTS
void bar (int y) int x=y+1f2(x)
g2(2) g2(8)
70
Parameter Passingbull 1960s
ndash In memory bull No recursion is allowed
bull 1970sndash In stack
bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved
bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)
71
Windows Exploit(s) Buffer Overflow
void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])
aout abracadabraSegmentation fault
Stack grows this way
Memory addresses
Previous frameReturn
addressSaved FPchar xbuf[2]
hellip
abracadabr
72
Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]
strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)
auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)
auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)
(source ldquohacking ndash the art of exploitation 2nd Edrdquo)
73
Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]
strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)
auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)
auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)
(source ldquohacking ndash the art of exploitation 2nd Edrdquo)
74
Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)
auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)
auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)
(source ldquohacking ndash the art of exploitation 2nd Edrdquo)
75
Buffer overflow
0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt
76
Nested Procedures
• For example: Pascal
• Any routine can have sub-routines
• Any sub-routine can access anything that is defined in its containing scope or inside the sub-routine itself
  - "non-local" variables
77
Example: Nested Procedures

program p() {
  int x;
  procedure a() {
    int y;
    procedure b() { … c() … };
    procedure c() {
      int z;
      procedure d() {
        y = x + z
      };
      … b() … d() …
    }
    … a() … c() …
  }
  a()
}

Possible call sequence: p -> a -> a -> c -> b -> c -> d

What are the addresses of variables "x", "y" and "z" in procedure d?
78
Nested Procedures
• can call a sibling, ancestor
• when "c" uses (non-local) variables from "a", which instance of "a" is it?
• how do you find the right activation record at runtime?

Possible call sequence: p -> a -> a -> c -> b -> c -> d
[Figure: runtime stack of activation records for this call sequence: P, a, a, c, b, c, d]
79
Nested Procedures
• goal: find the closest routine in the stack from a given nesting level
• if we reached the same routine in a sequence of calls
  - a routine of level k that uses variables of the same nesting level uses its own variables
  - if it uses variables of nesting level j < k, then it must be the last routine called at level j
• If a procedure is last at level j on the stack, then it must be an ancestor of the current routine

Possible call sequence: p -> a -> a -> c -> b -> c -> d
[Figure: runtime stack of activation records for this call sequence: P, a, a, c, b, c, d]
80
Nested Procedures
• problem: a routine may need to access variables of another routine that contains it statically
• solution: lexical pointer (a.k.a. access link) in the activation record
• the lexical pointer points to the last activation record of the nesting level above it
  - in our example, the lexical pointer of d points to the activation record of c
• lexical pointers are created at runtime
• the number of links to be traversed is known at compile time
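The access-link idea can be sketched as a small C simulation of the example program. This is an illustration under assumptions, not the lecture's implementation: the Frame struct, its field names, the helper frame_at_level, and the concrete values of x and z are all invented here. Each frame stores its static nesting level and a lexical pointer, and a variable reference follows a number of links equal to the difference in nesting levels.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified activation record: only the fields needed to follow access links. */
typedef struct Frame {
    const char   *proc;    /* procedure name, for illustration */
    int           level;   /* static nesting level: p=0, a=1, b=c=2, d=3 */
    struct Frame *access;  /* lexical pointer: frame of the enclosing procedure */
    int           var;     /* the single local: x in p, y in a, z in c */
} Frame;

/* Follow (from->level - target_level) access links. For a given variable
 * reference this hop count is a compile-time constant. */
Frame *frame_at_level(Frame *from, int target_level) {
    assert(from != NULL && target_level >= 0 && target_level <= from->level);
    Frame *f = from;
    for (int hops = from->level - target_level; hops > 0; hops--)
        f = f->access;
    return f;
}

/* Builds the stack for p -> a -> a -> c -> b -> c -> d, runs the body of d
 * (y = x + z), and returns the y of the second activation of a. */
int run_example(void) {
    Frame p  = {"p", 0, NULL, 10};  /* x = 10 (arbitrary) */
    Frame a1 = {"a", 1, &p,   0};   /* first a, called from p */
    Frame a2 = {"a", 1, &p,   0};   /* recursive a: same enclosing frame as a1 */
    Frame c1 = {"c", 2, &a2,  5};   /* c called from a2 */
    Frame b  = {"b", 2, &a2,  0};   /* b called from c1: copies c1's link */
    Frame c2 = {"c", 2, &a2,  7};   /* c called from b: copies b's link; z = 7 */
    Frame d  = {"d", 3, &c2,  0};   /* d called from c2: link is c2 itself */

    int x = frame_at_level(&d, 0)->var;   /* 3 hops */
    int z = frame_at_level(&d, 2)->var;   /* 1 hop  */
    frame_at_level(&d, 1)->var = x + z;   /* y lives 2 hops away */

    (void)a1; (void)c1; (void)b;          /* on the stack, but not on d's chain */
    return a2.var;
}
```

In this sketch the assignment in d updates the y of the second activation of a, and the first activation's y is left untouched.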
81
Lexical Pointers

[Figure: runtime stack with frames a (y), a (y), c (z), b, c (z), d and their lexical pointers]

Possible call sequence: p -> a -> a -> c -> b -> c -> d

program p() {
  int x;
  procedure a() {
    int y;
    procedure b() { c() };
    procedure c() {
      int z;
      procedure d() {
        y = x + z
      };
      … b() … d() …
    }
    … a() … c() …
  }
  a()
}
82
51
Compile-Time Information on Variables
• Name
• Type
• Scope
  - when is it recognized
• Duration
  - until when does its value exist
• Size
  - how many bytes are required at runtime
• Address
  - Fixed
  - Relative
  - Dynamic
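The attributes listed above could be collected in a symbol-table record along the following lines. This is a sketch under assumptions: the VarInfo struct, its field names, and the assign_offsets helper are hypothetical, and alignment is ignored for brevity.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ADDR_FIXED, ADDR_RELATIVE, ADDR_DYNAMIC } AddressKind;

/* Hypothetical per-variable record kept by the compiler. */
typedef struct {
    const char *name;
    const char *type;
    int         scope_level;  /* nesting depth in which the name is recognized */
    int         is_static;    /* duration: 1 = whole run, 0 = one activation */
    size_t      size;         /* bytes required at runtime */
    AddressKind addr_kind;
    long        address;      /* absolute address, or offset relative to FP */
} VarInfo;

/* Lay locals out downward from the frame pointer and record each one's
 * FP-relative offset. Returns the total number of bytes used.
 * Alignment is ignored in this sketch. */
size_t assign_offsets(VarInfo *vars, size_t count) {
    assert(vars != NULL || count == 0);
    size_t used = 0;
    for (size_t i = 0; i < count; i++) {
        used += vars[i].size;
        vars[i].addr_kind = ADDR_RELATIVE;
        vars[i].address = -(long)used;
    }
    return used;
}
```

For two locals of 4 and 16 bytes, this layout gives them the offsets -4 and -20 from the frame pointer.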
52
Activation Record (Stack Frames)
• separate space for each procedure invocation
• managed at runtime
  - code for managing it is generated by the compiler
• desired properties
  - efficient allocation and deallocation
    • procedures are called frequently
  - variable size
    • different procedures may require different memory sizes
53
Semi-Abstract Register Machine

[Figure: CPU and main memory. CPU: general-purpose (data) registers Register 00, Register 01, …, Register xx; control registers Register PC, …; stack registers ebp, esp, …. Main memory, drawn between high addresses and low addresses: Global Variables, Stack, Heap.]
54
A Logical Stack Frame (Simplified)

Stack frame for function f(a1, …, aN):

  Param N    |
  Param N-1  |  Parameters (actual arguments)
  …          |
  Param 1    |
  _t0        |
  …          |
  _tk        |  Locals and temporaries
  x          |
  …          |
  y          |
55
Runtime Stack

• Stack of activation records
• Call = push new activation record
• Return = pop activation record
• Only one "active" activation record: the top of the stack
• How do we handle recursion?
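The push/pop discipline above is what makes recursion work: every invocation gets its own record, so a recursive call cannot clobber its caller's data. A minimal sketch follows, with an explicit array standing in for the runtime stack; the Record type and the helper names are hypothetical.

```c
#include <assert.h>

#define MAX_FRAMES 64

typedef struct { int n; } Record;   /* one activation record: just the parameter */

static Record stack[MAX_FRAMES];
static int top = 0;                 /* number of live records */
static int max_depth = 0;           /* deepest stack seen so far */

static void push_record(int n) {    /* call = push new activation record */
    assert(top < MAX_FRAMES);
    stack[top++].n = n;
    if (top > max_depth)
        max_depth = top;
}

static void pop_record(void) {      /* return = pop activation record */
    assert(top > 0);
    top--;
}

/* Recursive sum 1..n; each invocation reads n from its own record. */
int sum_to(int n) {
    push_record(n);
    int result = 0;
    if (n > 0) {
        int rest = sum_to(n - 1);          /* pushes and pops deeper records */
        result = stack[top - 1].n + rest;  /* our own record is on top again */
    }
    pop_record();
    return result;
}

int live_records(void)  { return top; }
int deepest_stack(void) { return max_depth; }
```

After the outermost call returns, no records remain live, while the deepest point of the stack reflects the recursion depth.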
56
Activation Record (frame)

high addresses
    parameter k           |
    …                     |  incoming parameters
    parameter 1           |
    lexical pointer       |
    return information    |  administrative part
    dynamic link          |
    registers & misc      |
    local variables
    temporaries
    next frame would be here
low addresses

(The figure also marks the frame pointer and the stack pointer; the stack grows down.)
57
Runtime Stack
• SP - stack pointer
  - top of current frame
• FP - frame pointer
  - base of current frame
  - sometimes called BP (base pointer)

[Figure: the previous frame above the current frame; FP marks the base of the current frame and SP its top; the stack grows down]
58
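Code Blocks

• Programming languages provide code blocks

void foo() {
    int x = 8, y = 9;     // 1
    { int x = y * y; }    // 2
    { int x = y * 7; }    // 3
    x = y + 1;
}

[Frame layout: administrative | x1 | y1 | x2 | x3 | …, where the index is the number of the declaring line above]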
59
L-Values of Local Variables

• The offset in the stack is known at compile time
• L-val(x) = FP + offset(x)
• x = 5  =>  Load_Constant 5, R3
             Store R3, offset(x)(FP)
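The address computation L-val(x) = FP + offset(x) can be imitated over a simulated stack. This is a sketch under assumptions: the byte array memory, the helper names, and the use of 4-byte integers are illustrative choices, not part of the lecture.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { STACK_BYTES = 256 };
static unsigned char memory[STACK_BYTES];   /* simulated stack memory */

/* L-val(x) = FP + offset(x). Offsets of locals are negative because the
 * stack grows down; the offset itself is a compile-time constant. */
static unsigned char *lvalue(size_t fp, long offset) {
    long addr = (long)fp + offset;
    assert(addr >= 0 && addr + (long)sizeof(int32_t) <= STACK_BYTES);
    return &memory[addr];
}

/* x = value: compute the address from FP and the offset, then store. */
void store_local(size_t fp, long offset, int32_t value) {
    memcpy(lvalue(fp, offset), &value, sizeof value);
}

int32_t load_local(size_t fp, long offset) {
    int32_t value;
    memcpy(&value, lvalue(fp, offset), sizeof value);
    return value;
}
```

The same offset names a different memory cell whenever FP differs, which is why each activation sees its own copy of x.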
60
Pentium Runtime Stack

Pentium stack registers

  Register   Usage
  ESP        Stack pointer
  EBP        Base pointer

Pentium stack and call/ret instructions

  Instruction       Usage
  push, pusha, …    push on runtime stack
  pop, popa, …      pop from runtime stack
  call              transfer control to called routine
  ret               transfer control back to caller
61
Accessing Stack Variables

• Use offset from FP (%ebp)
• Remember: the stack grows downwards
• Above FP = parameters
• Below FP = locals
• Examples
  - %ebp + 4 = return address
  - %ebp + 8 = first parameter
  - %ebp - 4 = first local

  …
  Param n
  …
  Param 1           <- FP+8
  Return address
  Previous FP       <- FP
  Local 1           <- FP-4
  …
  Local n           <- SP
62
Factorial - fact(int n)

fact:
    pushl %ebp              # save ebp
    movl  %esp,%ebp         # ebp = esp
    pushl %ebx              # save ebx
    movl  8(%ebp),%ebx      # ebx = n
    cmpl  $1,%ebx           # n <= 1 ?
    jle   .lresult          # then done
    leal  -1(%ebx),%eax     # eax = n-1
    pushl %eax              # push the argument
    call  fact              # fact(n-1)
    imull %ebx,%eax         # eax = retv * n
    jmp   .lreturn
.lresult:
    movl  $1,%eax           # retv = 1
.lreturn:
    movl  -4(%ebp),%ebx     # restore ebx
    movl  %ebp,%esp         # restore esp
    popl  %ebp              # restore ebp
    ret                     # return to caller

Stack at an intermediate point:

  n                 <- EBP+8
  Return address
  old ebp           <- EBP
  old ebx           <- EBP-4, ESP

(disclaimer: a real compiler can do better than that)
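For comparison, C source of the kind a compiler might translate into the assembly above. The slide does not show its source, so this is a reconstruction from the instruction comments; the upper-bound assertion is an addition and assumes a 32-bit int.

```c
#include <assert.h>

/* n <= 1 yields 1, matching the cmpl/jle test in the assembly. */
int fact(int n) {
    assert(n <= 12);        /* 13! no longer fits in a 32-bit int */
    if (n <= 1)
        return 1;
    return n * fact(n - 1); /* the recursive call pushes a new frame */
}
```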
63
Call Sequences

• The processor does not save the content of registers on procedure calls
• So who will?
  - Caller saves and restores registers
  - Callee saves and restores registers
  - But can also have both save/restore some registers
64
Call Sequences

  Phase    Code                         Actions
  caller   Caller push code             Push caller-save registers
                                        Push actual parameters (in reverse order)
  call                                  Push return address
                                        Jump to call address
  callee   Callee push code (prologue)  Push current base-pointer
                                        bp = sp
                                        Push local variables
                                        Push callee-save registers
  callee   Callee pop code (epilogue)   Pop callee-save registers
                                        Pop callee activation record
                                        Pop old base-pointer
  return                                Pop return address
                                        Jump to address
  caller   Caller pop code              Pop parameters
                                        Pop caller-save registers
65
Call Sequences - foo(42,21)

  Phase    Instruction        Action
  caller   push %ecx          Push caller-save registers
           push $21           Push actual parameters (in reverse order)
           push $42
  call     call _foo          Push return address, jump to call address
  callee   push %ebp          Push current base-pointer
           mov %esp, %ebp     bp = sp
           sub $8, %esp       Push local variables
           push %ebx          Push callee-save registers
           pop %ebx           Pop callee-save registers
           mov %ebp, %esp     Pop callee activation record
           pop %ebp           Pop old base-pointer
  return   ret                Pop return address, jump to address
  caller   add $8, %esp       Pop parameters
           pop %ecx           Pop caller-save registers
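The sequence above can be traced with a toy machine whose stack is an array of 4-byte words. This is a sketch under assumptions: the Machine struct and the push/pop helpers are hypothetical, the callee body (adding the two parameters) is invented because the slide shows only prologue and epilogue, and a dummy word stands in for the return address.

```c
#include <assert.h>
#include <stdint.h>

enum { WORDS = 64 };

typedef struct {
    int32_t mem[WORDS];   /* simulated stack memory, one 4-byte word per slot */
    int     esp, ebp;     /* word indices; the stack grows toward index 0 */
    int32_t eax, ebx, ecx;
} Machine;

static void push(Machine *m, int32_t v) {
    assert(m->esp > 0);
    m->mem[--m->esp] = v;
}

static int32_t pop(Machine *m) {
    assert(m->esp < WORDS);
    return m->mem[m->esp++];
}

/* Callee: prologue, an invented body (eax = param1 + param2), epilogue. */
static void foo(Machine *m) {
    push(m, m->ebp);                       /* push %ebp */
    m->ebp = m->esp;                       /* mov %esp, %ebp */
    m->esp -= 2;                           /* sub $8, %esp: two word-sized locals */
    push(m, m->ebx);                       /* push %ebx (callee-save) */
    m->ebx = m->mem[m->ebp + 2];           /* 8(%ebp): first parameter */
    m->eax = m->ebx + m->mem[m->ebp + 3];  /* 12(%ebp): second parameter */
    m->ebx = pop(m);                       /* pop %ebx */
    m->esp = m->ebp;                       /* mov %ebp, %esp */
    m->ebp = pop(m);                       /* pop %ebp */
}

/* Caller side of foo(42, 21); returns the value left in eax. */
int32_t call_foo(Machine *m) {
    push(m, m->ecx);   /* push %ecx (caller-save) */
    push(m, 21);       /* push $21: last argument first */
    push(m, 42);       /* push $42 */
    push(m, 0);        /* call _foo: dummy word for the return address */
    foo(m);
    (void)pop(m);      /* ret: pop the return address */
    m->esp += 2;       /* add $8, %esp: pop parameters */
    m->ecx = pop(m);   /* pop %ecx */
    return m->eax;
}
```

After the call returns, esp and ebp are back at their starting values and both saved registers hold their original contents, so the pushes and pops of the sequence balance.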
66
"To Callee-save or to Caller-save?"

• Callee-save registers need only be saved when the callee modifies their value
• Some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registers
• Callee-Save Registers
  - Saved by the callee before modification
  - Values are automatically preserved across calls
• Caller-Save Registers
  - Saved (if needed) by the caller before calls
  - Values are not automatically preserved across calls
• Usually the architecture defines caller-save and callee-save registers
  - Separate compilation
  - Interoperability between code produced by different compilers/languages
• But compiler writers decide when to use caller/callee registers
68
Callee-Save Registers
• Saved by the callee before modification
• Usually at procedure prolog
• Restored at procedure epilog
• Hardware support may be available
• Values are automatically preserved across calls

int foo(int a) {
    int b = a + 1;
    f1();
    g1(b);
    return (b + 2);
}

.global _foo
    Add_Constant -K, SP        // allocate space for foo
    Store_Local  R5, -14(FP)   // save R5
    Load_Reg     R5, R0
    Add_Constant R5, 1
    JSR f1
    JSR g1
    Add_Constant R5, 2
    Load_Reg     R5, R0
    Load_Local   -14(FP), R5   // restore R5
    Add_Constant K, SP         // deallocate
    RTS
69
Caller-Save Registers
• Saved by the caller before calls when needed
• Values are not automatically preserved across calls

void bar(int y) {
    int x = y + 1;
    f2(x);
    g2(2);
    g2(8);
}

.global _bar
    Add_Constant  -K, SP   // allocate space for bar
    Add_Constant  R0, 1
    JSR f2
    Load_Constant 2, R0
    JSR g2
    Load_Constant 8, R0
    JSR g2
    Add_Constant  K, SP    // deallocate space for bar
    RTS
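The difference between the two conventions can be sketched with a simulated register file. Everything here is illustrative: the regs array and function names are hypothetical, and treating R5 as callee-save and R0 as caller-save is an assumption read off the two examples above.

```c
#include <assert.h>

enum { R0, R5, NUM_REGS };
static int regs[NUM_REGS];          /* simulated register file */

/* Callee that needs R5 as scratch. R5 is callee-save here, so the callee
 * saves it in its prologue and restores it in its epilogue. */
static void callee_using_r5(void) {
    int saved_r5 = regs[R5];        /* prologue: save R5 in the callee's frame */
    regs[R5] = 1234;                /* body overwrites R5 */
    regs[R0] = regs[R5] + 1;        /* R0 is caller-save: clobbered freely */
    regs[R5] = saved_r5;            /* epilogue: restore R5 */
}

/* Caller that keeps live values in both registers across the call. */
int caller(int in_r0, int in_r5) {
    regs[R0] = in_r0;
    regs[R5] = in_r5;
    int saved_r0 = regs[R0];        /* caller-save: the caller spills R0 itself */
    callee_using_r5();
    assert(regs[R5] == in_r5);      /* preserved by the callee */
    regs[R0] = saved_r0;            /* the caller restores R0 */
    return regs[R0] + regs[R5];
}
```

If R0 were dead after the call, the caller could skip its save and restore entirely, which is the case in bar above.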
70
Parameter Passing
• 1960s
  - In memory
    • No recursion is allowed
• 1970s
  - In stack
• 1980s
  - In registers
  - First k parameters are passed in registers (k=4 or k=6)
  - Where is time saved?
    • Most procedures are leaf procedures
    • Interprocedural register allocation
    • Many of the registers may be dead before another invocation
    • Register windows are allocated in some architectures per call (e.g., Sun SPARC)
71
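Windows Exploit(s): Buffer Overflow

void foo(char *x) {
    char buf[2];
    strcpy(buf, x);
}

int main(int argc, char *argv[]) {
    foo(argv[1]);
}

./a.out abracadabra
Segmentation fault

[Figure: stack layout with the direction of stack growth and of memory addresses marked: Previous frame, Return address, Saved FP, char* x, buf[2], …; the copied string "abracadabra" runs past buf[2] into the neighbouring slots]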
What is a Compiler
53
Semi-Abstract Register Machine
Global Variables
Stack
Heap
High addresses
Low addresses
CPU Main Memory
Register 00
Register 01
Register xx
hellip
Register PC
Cont
rol
regi
ster
s (d
ata)
regi
ster
sGe
nera
l pur
pose
hellip
ebp
esphellipre
gist
ers
Stac
k
54
A Logical Stack Frame (Simplified)Param NParam N-1
hellipParam 1_t0
hellip_tkx
hellipy
Parameters(actual
arguments)
Locals and temporaries
Stack frame for function f(a1hellipaN)
55
Runtime Stack
bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of
stackbull How do we handle recursion
56
Activation Record (frame)parameter k
parameter 1
lexical pointer
return information
dynamic link
registers amp misc
local variablestemporaries
next frame would be here
hellip
administrativepart
high addresses
lowaddresses
framepointer
stackpointer
incoming parameters
stack grows down
57
Runtime Stackbull SP ndash stack pointer
ndash top of current framebull FP ndash frame pointer
ndash base of current framendash Sometimes called BP
(base pointer)
Current frame
hellip hellip
Previous frame
SP
FPstack grows down
58
Code Blocks
bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1
adminstrativex1
y1
x2
x3
hellip
59
L-Values of Local Variables
bull The offset in the stack is known at compile time
bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3
Store R3 offset(x)(FP)
60
Pentium Runtime Stack
Pentium stack registers
Pentium stack and callret instructions
Register Usage
ESP Stack pointer
EBP Base pointer
Instruction Usage
push pushahellip push on runtime stack
poppopahellip Base pointer
call transfer control to called routine
return transfer control back to caller
61
Accessing Stack Variables
bull Use offset from FP (ebp)bull Remember ndash
stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples
ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local
hellip hellip
SP
FPReturn address
Return address
Param nhellip
param1
Local 1hellip
Local n
Previous fp
Param nhellip
param1FP+8
FP-4
62
Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp
ESP
EBPReturn address
Return address
old ebx
Previous fp
nEBP+8
EBP-4 old ebp
old ebx
(stack in intermediate point)
(disclaimer real compiler can do better than that)
63
Call Sequences
bull The processor does not save the content of registers on procedure calls
bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some
registers
64
Call Sequences
call
caller
callee
return
caller
Caller push code
Callee push code
(prologue)
Callee pop code
(epilogue)
Caller pop code
Push caller-save registersPush actual parameters (in reverse order)
push return addressJump to call address
Push current base-pointerbp = sp
Push local variablesPush callee-save registers
Pop callee-save registersPop callee activation record
Pop old base-pointer
pop return addressJump to address
Pop parametersPop caller-save registers
65
Call Sequences ndash Foo(4221)
call
caller
callee
return
caller
push ecx
push $21
push $42
call _foo
push ebp
mov esp ebp
sub 8 esp
push ebx
pop ebx
mov ebp esp
pop ebp
ret
add $8 esp
pop ecx
Push caller-save registersPush actual parameters (in reverse order)
push return addressJump to call address
Push current base-pointerbp = sp
Push local variablesPush callee-save registers
Pop callee-save registersPop callee activation record
Pop old base-pointer
pop return addressJump to address
Pop caller-save registersPop parameters
66
ldquoTo Callee-save or to Caller-saverdquo
bull Callee-saved registers need only be saved when callee modifies their value
bull some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registersbull Callee-Save Registers
ndash Saved by the callee before modificationndash Values are automatically preserved across calls
bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls
bull Usually the architecture defines caller-save and callee-save registers
bull Separate compilationbull Interoperability between code produced by different
compilerslanguages bull But compiler writers decide when to use calllercallee
registers
68
Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls
global _foo
Add_Constant -K SP allocate space for foo
Store_Local R5 -14(FP) save R5
Load_Reg R5 R0 Add_Constant R5 1
JSR f1 JSR g1
Add_Constant R5 2 Load_Reg R5 R0
Load_Local -14(FP) R5 restore R5
Add_Constant K SP RTS deallocate
int foo(int a) int b=a+1 f1()g1(b)
return(b+2)
69
Caller-Save Registersbull Saved by the caller before calls when
neededbull Values are not automatically preserved
across calls
global _bar
Add_Constant -K SP allocate space for bar
Add_Constant R0 1
JSR f2
Load_Constant 2 R0 JSR g2
Load_Constant 8 R0 JSR g2
Add_Constant K SP deallocate space for bar
RTS
void bar (int y) int x=y+1f2(x)
g2(2) g2(8)
70
Parameter Passingbull 1960s
ndash In memory bull No recursion is allowed
bull 1970sndash In stack
bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved
bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)
71
Windows Exploit(s) Buffer Overflow
void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])
aout abracadabraSegmentation fault
Stack grows this way
Memory addresses
Previous frameReturn
addressSaved FPchar xbuf[2]
hellip
abracadabr
72
Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]
strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)
auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)
auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)
(source ldquohacking ndash the art of exploitation 2nd Edrdquo)
54
A Logical Stack Frame (Simplified)
Stack frame for function f(a1, …, aN):
  Parameters (actual arguments): Param N, Param N-1, …, Param 1
  Locals and temporaries: _t0, …, _tk, x, …, y
55
Runtime Stack
• Stack of activation records
• Call = push new activation record
• Return = pop activation record
• Only one "active" activation record: the top of the stack
• How do we handle recursion?
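The push/pop discipline above can be sketched in C with an explicit array of activation records. This is only an illustration under assumptions: the `Frame` layout, `RuntimeStack`, `push_frame`, `pop_frame` and `MAX_DEPTH` are hypothetical names, not part of the lecture. It shows why recursion works: each invocation of `fact` gets its own record.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_DEPTH = 64 };

/* Hypothetical activation record for fact(n): one parameter, one result slot. */
typedef struct {
    int n;
    int result;
} Frame;

typedef struct {
    Frame frames[MAX_DEPTH];
    size_t depth;      /* number of live records; frames[depth - 1] is the active one */
    size_t max_depth;  /* high-water mark, kept only for inspection */
} RuntimeStack;

/* Call = push a new activation record. */
Frame *push_frame(RuntimeStack *s, int n) {
    assert(s->depth < MAX_DEPTH);
    Frame *f = &s->frames[s->depth++];
    f->n = n;
    f->result = 0;
    if (s->depth > s->max_depth)
        s->max_depth = s->depth;
    return f;
}

/* Return = pop the active record. */
void pop_frame(RuntimeStack *s) {
    assert(s->depth > 0);
    s->depth--;
}

/* Every invocation works on its own record, so recursive calls do not clash. */
int fact(RuntimeStack *s, int n) {
    Frame *f = push_frame(s, n);
    f->result = (f->n <= 1) ? 1 : f->n * fact(s, f->n - 1);
    int result = f->result;
    pop_frame(s);
    return result;
}
```

Starting from a zero-initialized `RuntimeStack`, calling `fact(&s, 5)` pushes one record per invocation and leaves the stack empty again once the outermost call returns.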
56
Activation Record (frame)
Layout from high addresses to low addresses (stack grows down):
  incoming parameters:
    parameter k
    …
    parameter 1
  administrative part:
    lexical pointer
    return information
    dynamic link
    registers & misc
  local variables
  temporaries
  (next frame would be here)
Pointers marked in the diagram: frame pointer, stack pointer
57
Runtime Stack
• SP - stack pointer
  - top of current frame
• FP - frame pointer
  - base of current frame
  - Sometimes called BP (base pointer)
Diagram: Previous frame above Current frame; FP at the base of the current frame, SP at its top; stack grows down
58
Code Blocks
• Programming languages provide code blocks
void foo()
{
  int x = 8, y = 9;    /* 1 */
  { int x = y * y; }   /* 2 */
  { int x = y * 7; }   /* 3 */
  x = y + 1;
}
Frame layout: administrative, x1, y1, x2, x3, …
59
L-Values of Local Variables
• The offset in the stack is known at compile time
• L-val(x) = FP + offset(x)
• x = 5 is translated to:
    Load_Constant 5, R3
    Store R3, offset(x)(FP)
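The rule L-val(x) = FP + offset(x) can be tried out on a simulated stack in C. This is a minimal sketch under assumptions: the offsets `OFFSET_X` and `OFFSET_Y` and the helper names are made up for illustration, and a byte array stands in for stack memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { STACK_BYTES = 64 };

/* Hypothetical offsets a compiler might assign to two 4-byte locals. */
enum { OFFSET_X = -4, OFFSET_Y = -8 };

/* L-val(v) = FP + offset(v): the offset is a compile-time constant. */
unsigned char *lvalue(unsigned char *fp, int offset) {
    return fp + offset;
}

/* Counterpart of "Store R, offset(FP)". */
void store_local(unsigned char *fp, int offset, int32_t value) {
    memcpy(lvalue(fp, offset), &value, sizeof value);
}

/* Counterpart of a load from offset(FP) into a register. */
int32_t load_local(unsigned char *fp, int offset) {
    int32_t value;
    memcpy(&value, lvalue(fp, offset), sizeof value);
    return value;
}
```

With `fp` pointing into the middle of a `STACK_BYTES` buffer, `store_local(fp, OFFSET_X, 5)` plays the role of the `x = 5` translation: only the frame pointer is a run-time value.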
60
Pentium Runtime Stack
Pentium stack registers
  Register   Usage
  ESP        Stack pointer
  EBP        Base pointer
Pentium stack and call/ret instructions
  Instruction      Usage
  push, pusha, …   push on runtime stack
  pop, popa, …     pop from runtime stack
  call             transfer control to called routine
  ret              transfer control back to caller
61
Accessing Stack Variables
• Use offset from FP (ebp)
• Remember: stack grows downwards
• Above FP = parameters
• Below FP = locals
• Examples
  - ebp + 4 = return address
  - ebp + 8 = first parameter
  - ebp - 4 = first local
Diagram (high to low addresses): Param n, …, param 1 (at FP+8), Return address, Previous fp (where FP points), Local 1 (at FP-4), …, Local n (where SP points)
62
Factorial - fact(int n)
fact:
    pushl %ebp            # save ebp
    movl  %esp,%ebp       # ebp = esp
    pushl %ebx            # save ebx
    movl  8(%ebp),%ebx    # ebx = n
    cmpl  $1,%ebx         # n <= 1 ?
    jle   .lresult        # then done
    leal  -1(%ebx),%eax   # eax = n-1
    pushl %eax            # push the argument
    call  fact            # fact(n-1)
    imull %ebx,%eax       # eax = retv*n
    jmp   .lreturn
.lresult:
    movl  $1,%eax         # retv
.lreturn:
    movl  -4(%ebp),%ebx   # restore ebx
    movl  %ebp,%esp       # restore esp
    popl  %ebp            # restore ebp
    ret
Stack at an intermediate point (high to low addresses):
  n                 (EBP+8)
  Return address    (EBP+4)
  old ebp           (EBP)
  old ebx           (EBP-4, ESP)
(disclaimer: a real compiler can do better than that)
63
Call Sequences
• The processor does not save the content of registers on procedure calls
• So who will?
  - Caller saves and restores registers
  - Callee saves and restores registers
  - But can also have both save/restore some registers
64
Call Sequences
caller, before the call (caller push code):
  Push caller-save registers
  Push actual parameters (in reverse order)
call:
  push return address
  Jump to call address
callee, on entry (callee push code, prologue):
  Push current base-pointer
  bp = sp
  Push local variables
  Push callee-save registers
callee, on exit (callee pop code, epilogue):
  Pop callee-save registers
  Pop callee activation record
  Pop old base-pointer
return:
  pop return address
  Jump to address
caller, after the return (caller pop code):
  Pop parameters
  Pop caller-save registers
65
Call Sequences - Foo(42,21)
caller, before the call:
  push %ecx          # push caller-save registers
  push $21           # push actual parameters (in reverse order)
  push $42
call:
  call _foo          # push return address, jump to call address
callee, prologue:
  push %ebp          # push current base-pointer
  mov  %esp, %ebp    # bp = sp
  sub  $8, %esp      # push local variables
  push %ebx          # push callee-save registers
callee, epilogue:
  pop  %ebx          # pop callee-save registers
  mov  %ebp, %esp    # pop callee activation record
  pop  %ebp          # pop old base-pointer
return:
  ret                # pop return address, jump to address
caller, after the return:
  add  $8, %esp      # pop parameters
  pop  %ecx          # pop caller-save registers
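The Foo(42,21) sequence can be replayed on a tiny simulated machine to check that every push is matched by a pop. The sketch below rests on assumptions: `Machine`, `push`, `pop`, `foo_body` and `call_foo` are hypothetical names, the stack is indexed in 4-byte words (so `8(%ebp)` becomes `mem[bp + 2]`), and the callee body (first parameter minus second) is invented only to have something to return.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_WORDS = 64 };

typedef struct {
    int32_t mem[STACK_WORDS];
    int sp;            /* index of the top word; the stack grows toward index 0 */
    int bp;            /* index of the saved base pointer of the current frame */
    int32_t ecx, ebx;  /* one caller-save and one callee-save register */
} Machine;

void push(Machine *m, int32_t v) { assert(m->sp > 0); m->mem[--m->sp] = v; }
int32_t pop(Machine *m) { assert(m->sp < STACK_WORDS); return m->mem[m->sp++]; }

/* Callee: prologue, invented body, epilogue. */
int32_t foo_body(Machine *m) {
    push(m, m->bp);                 /* push %ebp */
    m->bp = m->sp;                  /* mov %esp, %ebp */
    m->sp -= 2;                     /* sub $8, %esp: room for two locals */
    push(m, m->ebx);                /* push %ebx */

    m->ebx = m->mem[m->bp + 2];     /* first parameter, 8(%ebp) */
    int32_t result = m->ebx - m->mem[m->bp + 3];  /* second parameter, 12(%ebp) */

    m->ebx = pop(m);                /* pop %ebx */
    m->sp = m->bp;                  /* mov %ebp, %esp */
    m->bp = pop(m);                 /* pop %ebp */
    return result;
}

/* Caller side of foo(a, b). */
int32_t call_foo(Machine *m, int32_t a, int32_t b) {
    push(m, m->ecx);                /* push caller-save register */
    push(m, b);                     /* parameters in reverse order */
    push(m, a);
    push(m, 0x1234);                /* stand-in for the return address pushed by call */
    int32_t result = foo_body(m);
    (void)pop(m);                   /* ret pops the return address */
    m->sp += 2;                     /* add $8, %esp: pop parameters */
    m->ecx = pop(m);                /* pop caller-save register */
    return result;
}
```

After `call_foo` returns, `sp`, `bp`, `ecx` and `ebx` hold the values they had before the call, which is the invariant the call sequence is meant to maintain.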
66
"To Callee-save or to Caller-save?"
• Callee-saved registers need only be saved when the callee modifies their value
• Some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registers
• Callee-Save Registers
  - Saved by the callee before modification
  - Values are automatically preserved across calls
• Caller-Save Registers
  - Saved (if needed) by the caller before calls
  - Values are not automatically preserved across calls
• Usually the architecture defines caller-save and callee-save registers
• Separate compilation
• Interoperability between code produced by different compilers/languages
• But compiler writers decide when to use caller/callee registers
68
Callee-Save Registers
• Saved by the callee before modification
• Usually at procedure prolog
• Restored at procedure epilog
• Hardware support may be available
• Values are automatically preserved across calls

int foo(int a) {
  int b = a + 1;
  f1();
  g1(b);
  return (b + 2);
}

.global _foo
Add_Constant -K, SP        // allocate space for foo
Store_Local R5, -14(FP)    // save R5
Load_Reg R5, R0
Add_Constant R5, 1
JSR f1
JSR g1
Add_Constant R5, 2
Load_Reg R5, R0
Load_Local -14(FP), R5     // restore R5
Add_Constant K, SP         // deallocate
RTS
69
Caller-Save Registers
• Saved by the caller before calls when needed
• Values are not automatically preserved across calls

void bar(int y) {
  int x = y + 1;
  f2(x);
  g2(2);
  g2(8);
}

.global _bar
Add_Constant -K, SP        // allocate space for bar
Add_Constant R0, 1
JSR f2
Load_Constant 2, R0
JSR g2
Load_Constant 8, R0
JSR g2
Add_Constant K, SP         // deallocate space for bar
RTS
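The difference between the two conventions can be sketched in C by modelling registers as fields of a struct. Everything here is an assumption made for illustration: `Regs`, `callee` and `caller` are hypothetical names, `r0` is treated as caller-save and `r5` as callee-save, and the constants are arbitrary.

```c
#include <assert.h>

/* Assumed convention: r0 is caller-save, r5 is callee-save. */
typedef struct {
    int r0;
    int r5;
} Regs;

/* The callee may use both registers as scratch space,
   but it must hand r5 back unchanged. */
void callee(Regs *regs) {
    int saved_r5 = regs->r5;   /* prologue: save callee-save register */
    regs->r5 = 111;            /* scratch use */
    regs->r0 = 222;            /* r0 is simply overwritten */
    regs->r5 = saved_r5;       /* epilogue: restore callee-save register */
}

/* The caller needs both values after the call, so it saves r0 itself
   and relies on the convention for r5. */
int caller(Regs *regs) {
    regs->r0 = 10;
    regs->r5 = 20;
    int saved_r0 = regs->r0;   /* caller-save: spill before the call */
    callee(regs);
    regs->r0 = saved_r0;       /* reload after the call */
    return regs->r0 + regs->r5;
}
```

If the caller did not need `r0` after the call, the spill and reload could be dropped, which is the "saved (if needed)" part of the caller-save rule.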
70
Parameter Passing
• 1960s
  - In memory
    • No recursion is allowed
• 1970s
  - In stack
• 1980s
  - In registers
  - First k parameters are passed in registers (k=4 or k=6)
  - Where is time saved?
• Most procedures are leaf procedures
• Interprocedural register allocation
• Many of the registers may be dead before another invocation
• Register windows are allocated in some architectures per call (e.g., Sun SPARC)
71
Windows Exploit(s): Buffer Overflow
void foo(char *x) {
  char buf[2];
  strcpy(buf, x);
}
int main(int argc, char *argv[]) {
  foo(argv[1]);
}
./a.out abracadabra
Segmentation fault
Diagram (stack grows toward low memory addresses): Previous frame, Return address, Saved FP, char *x, buf[2], …
The input "abracadabr…" written into buf[2] spills over the slots above it.
72
Buffer overflow
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int check_authentication(char *password) {
    int auth_flag = 0;
    char password_buffer[16];

    strcpy(password_buffer, password);   /* no bounds check */
    if (strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
    if (strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;
    return auth_flag;
}

int main(int argc, char *argv[]) {
    if (argc < 2) {
        printf("Usage: %s <password>\n", argv[0]);
        exit(0);
    }
    if (check_authentication(argv[1])) {
        printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        printf(" Access Granted.\n");
        printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    } else {
        printf("\nAccess Denied.\n");
    }
}
(source: "Hacking: The Art of Exploitation, 2nd Ed.")
74
Buffer overflow
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int check_authentication(char *password) {
    char password_buffer[16];            /* declaration order swapped */
    int auth_flag = 0;

    strcpy(password_buffer, password);   /* no bounds check */
    if (strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
    if (strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;
    return auth_flag;
}

int main(int argc, char *argv[]) {
    if (argc < 2) {
        printf("Usage: %s <password>\n", argv[0]);
        exit(0);
    }
    if (check_authentication(argv[1])) {
        printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        printf(" Access Granted.\n");
        printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    } else {
        printf("\nAccess Denied.\n");
    }
}
(source: "Hacking: The Art of Exploitation, 2nd Ed.")
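For contrast, here is a minimal sketch of a bounds-checked variant of the function above. It is not from the slides or the book: the name `check_authentication_bounded` and the constant `PASSWORD_BUFFER_SIZE` are hypothetical, and rejecting over-long input is just one possible policy.

```c
#include <assert.h>
#include <string.h>

enum { PASSWORD_BUFFER_SIZE = 16 };

/* Returns 1 for an accepted password and 0 otherwise.
   Input that does not fit in the buffer is rejected instead of copied. */
int check_authentication_bounded(const char *password) {
    char password_buffer[PASSWORD_BUFFER_SIZE];
    size_t length = strlen(password);

    if (length >= sizeof password_buffer)
        return 0;                                   /* would overflow: refuse */
    memcpy(password_buffer, password, length + 1);  /* includes the terminating NUL */

    return strcmp(password_buffer, "brillig") == 0 ||
           strcmp(password_buffer, "outgrabe") == 0;
}
```

Because the copy never writes past `password_buffer`, a long argument can no longer reach `auth_flag`, the saved frame pointer or the return address, whatever order the locals are laid out in.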
75
Buffer overflow
0x08048529 <+69>:  movl $0x8048647,(%esp)
0x08048530 <+76>:  call 0x8048394 <puts@plt>
0x08048535 <+81>:  movl $0x8048664,(%esp)
0x0804853c <+88>:  call 0x8048394 <puts@plt>
0x08048541 <+93>:  movl $0x804867a,(%esp)
0x08048548 <+100>: call 0x8048394 <puts@plt>
0x0804854d <+105>: jmp 0x804855b <main+119>
0x0804854f <+107>: movl $0x8048696,(%esp)
0x08048556 <+114>: call 0x8048394 <puts@plt>
76
Nested Procedures
• For example: Pascal
• Any routine can have sub-routines
• Any sub-routine can access anything that is defined in its containing scope or inside the sub-routine itself
  - "non-local" variables
77
Example: Nested Procedures
program p() {
  int x;
  procedure a() {
    int y;
    procedure b() { … c() … };
    procedure c() {
      int z;
      procedure d() {
        y = x + z
      };
      … b() … d() …
    }
    … a() … c() …
  }
  a()
}
Possible call sequence: p → a → a → c → b → c → d
What are the addresses of variables "x", "y" and "z" in procedure d?
78
Nested Procedures
• can call a sibling, ancestor
• when "c" uses (non-local) variables from "a", which instance of "a" is it?
• how do you find the right activation record at runtime?
Possible call sequence: p → a → a → c → b → c → d
Diagram (activation records on the stack, in call order): P, a, a, c, b, c, d
79
Nested Procedures
• goal: find the closest routine in the stack from a given nesting level
• if we reached the same routine in a sequence of calls
  - routine of level k uses variables of the same nesting level: it uses its own variables
  - if it uses variables of nesting level j < k, then it must be the last routine called at level j
• If a procedure is last at level j on the stack, then it must be an ancestor of the current routine
Possible call sequence: p → a → a → c → b → c → d
Diagram (activation records on the stack, in call order): P, a, a, c, b, c, d
80
Nested Procedures
• problem: a routine may need to access variables of another routine that contains it statically
• solution: lexical pointer (a.k.a. access link) in the activation record
• lexical pointer points to the last activation record of the nesting level above it
  - in our example, the lexical pointer of d points to the activation record of c
• lexical pointers are created at runtime
• number of links to be traversed is known at compile time
81
Lexical Pointers
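Possible call sequence: p → a → a → c → b → c → d
Diagram (activation records on the stack, in call order): P, a (holds y), a (holds y), c (holds z), b, c (holds z), d; lexical pointers link each record to the record of its enclosing routine
program p() {
  int x;
  procedure a() {
    int y;
    procedure b() { c() };
    procedure c() {
      int z;
      procedure d() {
        y = x + z
      };
      … b() … d() …
    }
    … a() … c() …
  }
  a()
}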
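The lexical-pointer scheme can be sketched in C by building the activation records for the call sequence p → a → a → c → b → c → d by hand. The code rests on assumptions: `Frame`, `enclosing`, `run_d` and `demo_call_sequence` are hypothetical names, each record holds a single local, and nesting levels are taken as p = 0, a = 1, b = c = 2, d = 3.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Frame {
    struct Frame *access_link;  /* lexical pointer: record of the enclosing routine */
    int var;                    /* the routine's single local (x, y or z) */
} Frame;

/* Follow a number of access links that is fixed at compile time:
   hops = level of the using routine - level of the declaring routine. */
Frame *enclosing(Frame *f, int hops) {
    while (hops-- > 0)
        f = f->access_link;
    return f;
}

/* Body of d: y = x + z, with d at level 3. */
void run_d(Frame *d) {
    Frame *c = enclosing(d, 1);  /* z is declared in c (level 2) */
    Frame *a = enclosing(d, 2);  /* y is declared in a (level 1) */
    Frame *p = enclosing(d, 3);  /* x is declared in p (level 0) */
    a->var = p->var + c->var;
}

/* Builds the records for p, a, a, c, b, c, d and runs d.
   Returns y of the second a; stores y of the first a in *y_of_first_a. */
int demo_call_sequence(int x, int z, int *y_of_first_a) {
    Frame p  = { NULL, x };            /* p holds x */
    Frame a1 = { &p, 0 };              /* p calls a: a is declared in p */
    Frame a2 = { &p, 0 };              /* a calls a: the link still points to p */
    Frame c1 = { &a2, -1 };            /* a calls c: c is declared in the caller */
    Frame b  = { c1.access_link, 0 };  /* c calls its sibling b: reuse the caller's link */
    Frame c2 = { b.access_link, z };   /* b calls its sibling c */
    Frame d  = { &c2, 0 };             /* c calls d: d is declared in the caller */

    run_d(&d);
    *y_of_first_a = a1.var;
    return a2.var;
}
```

The write to `y` lands in the second activation of `a`, the one whose record is reachable through the access links, while the first activation's `y` is left untouched.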
82
What is a Compiler
57
Runtime Stack
• SP – stack pointer
  – top of current frame
• FP – frame pointer
  – base of current frame
  – sometimes called BP (base pointer)
[Figure: the stack grows down; the previous frame lies above the current frame, FP marks the base of the current frame and SP its top]
58
Code Blocks
• Programming languages provide code blocks
  void foo()
  {
    int x = 8, y = 9;     // 1
    { int x = y * y; }    // 2
    { int x = y * 7; }    // 3
    x = y + 1;
  }
[Figure: frame layout with slots: administrative, x1, y1, x2, x3, …]
59
L-Values of Local Variables
• The offset in the stack is known at compile time
• L-val(x) = FP + offset(x)
• x = 5 ⇒
    Load_Constant 5, R3
    Store R3, offset(x)(FP)
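The rule L-val(x) = FP + offset(x) can be mimicked in C by treating a byte array as a stack frame. This is only a sketch under stated assumptions: the offsets `OFFSET_X` and `OFFSET_Y`, the frame size, and the helpers `lval`, `store_int`, `load_int` and `demo_store_x` are hypothetical, standing in for decisions a compiler would make when laying out the frame.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical frame layout, fixed "at compile time": locals sit below FP. */
enum {
    FRAME_SIZE = 4 * (int)sizeof(int),
    OFFSET_X   = -1 * (int)sizeof(int),
    OFFSET_Y   = -2 * (int)sizeof(int)
};

/* L-val(v) = FP + offset(v) */
unsigned char *lval(unsigned char *fp, int offset) {
    return fp + offset;
}

/* Plays the role of: Store R, offset(v)(FP) */
void store_int(unsigned char *fp, int offset, int value) {
    memcpy(lval(fp, offset), &value, sizeof value);
}

int load_int(unsigned char *fp, int offset) {
    int value;
    memcpy(&value, lval(fp, offset), sizeof value);
    return value;
}

/* Performs "x = 5" and "y = 7" in a simulated frame, then reads x back. */
int demo_store_x(void) {
    unsigned char stack[FRAME_SIZE] = {0};
    unsigned char *fp = stack + FRAME_SIZE; /* frame base; locals lie below it */

    store_int(fp, OFFSET_X, 5);
    store_int(fp, OFFSET_Y, 7);
    assert(load_int(fp, OFFSET_Y) == 7);    /* y has its own slot */
    return load_int(fp, OFFSET_X);
}
```

Because the offsets are constants, every access to a local costs one addition to FP, with no lookup at runtime.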
60
Pentium Runtime Stack

Pentium stack registers
  Register   Usage
  ESP        Stack pointer
  EBP        Base pointer

Pentium stack and call/ret instructions
  Instruction       Usage
  push, pusha, …    push on runtime stack
  pop, popa, …      pop from runtime stack
  call              transfer control to called routine
  ret               transfer control back to caller
61
Accessing Stack Variables
• Use offset from FP (%ebp)
• Remember – the stack grows downwards
• Above FP = parameters
• Below FP = locals
• Examples
  – %ebp + 4 = return address
  – %ebp + 8 = first parameter
  – %ebp – 4 = first local
[Figure: stack frame from high to low addresses: param n … param 1 (FP+8), return address, previous FP (at FP), local 1 (FP-4) … local n (at SP)]
62
Factorial – fact(int n)
fact:
    pushl %ebp              # save ebp
    movl  %esp, %ebp        # ebp = esp
    pushl %ebx              # save ebx
    movl  8(%ebp), %ebx     # ebx = n
    cmpl  $1, %ebx          # n <= 1 ?
    jle   .lresult          # then done
    leal  -1(%ebx), %eax    # eax = n-1
    pushl %eax
    call  fact              # fact(n-1)
    imull %ebx, %eax        # eax = retv * n
    jmp   .lreturn
.lresult:
    movl  $1, %eax          # retv = 1
.lreturn:
    movl  -4(%ebp), %ebx    # restore ebx
    movl  %ebp, %esp        # restore esp
    popl  %ebp              # restore ebp
    ret                     # return to caller
[Figure: stack at an intermediate point: n at EBP+8, return address at EBP+4, previous fp at EBP, old ebx at EBP-4]
(disclaimer: a real compiler can do better than that)
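For reference, here is a C function with the same behavior as the assembly listing: it returns 1 when n <= 1 and n * fact(n-1) otherwise. The slide does not show the source the assembly was compiled from, so this is an assumed equivalent rather than the original, and `check_fact` is a hypothetical self-check added for illustration.

```c
#include <assert.h>

/* Recursive factorial: the base case corresponds to the jump to .lresult,
   the recursive case to "call fact" followed by the multiply. */
int fact(int n) {
    if (n <= 1) {
        return 1;
    }
    return n * fact(n - 1);
}

/* Small self-check of the base and recursive cases. */
void check_fact(void) {
    assert(fact(1) == 1);
    assert(fact(4) == 24);
}
```

Each recursive call gets its own activation record, so the value of n saved in %ebx belongs to exactly one invocation.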
63
Call Sequences
• The processor does not save the content of registers on procedure calls
• So who will?
  – The caller saves and restores registers
  – The callee saves and restores registers
  – But can also have both save/restore some registers
64
Call Sequences
caller: caller push code
  – Push caller-save registers
  – Push actual parameters (in reverse order)
call
  – Push return address
  – Jump to call address
callee: callee push code (prologue)
  – Push current base-pointer
  – bp = sp
  – Push local variables
  – Push callee-save registers
callee: callee pop code (epilogue)
  – Pop callee-save registers
  – Pop callee activation record
  – Pop old base-pointer
return
  – Pop return address
  – Jump to address
caller: caller pop code
  – Pop parameters
  – Pop caller-save registers
65
Call Sequences – foo(42, 21)
caller
    push %ecx          # push caller-save registers
    push $21           # push actual parameters (in reverse order)
    push $42
call
    call _foo          # push return address, jump to call address
callee (prologue)
    push %ebp          # push current base-pointer
    mov  %esp, %ebp    # bp = sp
    sub  $8, %esp      # push local variables
    push %ebx          # push callee-save registers
callee (epilogue)
    pop  %ebx          # pop callee-save registers
    mov  %ebp, %esp    # pop callee activation record
    pop  %ebp          # pop old base-pointer
return
    ret                # pop return address, jump to address
caller
    add  $8, %esp      # pop parameters
    pop  %ecx          # pop caller-save registers
66
"To Callee-save or to Caller-save?"
• Callee-saved registers need only be saved when the callee modifies their value
• Some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registers
• Callee-Save Registers
  – Saved by the callee before modification
  – Values are automatically preserved across calls
• Caller-Save Registers
  – Saved (if needed) by the caller before calls
  – Values are not automatically preserved across calls
• Usually the architecture defines caller-save and callee-save registers
  – Separate compilation
  – Interoperability between code produced by different compilers/languages
• But compiler writers decide when to use caller/callee registers
68
Callee-Save Registers
• Saved by the callee before modification
• Usually at procedure prolog
• Restored at procedure epilog
• Hardware support may be available
• Values are automatically preserved across calls

  int foo(int a) {
    int b = a + 1;
    f1();
    g1(b);
    return (b + 2);
  }

  .global _foo
  Add_Constant -K, SP                   // allocate space for foo
  Store_Local R5, -14(FP)               // save R5
  Load_Reg R5, R0; Add_Constant R5, 1
  JSR f1; JSR g1
  Add_Constant R5, 2; Load_Reg R5, R0
  Load_Local -14(FP), R5                // restore R5
  Add_Constant K, SP; RTS               // deallocate
69
Caller-Save Registers
• Saved by the caller before calls when needed
• Values are not automatically preserved across calls

  void bar(int y) {
    int x = y + 1;
    f2(x);
    g2(2);
    g2(8);
  }

  .global _bar
  Add_Constant -K, SP          // allocate space for bar
  Add_Constant R0, 1
  JSR f2
  Load_Constant 2, R0; JSR g2
  Load_Constant 8, R0; JSR g2
  Add_Constant K, SP           // deallocate space for bar
  RTS
70
Parameter Passing
• 1960s
  – In memory
    • No recursion is allowed
• 1970s
  – In stack
• 1980s
  – In registers
  – First k parameters are passed in registers (k=4 or k=6)
  – Where is time saved?
• Most procedures are leaf procedures
• Interprocedural register allocation
• Many of the registers may be dead before another invocation
• Register windows are allocated in some architectures per call (e.g., Sun SPARC)
71
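Windows Exploit(s): Buffer Overflow
  void foo(char *x) {
    char buf[2];
    strcpy(buf, x);
  }
  int main(int argc, char *argv[]) {
    foo(argv[1]);
  }

  ./a.out abracadabra
  Segmentation fault
[Figure: stack layout, with the stack growing one way and memory addresses the other: previous frame, return address, saved FP, char* x, buf[2], …; the string "abracadabra" written into buf spills over the slots above it]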
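One common way to avoid the overflow shown above is to check the input length against the buffer size before copying. The sketch below is not part of the lecture; `copy_bounded` and `foo_checked` are hypothetical names, and rejecting over-long input (rather than truncating it) is a design choice made here for clarity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy src into dst only if it fits together with its terminating NUL.
   Returns 0 on success and -1 if the copy would overflow dst. */
int copy_bounded(char *dst, size_t dst_size, const char *src) {
    size_t len = strlen(src);
    if (dst_size == 0 || len >= dst_size) {
        return -1;
    }
    memcpy(dst, src, len + 1); /* len + 1 also copies the NUL terminator */
    return 0;
}

/* Same shape as foo() on the slide, but with the copy bounded by sizeof buf. */
int foo_checked(const char *x) {
    char buf[2];
    return copy_bounded(buf, sizeof buf, x);
}
```

With this check, `foo_checked("abracadabra")` reports an error instead of writing past `buf` into the saved frame pointer and return address.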
72
Buffer overflow
  int check_authentication(char *password) {
    int auth_flag = 0;
    char password_buffer[16];

    strcpy(password_buffer, password);
    if(strcmp(password_buffer, "brillig") == 0)
      auth_flag = 1;
    if(strcmp(password_buffer, "outgrabe") == 0)
      auth_flag = 1;
    return auth_flag;
  }
  int main(int argc, char *argv[]) {
    if(argc < 2) {
      printf("Usage: %s <password>\n", argv[0]);
      exit(0);
    }
    if(check_authentication(argv[1])) {
      printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
      printf(" Access Granted.\n");
      printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    } else {
      printf("\nAccess Denied.\n");
    }
  }
(source: "Hacking – The Art of Exploitation, 2nd Ed.")
74
Buffer overflow
  int check_authentication(char *password) {
    char password_buffer[16];
    int auth_flag = 0;

    strcpy(password_buffer, password);
    if(strcmp(password_buffer, "brillig") == 0)
      auth_flag = 1;
    if(strcmp(password_buffer, "outgrabe") == 0)
      auth_flag = 1;
    return auth_flag;
  }
  int main(int argc, char *argv[]) {
    if(argc < 2) {
      printf("Usage: %s <password>\n", argv[0]);
      exit(0);
    }
    if(check_authentication(argv[1])) {
      printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
      printf(" Access Granted.\n");
      printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    } else {
      printf("\nAccess Denied.\n");
    }
  }
(source: "Hacking – The Art of Exploitation, 2nd Ed.")
75
Buffer overflow
  0x08048529 <+69>:  movl $0x8048647,(%esp)
  0x08048530 <+76>:  call 0x8048394 <puts@plt>
  0x08048535 <+81>:  movl $0x8048664,(%esp)
  0x0804853c <+88>:  call 0x8048394 <puts@plt>
  0x08048541 <+93>:  movl $0x804867a,(%esp)
  0x08048548 <+100>: call 0x8048394 <puts@plt>
  0x0804854d <+105>: jmp 0x804855b <main+119>
  0x0804854f <+107>: movl $0x8048696,(%esp)
  0x08048556 <+114>: call 0x8048394 <puts@plt>
What is a Compiler
60
Pentium Runtime Stack
Pentium stack registers
Pentium stack and callret instructions
Register Usage
ESP Stack pointer
EBP Base pointer
Instruction Usage
push pushahellip push on runtime stack
poppopahellip Base pointer
call transfer control to called routine
return transfer control back to caller
61
Accessing Stack Variables
bull Use offset from FP (ebp)bull Remember ndash
stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples
ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local
hellip hellip
SP
FPReturn address
Return address
Param nhellip
param1
Local 1hellip
Local n
Previous fp
Param nhellip
param1FP+8
FP-4
62
Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp
ESP
EBPReturn address
Return address
old ebx
Previous fp
nEBP+8
EBP-4 old ebp
old ebx
(stack in intermediate point)
(disclaimer real compiler can do better than that)
63
Call Sequences
bull The processor does not save the content of registers on procedure calls
bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some
registers
64
Call Sequences
call
caller
callee
return
caller
Caller push code
Callee push code
(prologue)
Callee pop code
(epilogue)
Caller pop code
Push caller-save registersPush actual parameters (in reverse order)
push return addressJump to call address
Push current base-pointerbp = sp
Push local variablesPush callee-save registers
Pop callee-save registersPop callee activation record
Pop old base-pointer
pop return addressJump to address
Pop parametersPop caller-save registers
65
Call Sequences ndash Foo(4221)
call
caller
callee
return
caller
push ecx
push $21
push $42
call _foo
push ebp
mov esp ebp
sub 8 esp
push ebx
pop ebx
mov ebp esp
pop ebp
ret
add $8 esp
pop ecx
Push caller-save registersPush actual parameters (in reverse order)
push return addressJump to call address
Push current base-pointerbp = sp
Push local variablesPush callee-save registers
Pop callee-save registersPop callee activation record
Pop old base-pointer
pop return addressJump to address
Pop caller-save registersPop parameters
66
ldquoTo Callee-save or to Caller-saverdquo
bull Callee-saved registers need only be saved when callee modifies their value
bull some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registersbull Callee-Save Registers
ndash Saved by the callee before modificationndash Values are automatically preserved across calls
bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls
bull Usually the architecture defines caller-save and callee-save registers
bull Separate compilationbull Interoperability between code produced by different
compilerslanguages bull But compiler writers decide when to use calllercallee
registers
68
Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls
global _foo
Add_Constant -K SP allocate space for foo
Store_Local R5 -14(FP) save R5
Load_Reg R5 R0 Add_Constant R5 1
JSR f1 JSR g1
Add_Constant R5 2 Load_Reg R5 R0
Load_Local -14(FP) R5 restore R5
Add_Constant K SP RTS deallocate
int foo(int a) int b=a+1 f1()g1(b)
return(b+2)
69
Caller-Save Registersbull Saved by the caller before calls when
neededbull Values are not automatically preserved
across calls
global _bar
Add_Constant -K SP allocate space for bar
Add_Constant R0 1
JSR f2
Load_Constant 2 R0 JSR g2
Load_Constant 8 R0 JSR g2
Add_Constant K SP deallocate space for bar
RTS
void bar (int y) int x=y+1f2(x)
g2(2) g2(8)
70
Parameter Passing
• 1960s
  - In memory
    • No recursion is allowed
• 1970s
  - In stack
• 1980s
  - In registers
  - First k parameters are passed in registers (k=4 or k=6)
  - Where is time saved?
    • Most procedures are leaf procedures
    • Interprocedural register allocation
    • Many of the registers may be dead before another invocation
    • Register windows are allocated in some architectures per call (e.g., Sun SPARC)
71
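Windows Exploit(s): Buffer Overflow

void foo(char *x) {
    char buf[2];
    strcpy(buf, x);
}

int main(int argc, char *argv[]) {
    foo(argv[1]);
}

a.out abracadabra
Segmentation fault

[Figure: stack layout, with arrows for the direction of stack growth and of increasing memory addresses. Frame contents, in order: previous frame, return address, saved FP, char *x, buf[2]. The copied string "abracadabr…" is shown overrunning buf[2].]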
72
Buffer overflow

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int check_authentication(char *password) {
    int auth_flag = 0;
    char password_buffer[16];

    strcpy(password_buffer, password);
    if (strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
    if (strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;
    return auth_flag;
}

int main(int argc, char *argv[]) {
    if (argc < 2) {
        printf("Usage: %s <password>\n", argv[0]);
        exit(0);
    }
    if (check_authentication(argv[1])) {
        printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        printf("      Access Granted.\n");
        printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    } else {
        printf("\nAccess Denied.\n");
    }
}

(source: "Hacking: The Art of Exploitation, 2nd Ed.")
74
Buffer overflow

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int check_authentication(char *password) {
    char password_buffer[16];
    int auth_flag = 0;

    strcpy(password_buffer, password);
    if (strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
    if (strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;
    return auth_flag;
}

int main(int argc, char *argv[]) {
    if (argc < 2) {
        printf("Usage: %s <password>\n", argv[0]);
        exit(0);
    }
    if (check_authentication(argv[1])) {
        printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        printf("      Access Granted.\n");
        printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    } else {
        printf("\nAccess Denied.\n");
    }
}

(source: "Hacking: The Art of Exploitation, 2nd Ed.")
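For contrast, here is a minimal sketch of a bounded variant that cannot write past the 16-byte buffer. It is not from the slides or the cited book: the name `check_authentication_bounded` and the choice to reject over-long input outright are assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical variant: rejects over-long input instead of overflowing. */
static int check_authentication_bounded(const char *password) {
    char password_buffer[16];
    assert(password != NULL);

    size_t len = strlen(password);
    if (len >= sizeof password_buffer)
        return 0;                              /* would not fit with its terminator */

    memcpy(password_buffer, password, len + 1); /* copy including the '\0' */
    return strcmp(password_buffer, "brillig") == 0 ||
           strcmp(password_buffer, "outgrabe") == 0;
}
```

Because the length is checked before the copy, neither `auth_flag` nor the saved return address can be overwritten by the input, whatever the order of the local declarations.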
75
Buffer overflow
0x08048529 <+69>:  movl $0x8048647,(%esp)
0x08048530 <+76>:  call 0x8048394 <puts@plt>
0x08048535 <+81>:  movl $0x8048664,(%esp)
0x0804853c <+88>:  call 0x8048394 <puts@plt>
0x08048541 <+93>:  movl $0x804867a,(%esp)
0x08048548 <+100>: call 0x8048394 <puts@plt>
0x0804854d <+105>: jmp 0x804855b <main+119>
0x0804854f <+107>: movl $0x8048696,(%esp)
0x08048556 <+114>: call 0x8048394 <puts@plt>
76
Nested Procedures
• For example: Pascal
• Any routine can have sub-routines
• Any sub-routine can access anything that is defined in its containing scope or inside the sub-routine itself
  - "non-local" variables
77
Example: Nested Procedures

program p() {
    int x;
    procedure a() {
        int y;
        procedure b() { … c() … }
        procedure c() {
            int z;
            procedure d() {
                y = x + z
            }
            … b() … d() …
        }
        … a() … c() …
    }
    a()
}

Possible call sequence: p → a → a → c → b → c → d
What are the addresses of variables "x", "y" and "z" in procedure d?
78
Nested Procedures
• A routine can call a sibling or an ancestor
• When "c" uses (non-local) variables from "a", which instance of "a" is it?
• How do you find the right activation record at runtime?

Possible call sequence: p → a → a → c → b → c → d
[Figure: activation records on the stack for p, a, a, c, b, c, d]
79
Nested Procedures
• Goal: find the closest routine in the stack from a given nesting level
• If we reached the same routine in a sequence of calls
  - If a routine of level k uses variables of the same nesting level, it uses its own variables
  - If it uses variables of nesting level j < k, then it must be the last routine called at level j
• If a procedure is last at level j on the stack, then it must be an ancestor of the current routine

Possible call sequence: p → a → a → c → b → c → d
[Figure: activation records on the stack for p, a, a, c, b, c, d]
80
Nested Procedures
• Problem: a routine may need to access variables of another routine that contains it statically
• Solution: lexical pointer (a.k.a. access link) in the activation record
• The lexical pointer points to the last activation record of the nesting level above it
  - In our example, the lexical pointer of d points to the activation record of c
• Lexical pointers are created at runtime
• The number of links to be traversed is known at compile time
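The lookup described above can be sketched in C. This is an illustrative model, not the lecture's implementation: the `Frame` struct, `frame_at_level`, and `demo_x_plus_z` are hypothetical names, each record holds a single local in `value`, and the nesting levels (p = 0, a = 1, b and c = 2, d = 3) and the concrete values are assumptions chosen to match the example program.

```c
#include <assert.h>
#include <stddef.h>

/* One activation record; `value` stands in for the routine's single local. */
typedef struct Frame {
    int level;                  /* static nesting level of the routine */
    const struct Frame *link;   /* lexical pointer (access link) */
    int value;
} Frame;

/* Follow (from->level - target_level) links. In a real compiler this
   hop count is a constant known at compile time. */
static const Frame *frame_at_level(const Frame *from, int target_level) {
    assert(from != NULL && target_level <= from->level);
    for (int hops = from->level - target_level; hops > 0; --hops) {
        from = from->link;
        assert(from != NULL);
    }
    return from;
}

/* Build the records for p -> a -> a -> c -> b -> c -> d and
   evaluate x + z as seen from d. */
static int demo_x_plus_z(void) {
    Frame p  = {0, NULL, 10};   /* x = 10 */
    Frame a1 = {1, &p, 1};      /* y of the first a */
    Frame a2 = {1, &p, 2};      /* y of the second a */
    Frame c1 = {2, &a2, 100};   /* z of the first c */
    Frame b  = {2, &a2, 0};     /* b is declared in a, so it links to a2 */
    Frame c2 = {2, &a2, 200};   /* z of the second c */
    Frame d  = {3, &c2, 0};     /* d links to the c that called it */
    (void)a1; (void)c1; (void)b;

    int x = frame_at_level(&d, 0)->value;   /* three hops: d -> c -> a -> p */
    int z = frame_at_level(&d, 2)->value;   /* one hop: d -> c */
    return x + z;
}
```

From d, the variable y would be reached with two hops and resolves to the second instance of a, which is the last level-1 record on the stack.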
81
Lexical Pointers
[Figure: activation records on the stack for a, a, c, b, c, d; each record of a holds y and each record of c holds z]

Possible call sequence: p → a → a → c → b → c → d

program p() {
    int x;
    procedure a() {
        int y;
        procedure b() { c() }
        procedure c() {
            int z;
            procedure d() {
                y = x + z
            }
            … b() … d() …
        }
        … a() … c() …
    }
    a()
}
82
What is a Compiler
61
Accessing Stack Variables
• Use offset from FP (%ebp)
• Remember: the stack grows downwards
• Above FP = parameters
• Below FP = locals
• Examples
  - %ebp + 4 = return address
  - %ebp + 8 = first parameter
  - %ebp - 4 = first local

Stack layout (higher addresses first):
    …
    Param n
    …
    Param 1           <- FP+8
    Return address
    Previous fp       <- FP
    Local 1           <- FP-4
    …
    Local n           <- SP
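The offsets above can be checked against a small simulated stack. This is a sketch under stated assumptions: memory is modeled as an array of 4-byte words, and `stack_mem`, `push`, `at_fp`, and `enter_frame` are hypothetical helpers, not part of the lecture.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_WORDS = 64 };
static int32_t stack_mem[STACK_WORDS]; /* simulated memory, one slot per 4-byte word */
static int sp = STACK_WORDS;           /* word index; the stack grows toward index 0 */
static int fp = STACK_WORDS;

static void push(int32_t v) {
    assert(sp > 0);
    stack_mem[--sp] = v;
}

/* Read the word at byte offset `off` from FP, like off(%ebp). */
static int32_t at_fp(int off) {
    assert(off % 4 == 0);
    int idx = fp + off / 4;
    assert(idx >= 0 && idx < STACK_WORDS);
    return stack_mem[idx];
}

/* Build a frame for a call with two parameters and one local. */
static void enter_frame(int32_t p1, int32_t p2, int32_t ret_addr, int32_t local1) {
    push(p2);          /* caller: parameters in reverse order */
    push(p1);
    push(ret_addr);    /* what the call instruction does */
    push(fp);          /* callee prologue: save the previous FP */
    fp = sp;           /* FP now points at the saved FP */
    push(local1);      /* the first local lives just below FP */
}
```

After `enter_frame(42, 21, 0x1234, 7)`, `at_fp(8)` reads the first parameter, `at_fp(4)` the return address, and `at_fp(-4)` the first local.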
62
Factorial - fact(int n)

fact:
    pushl %ebp              # save ebp
    movl %esp,%ebp          # ebp = esp
    pushl %ebx              # save ebx
    movl 8(%ebp),%ebx       # ebx = n
    cmpl $1,%ebx            # n <= 1 ?
    jle .lresult            # then done
    leal -1(%ebx),%eax      # eax = n-1
    pushl %eax
    call fact               # fact(n-1)
    imull %ebx,%eax         # eax = retv*n
    jmp .lreturn
.lresult:
    movl $1,%eax            # retv
.lreturn:
    movl -4(%ebp),%ebx      # restore ebx
    movl %ebp,%esp          # restore esp
    popl %ebp               # restore ebp
    ret                     # return to caller

Stack at an intermediate point (higher addresses first):
    n                 <- EBP+8
    Return address
    old ebp           <- EBP
    old ebx           <- EBP-4, ESP

(disclaimer: a real compiler can do better than that)
63
Call Sequences
• The processor does not save the content of registers on procedure calls
• So who will?
  - Caller saves and restores registers
  - Callee saves and restores registers
  - But can also have both save/restore some registers
64
Call Sequences
Caller push code:
  - Push caller-save registers
  - Push actual parameters (in reverse order)
call:
  - Push return address
  - Jump to call address
Callee push code (prologue):
  - Push current base pointer
  - bp = sp
  - Push local variables
  - Push callee-save registers
Callee pop code (epilogue):
  - Pop callee-save registers
  - Pop callee activation record
  - Pop old base pointer
return:
  - Pop return address
  - Jump to address
Caller pop code:
  - Pop parameters
  - Pop caller-save registers
65
Call Sequences - foo(42, 21)

Caller push code:
    push %ecx           # push caller-save registers
    push $21            # push actual parameters (in reverse order)
    push $42
call:
    call _foo           # push return address, jump to call address
Callee push code (prologue):
    push %ebp           # push current base pointer
    mov %esp, %ebp      # bp = sp
    sub $8, %esp        # push local variables
    push %ebx           # push callee-save registers
Callee pop code (epilogue):
    pop %ebx            # pop callee-save registers
    mov %ebp, %esp      # pop callee activation record
    pop %ebp            # pop old base pointer
return:
    ret                 # pop return address, jump to address
Caller pop code:
    add $8, %esp        # pop parameters
    pop %ecx            # pop caller-save registers
64
Call Sequences
call
caller
callee
return
caller
Caller push code
Callee push code
(prologue)
Callee pop code
(epilogue)
Caller pop code
Push caller-save registersPush actual parameters (in reverse order)
push return addressJump to call address
Push current base-pointerbp = sp
Push local variablesPush callee-save registers
Pop callee-save registersPop callee activation record
Pop old base-pointer
pop return addressJump to address
Pop parametersPop caller-save registers
65
Call Sequences ndash Foo(4221)
call
caller
callee
return
caller
push ecx
push $21
push $42
call _foo
push ebp
mov esp ebp
sub 8 esp
push ebx
pop ebx
mov ebp esp
pop ebp
ret
add $8 esp
pop ecx
Push caller-save registersPush actual parameters (in reverse order)
push return addressJump to call address
Push current base-pointerbp = sp
Push local variablesPush callee-save registers
Pop callee-save registersPop callee activation record
Pop old base-pointer
pop return addressJump to address
Pop caller-save registersPop parameters
66
ldquoTo Callee-save or to Caller-saverdquo
bull Callee-saved registers need only be saved when callee modifies their value
bull some heuristics and conventions are followed
67
Caller-Save and Callee-Save Registersbull Callee-Save Registers
ndash Saved by the callee before modificationndash Values are automatically preserved across calls
bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls
bull Usually the architecture defines caller-save and callee-save registers
bull Separate compilationbull Interoperability between code produced by different
compilerslanguages bull But compiler writers decide when to use calllercallee
registers
68
Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls
global _foo
Add_Constant -K SP allocate space for foo
Store_Local R5 -14(FP) save R5
Load_Reg R5 R0 Add_Constant R5 1
JSR f1 JSR g1
Add_Constant R5 2 Load_Reg R5 R0
Load_Local -14(FP) R5 restore R5
Add_Constant K SP RTS deallocate
int foo(int a) int b=a+1 f1()g1(b)
return(b+2)
69
Caller-Save Registers
• Saved by the caller before calls when needed
• Values are not automatically preserved across calls

    void bar(int y) {
        int x = y + 1;
        f2(x);
        g2(2);
        g2(8);
    }

    .global _bar
    Add_Constant -K, SP         // allocate space for bar
    Add_Constant R0, 1
    JSR f2
    Load_Constant 2, R0; JSR g2
    Load_Constant 8, R0; JSR g2
    Add_Constant K, SP          // deallocate space for bar
    RTS
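The difference between the two conventions can be sketched in C with a simulated register file. This is an illustrative model only: the `Cpu` type, the choice of `R0` as caller-save and `R5` as callee-save, and the function names are assumptions, and C locals stand in for frame slots.

```c
#include <stdint.h>

/* Hypothetical register file; R0 is treated as caller-save, R5 as callee-save. */
enum { NUM_REGS = 8, R0 = 0, R5 = 5 };

typedef struct {
    int32_t reg[NUM_REGS];
} Cpu;

/* A callee that uses both registers as scratch space. It obeys the
   callee-save convention for R5 and clobbers R0 freely. */
static void f1(Cpu *cpu) {
    int32_t saved_r5 = cpu->reg[R5]; /* prolog: save callee-save register */
    cpu->reg[R5] = 1000;
    cpu->reg[R0] = 2000;             /* caller-save register: no obligation to restore */
    cpu->reg[R5] = saved_r5;         /* epilog: restore callee-save register */
}

/* b lives in callee-save R5: it survives the calls with no code around them,
   but this routine must itself save and restore R5 once. */
int32_t foo_callee_save(Cpu *cpu, int32_t a) {
    int32_t saved_r5 = cpu->reg[R5];
    cpu->reg[R5] = a + 1;            /* b = a + 1 */
    f1(cpu);
    f1(cpu);
    int32_t result = cpu->reg[R5] + 2;
    cpu->reg[R5] = saved_r5;
    return result;
}

/* b lives in caller-save R0: the caller saves it before every call and
   reloads it afterwards, because the callee may overwrite it. */
int32_t foo_caller_save(Cpu *cpu, int32_t a) {
    cpu->reg[R0] = a + 1;            /* b = a + 1 */
    int32_t spill = cpu->reg[R0];    /* save before the first call */
    f1(cpu);
    cpu->reg[R0] = spill;            /* reload after the call */
    spill = cpu->reg[R0];            /* save before the second call */
    f1(cpu);
    cpu->reg[R0] = spill;
    return cpu->reg[R0] + 2;
}
```

Both routines compute the same value; they differ in where the save and restore work is done, which is the trade-off a register allocator weighs when a value is live across calls.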
70
Parameter Passing
• 1960s
  – In memory
    • No recursion is allowed
• 1970s
  – In stack
• 1980s
  – In registers
  – First k parameters are passed in registers (k=4 or k=6)
  – Where is time saved?
    • Most procedures are leaf procedures
    • Interprocedural register allocation
    • Many of the registers may be dead before another invocation
    • Register windows are allocated in some architectures per call (e.g., Sun SPARC)
71
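Windows Exploit(s): Buffer Overflow

    #include <string.h>

    void foo(char *x) {
        char buf[2];
        strcpy(buf, x);   /* no bounds check: copies past the end of buf */
    }

    int main(int argc, char *argv[]) {
        foo(argv[1]);
    }

    > ./a.out abracadabra
    Segmentation fault

[Figure: stack layout, with arrows marking the direction of stack growth and of increasing memory addresses]
    Previous frame
    Return address
    Saved FP
    char *x
    buf[2]
The copied string "abracadabr…" overruns buf[2] and overwrites the neighboring slots of the frame.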
72
Buffer overflow

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int check_authentication(char *password) {
        int auth_flag = 0;
        char password_buffer[16];

        strcpy(password_buffer, password);
        if (strcmp(password_buffer, "brillig") == 0)
            auth_flag = 1;
        if (strcmp(password_buffer, "outgrabe") == 0)
            auth_flag = 1;
        return auth_flag;
    }

    int main(int argc, char *argv[]) {
        if (argc < 2) {
            printf("Usage: %s <password>\n", argv[0]);
            exit(0);
        }
        if (check_authentication(argv[1])) {
            printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
            printf("      Access Granted.\n");
            printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        } else {
            printf("\nAccess Denied.\n");
        }
    }

(source: “Hacking: The Art of Exploitation, 2nd Ed.”)
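Why an over-long password can flip auth_flag is easier to see in a model where the frame layout is explicit. The sketch below assumes one particular layout (a 16-byte buffer immediately followed by a 4-byte flag); real compilers are free to lay out locals differently, and the names here are made up for illustration. The copy is clamped to the simulated frame, so the model itself stays within defined behavior.

```c
#include <stdint.h>
#include <string.h>

/* Assumed layout of the simulated frame: buffer at [0, 16), flag at [16, 20). */
enum { BUF_SIZE = 16, FLAG_OFFSET = BUF_SIZE, FRAME_SIZE = BUF_SIZE + 4 };

/* Copies the input into the start of the frame without checking it against
   BUF_SIZE (as strcpy would), then returns the flag that follows the buffer. */
int32_t flag_after_unchecked_copy(const char *input) {
    unsigned char frame[FRAME_SIZE] = {0};   /* flag starts out as 0 */
    size_t len = strlen(input) + 1;          /* strcpy also copies the terminator */
    if (len > FRAME_SIZE)
        len = FRAME_SIZE;                    /* keep the simulation itself in bounds */
    memcpy(frame, input, len);

    int32_t flag;
    memcpy(&flag, frame + FLAG_OFFSET, sizeof flag);
    return flag;
}
```

Under this assumed layout, any input of 17 or more non-zero characters leaves a non-zero flag, so a check of the form `if (auth_flag)` would succeed without a valid password.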
74
Buffer overflow

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int check_authentication(char *password) {
        char password_buffer[16];
        int auth_flag = 0;

        strcpy(password_buffer, password);
        if (strcmp(password_buffer, "brillig") == 0)
            auth_flag = 1;
        if (strcmp(password_buffer, "outgrabe") == 0)
            auth_flag = 1;
        return auth_flag;
    }

    int main(int argc, char *argv[]) {
        if (argc < 2) {
            printf("Usage: %s <password>\n", argv[0]);
            exit(0);
        }
        if (check_authentication(argv[1])) {
            printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
            printf("      Access Granted.\n");
            printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        } else {
            printf("\nAccess Denied.\n");
        }
    }

(source: “Hacking: The Art of Exploitation, 2nd Ed.”)
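Reordering the declarations changes which neighbor an overflow hits, but the unchecked strcpy remains. One possible repair, shown here as a sketch rather than as the book's or the lecture's fix, is to reject input that does not fit before copying it. The function name and the PASSWORD_MAX constant are hypothetical.

```c
#include <string.h>

enum { PASSWORD_MAX = 16 };  /* same buffer size as in the example above */

/* Returns 1 if the password is one of the accepted words, 0 otherwise.
   Input that does not fit in the buffer (including its terminator) is
   rejected instead of being copied. */
int check_authentication_bounded(const char *password) {
    char password_buffer[PASSWORD_MAX];
    size_t len = strlen(password);

    if (len >= sizeof password_buffer)
        return 0;                                 /* too long: refuse, do not copy */
    memcpy(password_buffer, password, len + 1);   /* copy including the terminator */

    if (strcmp(password_buffer, "brillig") == 0)
        return 1;
    if (strcmp(password_buffer, "outgrabe") == 0)
        return 1;
    return 0;
}
```

With the length check in place, the copy can never write outside password_buffer, so neither auth_flag nor the saved return address can be reached from the input.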
75
Buffer overflow

    0x08048529 <+69>:   movl $0x8048647,(%esp)
    0x08048530 <+76>:   call 0x8048394 <puts@plt>
    0x08048535 <+81>:   movl $0x8048664,(%esp)
    0x0804853c <+88>:   call 0x8048394 <puts@plt>
    0x08048541 <+93>:   movl $0x804867a,(%esp)
    0x08048548 <+100>:  call 0x8048394 <puts@plt>
    0x0804854d <+105>:  jmp 0x804855b <main+119>
    0x0804854f <+107>:  movl $0x8048696,(%esp)
    0x08048556 <+114>:  call 0x8048394 <puts@plt>
76
Nested Procedures
• For example: Pascal
• Any routine can have sub-routines
• Any sub-routine can access anything that is defined in its containing scope or inside the sub-routine itself
  – “non-local” variables
77
Example: Nested Procedures

    program p() {
        int x;
        procedure a() {
            int y;
            procedure b() { … c() … };
            procedure c() {
                int z;
                procedure d() {
                    y = x + z
                };
                … b() … d() …
            }
            … a() … c() …
        }
        a()
    }

Possible call sequence: p → a → a → c → b → c → d
What are the addresses of variables “x”, “y” and “z” in procedure d?
78
Nested Procedures
• can call a sibling, ancestor
• when “c” uses (non-local) variables from “a”, which instance of “a” is it?
• how do you find the right activation record at runtime?
[Figure: stack of activation records P, a, a, c, b, c, d]
Possible call sequence: p → a → a → c → b → c → d
79
Nested Procedures
• goal: find the closest routine in the stack from a given nesting level
• if we reached the same routine in a sequence of calls
  – routine of level k uses variables of the same nesting level: it uses its own variables
  – if it uses variables of nesting level j < k, then it must be the last routine called at level j
• If a procedure is last at level j on the stack, then it must be an ancestor of the current routine
Possible call sequence: p → a → a → c → b → c → d
[Figure: stack of activation records P, a, a, c, b, c, d]
80
Nested Procedures
• problem: a routine may need to access variables of another routine that contains it statically
• solution: lexical pointer (a.k.a. access link) in the activation record
• lexical pointer points to the last activation record of the nesting level above it
  – in our example, the lexical pointer of d points to the activation record of c
• lexical pointers are created at runtime
• number of links to be traversed is known at compile time
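The lexical-pointer scheme can be sketched in C for the example program. This is an illustrative model under stated assumptions: the `Frame` type, the `enter` and `enclosing` helpers, the nesting levels (p=0, a=1, b=2, c=2, d=3) and the sample values of x and z are all made up for the sketch, and each routine is given a single local variable.

```c
#include <stddef.h>

/* Hypothetical activation record with one local variable per routine. */
typedef struct Frame {
    const char *name;            /* routine name, for inspection only */
    int level;                   /* static nesting level: p=0, a=1, b=2, c=2, d=3 */
    struct Frame *dynamic_link;  /* caller's activation record */
    struct Frame *lexical;       /* record of the statically enclosing routine */
    int var;                     /* the routine's local: x in p, y in a, z in c */
} Frame;

/* Follows `hops` lexical pointers starting at f. */
Frame *enclosing(Frame *f, int hops) {
    while (hops-- > 0)
        f = f->lexical;
    return f;
}

/* Sets up the record of a routine declared at nesting level callee_level
   when it is called from `caller`. The enclosing routine sits at level
   callee_level - 1, which is caller->level - callee_level + 1 hops away. */
void enter(Frame *callee, const char *name, int callee_level,
           Frame *caller, int var) {
    callee->name = name;
    callee->level = callee_level;
    callee->dynamic_link = caller;
    callee->lexical = caller
        ? enclosing(caller, caller->level - callee_level + 1)
        : NULL;
    callee->var = var;
}

/* Replays p -> a -> a -> c -> b -> c -> d and executes d's body, y = x + z.
   Reports y of the first and of the second activation of a. */
void run_call_sequence(int *y_first_a, int *y_second_a) {
    Frame p, a1, a2, c1, b, c2, d;
    enter(&p,  "p", 0, NULL, 10);  /* x = 10 */
    enter(&a1, "a", 1, &p,   0);   /* y = 0 */
    enter(&a2, "a", 1, &a1,  0);   /* y = 0 */
    enter(&c1, "c", 2, &a2,  1);   /* z = 1 */
    enter(&b,  "b", 2, &c1,  0);
    enter(&c2, "c", 2, &b,   5);   /* z = 5 */
    enter(&d,  "d", 3, &c2,  0);

    /* From d (level 3): z is 1 hop away, y is 2 hops, x is 3 hops.
       The hop counts depend only on nesting levels, so they are known statically. */
    enclosing(&d, 2)->var = enclosing(&d, 3)->var + enclosing(&d, 1)->var;

    *y_first_a = a1.var;
    *y_second_a = a2.var;
}
```

In this run d updates the y of the second activation of a, using the z of the second activation of c, while the first a and the first c are untouched, even though they are still on the stack.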
81
Lexical Pointers
[Figure: activation records a, a, c, b, c, d on the stack, holding the variables y, y, z, z, with their lexical pointers]
Possible call sequence: p → a → a → c → b → c → d

    program p() {
        int x;
        procedure a() {
            int y;
            procedure b() { c() };
            procedure c() {
                int z;
                procedure d() {
                    y = x + z
                };
                … b() … d() …
            }
            … a() … c() …
        }
        a()
    }
82
What is a Compiler
69
Caller-Save Registersbull Saved by the caller before calls when
neededbull Values are not automatically preserved
across calls
global _bar
Add_Constant -K SP allocate space for bar
Add_Constant R0 1
JSR f2
Load_Constant 2 R0 JSR g2
Load_Constant 8 R0 JSR g2
Add_Constant K SP deallocate space for bar
RTS
void bar (int y) int x=y+1f2(x)
g2(2) g2(8)
70
Parameter Passingbull 1960s
ndash In memory bull No recursion is allowed
bull 1970sndash In stack
bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved
bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)
71
Windows Exploit(s) Buffer Overflow
void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])
aout abracadabraSegmentation fault
Stack grows this way
Memory addresses
Previous frameReturn
addressSaved FPchar xbuf[2]
hellip
abracadabr
72
Buffer overflow

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int check_authentication(char *password) {
    int auth_flag = 0;
    char password_buffer[16];

    /* unchecked copy: this is the overflow the slide demonstrates */
    strcpy(password_buffer, password);

    if (strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
    if (strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;

    return auth_flag;
}

int main(int argc, char *argv[]) {
    if (argc < 2) {
        printf("Usage: %s <password>\n", argv[0]);
        exit(0);
    }
    if (check_authentication(argv[1])) {
        printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        printf("      Access Granted.\n");
        printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    } else {
        printf("\nAccess Denied.\n");
    }
}

(source: “Hacking – The Art of Exploitation”, 2nd Ed.)
74
Buffer overflow

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int check_authentication(char *password) {
    /* same code as before, with the two declarations swapped */
    char password_buffer[16];
    int auth_flag = 0;

    strcpy(password_buffer, password);

    if (strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
    if (strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;

    return auth_flag;
}

int main(int argc, char *argv[]) {
    if (argc < 2) {
        printf("Usage: %s <password>\n", argv[0]);
        exit(0);
    }
    if (check_authentication(argv[1])) {
        printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        printf("      Access Granted.\n");
        printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    } else {
        printf("\nAccess Denied.\n");
    }
}

(source: “Hacking – The Art of Exploitation”, 2nd Ed.)
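The two versions of `check_authentication` differ only in the declaration order of `auth_flag` and `password_buffer`. Whether an overflow reaches the flag depends on which of the two ends up at the higher address, and that placement is the compiler's choice. The sketch below therefore models both placements explicitly in a byte array. The names `simulated_check`, `FLAG_ABOVE` and `FLAG_BELOW` and the 4-byte flag size are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

enum { PW_LEN = 16, FLAG_LEN = 4, LOCALS_LEN = PW_LEN + FLAG_LEN };

/* Two hypothetical placements of the locals, lowest address first:
   FLAG_ABOVE: [password_buffer][auth_flag]  (an overflow runs into the flag)
   FLAG_BELOW: [auth_flag][password_buffer]  (an overflow runs away from it) */
enum layout { FLAG_ABOVE, FLAG_BELOW };

/* Simulates check_authentication on a byte array instead of the real stack.
   Returns nonzero when the simulated auth_flag is nonzero at the end. */
int simulated_check(enum layout where, const char *password) {
    /* Zeroed area; the extra 32 bytes stand in for the rest of the frame. */
    unsigned char area[LOCALS_LEN + 32] = {0};
    size_t buf_off  = (where == FLAG_ABOVE) ? 0 : FLAG_LEN;
    size_t flag_off = (where == FLAG_ABOVE) ? PW_LEN : 0;

    /* Unchecked copy in the style of strcpy, clamped only so that the
       simulation stays inside `area` and remains NUL-terminated. */
    size_t room = sizeof area - buf_off - 1;
    size_t n = strlen(password);
    if (n > room) {
        n = room;
    }
    memcpy(area + buf_off, password, n);
    area[buf_off + n] = 0;

    const char *buf = (const char *)(area + buf_off);
    if (strcmp(buf, "brillig") == 0 || strcmp(buf, "outgrabe") == 0) {
        area[flag_off] = 1;               /* auth_flag = 1 */
    }

    int auth = 0;
    for (size_t i = 0; i < FLAG_LEN; i++) {
        if (area[flag_off + i] != 0) {    /* any nonzero byte makes the flag true */
            auth = 1;
        }
    }
    return auth;
}
```

With the flag above the buffer, a 20-character wrong password makes the flag nonzero and access is granted. With the flag below the buffer the same input leaves the flag at zero, but the bytes written past the buffer still land on whatever follows it in the frame, so reordering the declarations does not remove the overflow.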
75
Buffer overflow
0x08048529 <+69>:   movl   $0x8048647,(%esp)
0x08048530 <+76>:   call   0x8048394 <puts@plt>
0x08048535 <+81>:   movl   $0x8048664,(%esp)
0x0804853c <+88>:   call   0x8048394 <puts@plt>
0x08048541 <+93>:   movl   $0x804867a,(%esp)
0x08048548 <+100>:  call   0x8048394 <puts@plt>
0x0804854d <+105>:  jmp    0x804855b <main+119>
0x0804854f <+107>:  movl   $0x8048696,(%esp)
0x08048556 <+114>:  call   0x8048394 <puts@plt>
76
Nested Procedures
• For example – Pascal
• Any routine can have sub-routines
• Any sub-routine can access anything that is defined in its containing scope or inside the sub-routine itself
  – “non-local” variables
77
Example: Nested Procedures
program p() {
  int x;
  procedure a() {
    int y;
    procedure b() { … c() … };
    procedure c() {
      int z;
      procedure d() {
        y = x + z
      };
      … b() … d() …
    }
    … a() … c() …
  }
  a()
}
Possible call sequence: p → a → a → c → b → c → d
What are the addresses of variables “x”, “y” and “z” in procedure d?
78
Nested Procedures
• can call a sibling, ancestor
• when “c” uses (non-local) variables from “a”, which instance of “a” is it?
• how do you find the right activation record at runtime?
Possible call sequence: p → a → a → c → b → c → d
[Figure: the activations p, a, a, c, b, c, d of the call sequence]
79
Nested Procedures
• goal: find the closest routine in the stack from a given nesting level
• if we reached the same routine in a sequence of calls
  – if a routine of level k uses variables of the same nesting level, it uses its own variables
  – if it uses variables of nesting level j < k, then they belong to the last routine called at level j
• if a procedure is last at level j on the stack, then it must be an ancestor of the current routine
Possible call sequence: p → a → a → c → b → c → d
[Figure: the activations p, a, a, c, b, c, d of the call sequence]
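The rule above ("the last routine called at level j") can be sketched as a scan over a simulated stack. This is an illustration only: `struct frame` and `closest_at_level` are hypothetical names, the nesting levels p=0, a=1, b=c=2, d=3 are read off the example program, and the scan relies on the slide's claim that the last level-j routine on the stack is an ancestor of the current one.

```c
#include <stddef.h>

/* One activation on a simulated stack (hypothetical structure). */
struct frame {
    char name;   /* routine name from the example: p, a, b, c or d */
    int  level;  /* static nesting level: p=0, a=1, b=c=2, d=3 */
};

/* Scans from the frame at index `top` toward the bottom of the stack and
   returns the index of the closest frame at nesting level `level`,
   or -1 if there is none. For the current routine's own level the
   result is `top` itself. */
int closest_at_level(const struct frame *stack, int top, int level) {
    for (int i = top; i >= 0; i--) {
        if (stack[i].level == level) {
            return i;
        }
    }
    return -1;
}
```

For the call sequence p → a → a → c → b → c → d, a reference from d to `z` (level 2) resolves to the second activation of c, and a reference to `y` (level 1) resolves to the second activation of a. A linear scan at every variable access would be slow, which motivates storing a direct pointer in each activation record instead.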
80
Nested Procedures
• problem: a routine may need to access variables of another routine that contains it statically
• solution: lexical pointer (a.k.a. access link) in the activation record
• lexical pointer points to the last activation record of the nesting level above it
  – in our example, the lexical pointer of d points to the activation record of c
• lexical pointers are created at runtime
• the number of links to be traversed is known at compile time
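A minimal sketch of lexical pointers in C, assuming each activation record holds one access link, its static nesting level and at most one local. The names `struct activation`, `enclosing`, `link_for_call` and `run_d` are hypothetical and chosen for this illustration. The hop count in `enclosing` is the difference between two nesting levels, which is why it is known at compile time, while the links themselves are filled in at call time.

```c
#include <stddef.h>

/* Activation record with a lexical pointer (assumed, simplified layout). */
struct activation {
    struct activation *access_link;  /* record of the statically enclosing routine */
    int level;                       /* static nesting level: p=0, a=1, b=c=2, d=3 */
    int local;                       /* x in p, y in a, z in c; unused otherwise */
};

/* Follows (ar->level - target_level) access links. A compiler would emit
   this fixed number of loads directly instead of a loop. */
struct activation *enclosing(struct activation *ar, int target_level) {
    int hops = ar->level - target_level;
    while (hops-- > 0) {
        ar = ar->access_link;
    }
    return ar;
}

/* Lexical pointer for a new record of a routine declared at callee_level,
   computed at runtime when `caller` performs the call: the callee is
   declared inside the routine at level callee_level - 1. */
struct activation *link_for_call(struct activation *caller, int callee_level) {
    return enclosing(caller, callee_level - 1);
}

/* Body of d from the example: y = x + z.
   d is at level 3; z lives at level 2, y at level 1, x at level 0. */
void run_d(struct activation *d) {
    int x = enclosing(d, 0)->local;   /* 3 hops */
    int z = enclosing(d, 2)->local;   /* 1 hop  */
    enclosing(d, 1)->local = x + z;   /* 2 hops */
}
```

For the call sequence p → a → a → c → b → c → d, both activations of a link to p, both activations of c and the activation of b link to the second a, and d links to the second c. Running `run_d` therefore updates `y` in the second activation of a and leaves the first one untouched.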
81
Lexical Pointers
[Figure: activation records a, a, c, b, c, d on the stack; each record of a holds y and each record of c holds z]
Possible call sequence: p → a → a → c → b → c → d
program p() {
  int x;
  procedure a() {
    int y;
    procedure b() { c() };
    procedure c() {
      int z;
      procedure d() {
        y = x + z
      };
      … b() … d() …
    }
    … a() … c() …
  }
  a()
}
82
What is a Compiler
80
Nested Proceduresbull problem a routine may need to access variables of
another routine that contains it staticallybull solution lexical pointer (aka access link) in the
activation recordbull lexical pointer points to the last activation record of the
nesting level above itndash in our example lexical pointer of d points to activation records
of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time
81
Lexical Pointers
a
a
c
b
c
d
y
y
z
z
Possible call sequencep a a c b c d
a
b
P
c c
d
a
program p() int x procedure a() int y procedure b() c()
procedure c() int z procedure d()
y = x + z
hellip b() hellip d() hellip hellip a() hellip c() hellip a()
82
What is a Compiler
81
Lexical Pointers
a
a
c
b
c
d
y
y
z
z
Possible call sequencep a a c b c d
a
b
P
c c
d
a
program p() int x procedure a() int y procedure b() c()
procedure c() int z procedure d()
y = x + z
hellip b() hellip d() hellip hellip a() hellip c() hellip a()
82
What is a Compiler
82
What is a Compiler
83
Procedure Q Calls R
84
Activation Records Remarks
85
Non-Local goto in C syntax
86
Non-local gotos in C
• setjmp remembers the current location and the stack frame
• longjmp jumps to the location remembered by setjmp (popping many activation records)
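A minimal sketch of this behaviour with the standard `<setjmp.h>` interface follows. The names `recovery_point`, `descend`, `run_with_recovery`, and `frames_entered` are hypothetical. The counter is a static variable so that its value is well defined after the jump.

```c
#include <setjmp.h>

static jmp_buf recovery_point;  /* filled in by setjmp */
static int frames_entered;      /* static storage: survives the longjmp */

/* Recurse `depth` more times, then jump straight back to the setjmp,
   abandoning every activation record created on the way down. */
static void descend(int depth) {
    frames_entered++;
    if (depth == 0) {
        longjmp(recovery_point, 1);  /* never returns */
    }
    descend(depth - 1);
    frames_entered = -1;             /* skipped: control never comes back here */
}

/* Returns how many activations of descend were live when longjmp fired. */
int run_with_recovery(int depth) {
    frames_entered = 0;
    if (setjmp(recovery_point) == 0) {
        descend(depth);              /* direct return from setjmp: go down */
        return -1;                   /* not reached: descend ends in longjmp */
    }
    return frames_entered;           /* reached via longjmp */
}
```

`run_with_recovery(3)` enters four activations of `descend` and then resumes right after the `setjmp` call. None of the code that follows the recursive calls runs, which is the popping of many activation records mentioned above.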
87
Non-Local Transfer of Control in C
88
Stack Frames
• Allocate a separate space for every procedure incarnation
• Relative addresses
• Provide a simple means to achieve modularity
• Supports separate code generation of procedures
• Naturally supports recursion
• Efficient memory allocation policy
  – Low overhead
  – Hardware support may be available
• LIFO policy
• Not a pure stack
  – Non-local references
  – Updated using arithmetic
89
The Frame Pointer
• The caller
  – the calling routine
• The callee
  – the called routine
• Caller responsibilities
  – Calculate arguments and save in the stack
  – Store lexical pointer
• Call instruction: M[--SP] = RA; PC = callee
• Callee responsibilities
  – FP = SP
  – SP = SP - frame-size
• Why use both SP and FP?
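The call sequence above can be simulated on a small array that stands in for memory. This is a sketch under assumptions that the slide does not state: the `Machine` type and its helpers are hypothetical, arguments are pushed last-first, the callee saves the old FP before setting FP = SP, and the caller pops its own arguments.

```c
enum { STACK_WORDS = 64 };

/* Hypothetical machine: word-addressed memory and a downward-growing stack. */
typedef struct {
    int M[STACK_WORDS];
    int SP;  /* index of the word on top of the stack */
    int FP;  /* frame pointer of the running routine */
} Machine;

void machine_init(Machine *m) {
    m->SP = STACK_WORDS;
    m->FP = STACK_WORDS;
}

/* Caller: save the arguments in the stack; the call then does M[--SP] = RA. */
void call(Machine *m, const int *args, int nargs, int return_address) {
    for (int i = nargs - 1; i >= 0; i--) {
        m->M[--m->SP] = args[i];
    }
    m->M[--m->SP] = return_address;
}

/* Callee: save the old FP (assumption), then FP = SP; SP = SP - frame_size. */
void prologue(Machine *m, int frame_size) {
    m->M[--m->SP] = m->FP;
    m->FP = m->SP;
    m->SP -= frame_size;
}

/* Fixed offsets from FP: saved FP at FP+0, RA at FP+1, arguments above,
   locals below. */
int *arg_slot(Machine *m, int i)   { return &m->M[m->FP + 2 + i]; }
int *local_slot(Machine *m, int i) { return &m->M[m->FP - 1 - i]; }

/* Callee return: drop the frame, restore the old FP, pop and return RA. */
int epilogue(Machine *m) {
    m->SP = m->FP;
    m->FP = m->M[m->SP++];
    return m->M[m->SP++];
}
```

One answer to the question on the slide shows up here: SP keeps moving while the routine runs (temporaries, outgoing arguments, variable-length allocations), whereas FP stays put, so `arg_slot` and `local_slot` can use offsets fixed at compile time.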
90
Variable Length Frame Size
• C allows allocating objects of unbounded size in the stack (through alloca, a widely available but non-standard extension)

    void p() {
      int i;
      char *p;
      scanf("%d", &i);
      p = (char *) alloca(i * sizeof(int));
    }

• Some versions of Pascal allow conformant array value parameters
91
Limitations
• The compiler may be forced to store a value on the stack instead of in a register
• The stack may not suffice to handle some language features
92
Frame-Resident Variables
• A variable x cannot be stored in a register when
  – x is passed by reference
  – the address of x is taken (&x)
  – x is addressed via pointer arithmetic on the stack frame (C varargs)
  – x is accessed from a nested procedure
  – the value is too big to fit into a single register
  – the variable is an array
  – the register of x is needed for other purposes
  – there are too many local variables
• An escape variable
  – Passed by reference
  – Address is taken
  – Addressed via pointer arithmetic on the stack frame
  – Accessed from a nested procedure
93
The Frames in Different Architectures
g(x, y, z) where x escapes

              Pentium           MIPS               Sparc
x             InFrame(8)        InFrame(0)         InFrame(68)
y             InFrame(12)       InReg(X157)        InReg(X157)
z             InFrame(16)       InReg(X158)        InReg(X158)
View change   M[sp+0] <- fp     sp <- sp-K         save %sp, -K, %sp
              fp <- sp          M[sp+K+0] <- r2    M[fp+68] <- i0
              sp <- sp-K        X157 <- r4         X157 <- i1
                                X158 <- r5         X158 <- i2
94
Limitations of Stack Frames
• A local variable of P cannot be stored in the activation record of P if its duration exceeds the duration of P
• Example 1: Static variables in C (own variables in Algol)
    void p(int x) { static int y = 6; y += x; }
• Example 2: Features of the C language
    int *f() { int x; return &x; }
• Example 3: Dynamic allocation
    int *f() { return (int *) malloc(sizeof(int)); }
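Examples 1 and 3 can be made observable with a small C sketch. It departs from the slide in two labelled ways: `p` returns the updated value of `y` so that its persistence is visible, and `make_cell` is a hypothetical wrapper around the `malloc` call that also stores a value.

```c
#include <stdlib.h>

/* Example 1: y has static storage duration, so it lives outside p's
   activation record and keeps its value from one call to the next.
   (Returning y is an addition for illustration; the slide's p is void.) */
int p(int x) {
    static int y = 6;
    y += x;
    return y;
}

/* Example 3: the int is allocated on the heap, so it outlives the
   activation of make_cell. The caller owns it and must free() it.
   Returns NULL if the allocation fails. */
int *make_cell(int value) {
    int *cell = malloc(sizeof *cell);
    if (cell != NULL) {
        *cell = value;
    }
    return cell;
}
```

Two successive calls `p(1)` and `p(2)` see the same `y`, and the pointer returned by `make_cell` stays valid after the function has returned. Neither would hold if the data lived in the activation record, which is why Example 2 on the slide, returning `&x`, yields a dangling pointer.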
95
Compiler Implementation
• Hide machine-dependent parts
• Hide language-dependent parts
• Use special modules
96
Basic Compiler Phases
[Figure: compiler pipeline] Source program (string) → lexical analysis → Tokens → syntax analysis → Abstract syntax tree → semantic analysis → Code generation → Assembly → Assembler/Linker → EXE
Also shown in the figure: Frame manager, Control Flow Graph
97
Hidden in the frame ADT
• Word size
• The location of the formals
• Frame-resident variables
• Machine instructions to implement "shift-of-view" (prologue/epilogue)
• The number of locals "allocated" so far
• The label at which the machine code starts
98
Invocations to Frame
• "Allocate" a new frame
• "Allocate" a new local variable
• Return the L-value of a local variable
• Generate code for procedure invocation
• Generate prologue/epilogue
• Generate code for procedure return
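The first two operations can be sketched in C as a tiny frame module. Everything here is an assumption made for illustration: the `Frame` and `Access` types, the function names, the choice to number temporaries from 100, and the layout rule that frame-resident locals sit at negative multiples of the word size below FP. A real frame manager would also know the target's conventions for formals, prologue, and epilogue.

```c
#include <stdbool.h>

/* Where a variable lives: at an offset in the frame, or in a register/temp. */
typedef enum { IN_FRAME, IN_REG } AccessKind;

typedef struct {
    AccessKind kind;
    int value;  /* IN_FRAME: offset from FP in bytes; IN_REG: temp number */
} Access;

/* Details hidden behind the frame ADT (a hypothetical subset). */
typedef struct {
    const char *label;    /* label at which the machine code starts */
    int word_size;        /* in bytes */
    int locals_in_frame;  /* number of frame-resident locals allocated so far */
    int next_temp;        /* next unused temporary */
} Frame;

/* "Allocate" a new frame. */
Frame frame_new(const char *label, int word_size) {
    Frame f = { label, word_size, 0, 100 };
    return f;
}

/* "Allocate" a new local: escaping variables must be frame resident,
   the others may be kept in a temporary. */
Access frame_alloc_local(Frame *f, bool escapes) {
    Access a;
    if (escapes) {
        f->locals_in_frame++;
        a.kind = IN_FRAME;
        a.value = -f->word_size * f->locals_in_frame;
    } else {
        a.kind = IN_REG;
        a.value = f->next_temp++;
    }
    return a;
}
```

The rest of the compiler only sees `Access` values and never computes an offset itself, so word size and frame layout can change per target without touching the code that requests locals.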
99
Activation Records Summary
• compile-time memory management for procedure data
• works well for data with well-scoped lifetime
  – deallocation when procedure returns
100
The End
- Compilation 0368-3133 (Semester A, 2013/14)
- What is a Compiler
- Conceptual Structure of a Compiler
- From scanning to parsing
- Context Analysis
- Code Generation
- Code Generation IR
- Slide 8
- Three-Address Code IR
- Abstract Register Machine
- Abstract Activation Record Stack
- Abstract Stack Frame
- TAC generation for expressions
- cgen example
- Naïve cgen for expressions
- TAC Generation for Control Flow Statements
- Control-flow example – conditions
- cgen for if-then-else
- Naive cgen for expressions
- Improving cgen for expressions
- Sethi-Ullman translation
- Weighted register allocation
- Weighted reg alloc example
- Weighted reg alloc example (2)
- TAC Generation for Memory Access Instructions
- Array operations
- TAC Generation for Control Flow Statements (2)
- Control-flow example – conditions (2)
- cgen for if-then-else (2)
- Control-flow example – loops
- cgen for while loops
- Optimization points
- Interprocedural IR
- Supporting Procedures
- Abstract Register Machine (High Level View)
- Abstract Register Machine (High Level View) (2)
- Abstract Activation Record Stack (2)
- Abstract Stack Frame (2)
- Handling Procedures
- Abstract Register Machine (2)
- Semi-Abstract Register Machine
- Intro Functions Example
- What Can We Do with Procedures
- Design Decisions
- Static (lexical) Scoping
- Dynamic Scoping
- Example
- Why do we care
- Variable addresses for static scoping first attempt
- Variable addresses for static scoping first attempt (2)
- Compile-Time Information on Variables
- Activation Record (Stack Frames)
- Semi-Abstract Register Machine (2)
- A Logical Stack Frame (Simplified)
- Runtime Stack
- Activation Record (frame)
- Runtime Stack (2)
- Code Blocks
- L-Values of Local Variables
- Pentium Runtime Stack
- Accessing Stack Variables
- Factorial – fact(int n)
- Call Sequences
- Call Sequences (2)
- Call Sequences – Foo(4221)
- "To Callee-save or to Caller-save"
- Caller-Save and Callee-Save Registers
- Callee-Save Registers
- Caller-Save Registers
- Parameter Passing
- Windows Exploit(s) Buffer Overflow
- Buffer overflow
- Buffer overflow (2)
- Buffer overflow (3)
- Buffer overflow (4)
- Nested Procedures
- Example Nested Procedures
- Nested Procedures (2)
- Nested Procedures (3)
- Nested Procedures (4)
- Lexical Pointers
- What is a Compiler (2)
- Procedure Q Calls R
- Activation Records Remarks
- Non-Local goto in C syntax
- Non-local gotos in C
- Non-Local Transfer of Control in C
- Stack Frames
- The Frame Pointer
- Variable Length Frame Size
- Limitations
- Frame-Resident Variables
- The Frames in Different Architectures
- Limitations of Stack Frames
- Compiler Implementation
- Basic Compiler Phases
- Hidden in the frame ADT
- Invocations to Frame
- Activation Records Summary
- The End
-
84
Activation Records Remarks
85
Non-Local goto in C syntax
86
Non-local gotos in C
bull setjmp remembers the current location and the stack frame
bull longjmp jumps to the current location (popping many activation records)
87
Non-Local Transfer of Control in C
88
Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy
ndash Low overheadndash Hardware support may be available
bull LIFO policybull Not a pure stack
ndash Non local referencesndash Updated using arithmetic
89
The Frame Pointerbull The caller
ndash the calling routinebull The callee
ndash the called routinebull caller responsibilities
ndash Calculate arguments and save in the stackndash Store lexical pointer
bull call instruction M[--SP] = RA PC = callee
bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size
bull Why use both SP and FP
90
Variable Length Frame Sizebull C allows allocating objects of unbounded
size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))
bull Some versions of Pascal allows conformant array value parameters
91
Limitations
bull The compiler may be forced to store a value on a stack instead of registers
bull The stack may not suffice to handle some language features
92
Frame-Resident Variablesbull A variable x cannot be stored in register when
ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame
(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables
bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure
93
The Frames in Different Architectures
Pentium MIPS Sparc
InFrame(8) InFrame(0) InFrame(68)
InFrame(12) InReg(X157) InReg(X157)
InFrame(16) InReg(X158) InReg(X158)
M[sp+0]fpfp spsp sp-K
sp sp-KM[sp+K+0] r2
X157 r4X158 r5
save sp -K spM[fp+68]i0
X157i1
X158i2
g(x y z) where x escapes
x
y
z
View
Change
94
Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its
duration exceeds the duration of Pbull Example 1 Static variables in C
(own variables in Algol)void p(int x) static int y = 6 y += x
bull Example 2 Features of the C languageint f() int x return ampx
bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))
95
Compiler Implementation
bull Hide machine dependent partsbull Hide language dependent partbull Use special modules
96
Basic Compiler Phases
Source program (string)
EXE
lexical analysis
syntax analysis
semantic analysis
Code generation
AssemblerLinker
Tokens
Abstract syntax tree
Assembly
Frame managerControl Flow Graph
97
Hidden in the frame ADT
bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-
of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts
98
Invocations to Frame
bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return
99
Activation Records Summary
bull compile time memory management for procedure data
bull works well for data with well-scoped lifetimendash deallocation when procedure returns
100
The End
- Compilation 0368-3133 (Semester A 201314)
- What is a Compiler
- Conceptual Structure of a Compiler
- From scanning to parsing
- Context Analysis
- Code Generation
- Code Generation IR
- Slide 8
- Three-Address Code IR
- Abstract Register Machine
- Abstract Activation Record Stack
- Abstract Stack Frame
- TAC generation for expressions
- cgen example
- Naiumlve cgen for expressions
- TAC Generation for Control Flow Statements
- Control-flow example ndash conditions
- cgen for if-then-else
- Naive cgen for expressions
- Improving cgen for expressions
- Sethi-Ullman translation
- Weighted register allocation
- Weighted reg alloc example
- Weighted reg alloc example (2)
- TAC Generation for Memory Access Instructions
- Array operations
- TAC Generation for Control Flow Statements (2)
- Control-flow example ndash conditions (2)
- cgen for if-then-else (2)
- Control-flow example ndash loops
- cgen for while loops
- Optimization points
- Interprocedural IR
- Supporting Procedures
- Abstract Register Machine (High Level View)
- Abstract Register Machine (High Level View) (2)
- Abstract Activation Record Stack (2)
- Abstract Stack Frame (2)
- Handling Procedures
- Abstract Register Machine (2)
- Semi-Abstract Register Machine
- Intro Functions Example
- What Can We Do with Procedures
- Design Decisions
- Static (lexical) Scoping
- Dynamic Scoping
- Example
- Why do we care
- Variable addresses for static scoping first attempt
- Variable addresses for static scoping first attempt (2)
- Compile-Time Information on Variables
- Activation Record (Stack Frames)
- Semi-Abstract Register Machine (2)
- A Logical Stack Frame (Simplified)
- Runtime Stack
- Activation Record (frame)
- Runtime Stack (2)
- Code Blocks
- L-Values of Local Variables
- Pentium Runtime Stack
- Accessing Stack Variables
- Factorial ndash fact(int n)
- Call Sequences
- Call Sequences (2)
- Call Sequences ndash Foo(4221)
- ldquoTo Callee-save or to Caller-saverdquo
- Caller-Save and Callee-Save Registers
- Callee-Save Registers
- Caller-Save Registers
- Parameter Passing
- Windows Exploit(s) Buffer Overflow
- Buffer overflow
- Buffer overflow (2)
- Buffer overflow (3)
- Buffer overflow (4)
- Nested Procedures
- Example Nested Procedures
- Nested Procedures (2)
- Nested Procedures (3)
- Nested Procedures (4)
- Lexical Pointers
- What is a Compiler (2)
- Procedure Q Calls R
- Activation Records Remarks
- Non-Local goto in C syntax
- Non-local gotos in C
- Non-Local Transfer of Control in C
- Stack Frames
- The Frame Pointer
- Variable Length Frame Size
- Limitations
- Frame-Resident Variables
- The Frames in Different Architectures
- Limitations of Stack Frames
- Compiler Implementation
- Basic Compiler Phases
- Hidden in the frame ADT
- Invocations to Frame
- Activation Records Summary
- The End
-
85
Non-Local goto in C syntax
86
Non-local gotos in C
bull setjmp remembers the current location and the stack frame
bull longjmp jumps to the current location (popping many activation records)
87
Non-Local Transfer of Control in C
88
Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy
ndash Low overheadndash Hardware support may be available
bull LIFO policybull Not a pure stack
ndash Non local referencesndash Updated using arithmetic
89
The Frame Pointerbull The caller
ndash the calling routinebull The callee
ndash the called routinebull caller responsibilities
ndash Calculate arguments and save in the stackndash Store lexical pointer
bull call instruction M[--SP] = RA PC = callee
bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size
bull Why use both SP and FP
90
Variable Length Frame Sizebull C allows allocating objects of unbounded
size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))
bull Some versions of Pascal allows conformant array value parameters
91
Limitations
bull The compiler may be forced to store a value on a stack instead of registers
bull The stack may not suffice to handle some language features
92
Frame-Resident Variablesbull A variable x cannot be stored in register when
ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame
(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables
bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure
93
The Frames in Different Architectures
Pentium MIPS Sparc
InFrame(8) InFrame(0) InFrame(68)
InFrame(12) InReg(X157) InReg(X157)
InFrame(16) InReg(X158) InReg(X158)
M[sp+0]fpfp spsp sp-K
sp sp-KM[sp+K+0] r2
X157 r4X158 r5
save sp -K spM[fp+68]i0
X157i1
X158i2
g(x y z) where x escapes
x
y
z
View
Change
94
Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its
duration exceeds the duration of Pbull Example 1 Static variables in C
(own variables in Algol)void p(int x) static int y = 6 y += x
bull Example 2 Features of the C languageint f() int x return ampx
bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))
95
Compiler Implementation
bull Hide machine dependent partsbull Hide language dependent partbull Use special modules
96
Basic Compiler Phases
Source program (string)
EXE
lexical analysis
syntax analysis
semantic analysis
Code generation
AssemblerLinker
Tokens
Abstract syntax tree
Assembly
Frame managerControl Flow Graph
97
Hidden in the frame ADT
bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-
of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts
98
Invocations to Frame
bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return
99
Activation Records Summary
bull compile time memory management for procedure data
bull works well for data with well-scoped lifetimendash deallocation when procedure returns
100
The End
- Compilation 0368-3133 (Semester A 201314)
- What is a Compiler
- Conceptual Structure of a Compiler
- From scanning to parsing
- Context Analysis
- Code Generation
- Code Generation IR
- Slide 8
- Three-Address Code IR
- Abstract Register Machine
- Abstract Activation Record Stack
- Abstract Stack Frame
- TAC generation for expressions
- cgen example
- Naiumlve cgen for expressions
- TAC Generation for Control Flow Statements
- Control-flow example ndash conditions
- cgen for if-then-else
- Naive cgen for expressions
- Improving cgen for expressions
- Sethi-Ullman translation
- Weighted register allocation
- Weighted reg alloc example
- Weighted reg alloc example (2)
- TAC Generation for Memory Access Instructions
- Array operations
- TAC Generation for Control Flow Statements (2)
- Control-flow example ndash conditions (2)
- cgen for if-then-else (2)
- Control-flow example ndash loops
- cgen for while loops
- Optimization points
- Interprocedural IR
- Supporting Procedures
- Abstract Register Machine (High Level View)
- Abstract Register Machine (High Level View) (2)
- Abstract Activation Record Stack (2)
- Abstract Stack Frame (2)
- Handling Procedures
- Abstract Register Machine (2)
- Semi-Abstract Register Machine
- Intro Functions Example
- What Can We Do with Procedures
- Design Decisions
- Static (lexical) Scoping
- Dynamic Scoping
- Example
- Why do we care
- Variable addresses for static scoping first attempt
- Variable addresses for static scoping first attempt (2)
- Compile-Time Information on Variables
- Activation Record (Stack Frames)
- Semi-Abstract Register Machine (2)
- A Logical Stack Frame (Simplified)
- Runtime Stack
- Activation Record (frame)
- Runtime Stack (2)
- Code Blocks
- L-Values of Local Variables
- Pentium Runtime Stack
- Accessing Stack Variables
- Factorial ndash fact(int n)
- Call Sequences
- Call Sequences (2)
- Call Sequences ndash Foo(4221)
- ldquoTo Callee-save or to Caller-saverdquo
- Caller-Save and Callee-Save Registers
- Callee-Save Registers
- Caller-Save Registers
- Parameter Passing
- Windows Exploit(s) Buffer Overflow
- Buffer overflow
- Buffer overflow (2)
- Buffer overflow (3)
- Buffer overflow (4)
- Nested Procedures
- Example Nested Procedures
- Nested Procedures (2)
- Nested Procedures (3)
- Nested Procedures (4)
- Lexical Pointers
- What is a Compiler (2)
- Procedure Q Calls R
- Activation Records Remarks
- Non-Local goto in C syntax
- Non-local gotos in C
- Non-Local Transfer of Control in C
- Stack Frames
- The Frame Pointer
- Variable Length Frame Size
- Limitations
- Frame-Resident Variables
- The Frames in Different Architectures
- Limitations of Stack Frames
- Compiler Implementation
- Basic Compiler Phases
- Hidden in the frame ADT
- Invocations to Frame
- Activation Records Summary
- The End
-
86
Non-local gotos in C
bull setjmp remembers the current location and the stack frame
bull longjmp jumps to the current location (popping many activation records)
87
Non-Local Transfer of Control in C
88
Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy
ndash Low overheadndash Hardware support may be available
bull LIFO policybull Not a pure stack
ndash Non local referencesndash Updated using arithmetic
89
The Frame Pointerbull The caller
ndash the calling routinebull The callee
ndash the called routinebull caller responsibilities
ndash Calculate arguments and save in the stackndash Store lexical pointer
bull call instruction M[--SP] = RA PC = callee
bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size
bull Why use both SP and FP
90
Variable Length Frame Sizebull C allows allocating objects of unbounded
size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))
bull Some versions of Pascal allows conformant array value parameters
91
Limitations
bull The compiler may be forced to store a value on a stack instead of registers
bull The stack may not suffice to handle some language features
92
Frame-Resident Variablesbull A variable x cannot be stored in register when
ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame
(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables
bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure
93
The Frames in Different Architectures
Pentium MIPS Sparc
InFrame(8) InFrame(0) InFrame(68)
InFrame(12) InReg(X157) InReg(X157)
InFrame(16) InReg(X158) InReg(X158)
M[sp+0]fpfp spsp sp-K
sp sp-KM[sp+K+0] r2
X157 r4X158 r5
save sp -K spM[fp+68]i0
X157i1
X158i2
g(x y z) where x escapes
x
y
z
View
Change
94
Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its
duration exceeds the duration of Pbull Example 1 Static variables in C
(own variables in Algol)void p(int x) static int y = 6 y += x
bull Example 2 Features of the C languageint f() int x return ampx
bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))
95
Compiler Implementation
bull Hide machine dependent partsbull Hide language dependent partbull Use special modules
96
Basic Compiler Phases
Source program (string)
EXE
lexical analysis
syntax analysis
semantic analysis
Code generation
AssemblerLinker
Tokens
Abstract syntax tree
Assembly
Frame managerControl Flow Graph
97
Hidden in the frame ADT
bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-
of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts
98
Invocations to Frame
bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return
99
Activation Records Summary
bull compile time memory management for procedure data
bull works well for data with well-scoped lifetimendash deallocation when procedure returns
100
The End
- Compilation 0368-3133 (Semester A 201314)
- What is a Compiler
- Conceptual Structure of a Compiler
- From scanning to parsing
- Context Analysis
- Code Generation
- Code Generation IR
- Slide 8
- Three-Address Code IR
- Abstract Register Machine
- Abstract Activation Record Stack
- Abstract Stack Frame
- TAC generation for expressions
- cgen example
- Naiumlve cgen for expressions
- TAC Generation for Control Flow Statements
- Control-flow example ndash conditions
- cgen for if-then-else
- Naive cgen for expressions
- Improving cgen for expressions
- Sethi-Ullman translation
- Weighted register allocation
- Weighted reg alloc example
- Weighted reg alloc example (2)
- TAC Generation for Memory Access Instructions
- Array operations
- TAC Generation for Control Flow Statements (2)
- Control-flow example ndash conditions (2)
- cgen for if-then-else (2)
- Control-flow example ndash loops
- cgen for while loops
- Optimization points
- Interprocedural IR
- Supporting Procedures
- Abstract Register Machine (High Level View)
- Abstract Register Machine (High Level View) (2)
- Abstract Activation Record Stack (2)
- Abstract Stack Frame (2)
- Handling Procedures
- Abstract Register Machine (2)
- Semi-Abstract Register Machine
- Intro Functions Example
- What Can We Do with Procedures
- Design Decisions
- Static (lexical) Scoping
- Dynamic Scoping
- Example
- Why do we care
- Variable addresses for static scoping first attempt
- Variable addresses for static scoping first attempt (2)
- Compile-Time Information on Variables
- Activation Record (Stack Frames)
- Semi-Abstract Register Machine (2)
- A Logical Stack Frame (Simplified)
- Runtime Stack
- Activation Record (frame)
- Runtime Stack (2)
- Code Blocks
- L-Values of Local Variables
- Pentium Runtime Stack
- Accessing Stack Variables
- Factorial ndash fact(int n)
- Call Sequences
- Call Sequences (2)
- Call Sequences ndash Foo(4221)
- ldquoTo Callee-save or to Caller-saverdquo
- Caller-Save and Callee-Save Registers
- Callee-Save Registers
- Caller-Save Registers
- Parameter Passing
- Windows Exploit(s) Buffer Overflow
- Buffer overflow
- Buffer overflow (2)
- Buffer overflow (3)
- Buffer overflow (4)
- Nested Procedures
- Example Nested Procedures
- Nested Procedures (2)
- Nested Procedures (3)
- Nested Procedures (4)
- Lexical Pointers
- What is a Compiler (2)
- Procedure Q Calls R
- Activation Records Remarks
- Non-Local goto in C syntax
- Non-local gotos in C
- Non-Local Transfer of Control in C
- Stack Frames
- The Frame Pointer
- Variable Length Frame Size
- Limitations
- Frame-Resident Variables
- The Frames in Different Architectures
- Limitations of Stack Frames
- Compiler Implementation
- Basic Compiler Phases
- Hidden in the frame ADT
- Invocations to Frame
- Activation Records Summary
- The End
-
87
Non-Local Transfer of Control in C
88
Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy
ndash Low overheadndash Hardware support may be available
bull LIFO policybull Not a pure stack
ndash Non local referencesndash Updated using arithmetic
89
The Frame Pointerbull The caller
ndash the calling routinebull The callee
ndash the called routinebull caller responsibilities
ndash Calculate arguments and save in the stackndash Store lexical pointer
bull call instruction M[--SP] = RA PC = callee
bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size
bull Why use both SP and FP
90
Variable Length Frame Sizebull C allows allocating objects of unbounded
size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))
bull Some versions of Pascal allows conformant array value parameters
91
Limitations
bull The compiler may be forced to store a value on a stack instead of registers
bull The stack may not suffice to handle some language features
92
Frame-Resident Variablesbull A variable x cannot be stored in register when
ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame
(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables
bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure
93
The Frames in Different Architectures
Pentium MIPS Sparc
InFrame(8) InFrame(0) InFrame(68)
InFrame(12) InReg(X157) InReg(X157)
InFrame(16) InReg(X158) InReg(X158)
M[sp+0]fpfp spsp sp-K
sp sp-KM[sp+K+0] r2
X157 r4X158 r5
save sp -K spM[fp+68]i0
X157i1
X158i2
g(x y z) where x escapes
x
y
z
View
Change
94
Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its
duration exceeds the duration of Pbull Example 1 Static variables in C
(own variables in Algol)void p(int x) static int y = 6 y += x
bull Example 2 Features of the C languageint f() int x return ampx
bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))
95
Compiler Implementation
bull Hide machine dependent partsbull Hide language dependent partbull Use special modules
96
Basic Compiler Phases
Source program (string)
EXE
lexical analysis
syntax analysis
semantic analysis
Code generation
AssemblerLinker
Tokens
Abstract syntax tree
Assembly
Frame managerControl Flow Graph
97
Hidden in the frame ADT
bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-
of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts
98
Invocations to Frame
bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return
99
Activation Records Summary
bull compile time memory management for procedure data
bull works well for data with well-scoped lifetimendash deallocation when procedure returns
100
The End
- Compilation 0368-3133 (Semester A 201314)
- What is a Compiler
- Conceptual Structure of a Compiler
- From scanning to parsing
- Context Analysis
- Code Generation
- Code Generation IR
- Slide 8
- Three-Address Code IR
- Abstract Register Machine
- Abstract Activation Record Stack
- Abstract Stack Frame
- TAC generation for expressions
- cgen example
- Naïve cgen for expressions
- TAC Generation for Control Flow Statements
- Control-flow example – conditions
- cgen for if-then-else
- Naive cgen for expressions
- Improving cgen for expressions
- Sethi-Ullman translation
- Weighted register allocation
- Weighted reg alloc example
- Weighted reg alloc example (2)
- TAC Generation for Memory Access Instructions
- Array operations
- TAC Generation for Control Flow Statements (2)
- Control-flow example – conditions (2)
- cgen for if-then-else (2)
- Control-flow example – loops
- cgen for while loops
- Optimization points
- Interprocedural IR
- Supporting Procedures
- Abstract Register Machine (High Level View)
- Abstract Register Machine (High Level View) (2)
- Abstract Activation Record Stack (2)
- Abstract Stack Frame (2)
- Handling Procedures
- Abstract Register Machine (2)
- Semi-Abstract Register Machine
- Intro Functions Example
- What Can We Do with Procedures
- Design Decisions
- Static (lexical) Scoping
- Dynamic Scoping
- Example
- Why do we care
- Variable addresses for static scoping first attempt
- Variable addresses for static scoping first attempt (2)
- Compile-Time Information on Variables
- Activation Record (Stack Frames)
- Semi-Abstract Register Machine (2)
- A Logical Stack Frame (Simplified)
- Runtime Stack
- Activation Record (frame)
- Runtime Stack (2)
- Code Blocks
- L-Values of Local Variables
- Pentium Runtime Stack
- Accessing Stack Variables
- Factorial – fact(int n)
- Call Sequences
- Call Sequences (2)
- Call Sequences – Foo(4221)
- "To Callee-save or to Caller-save"
- Caller-Save and Callee-Save Registers
- Callee-Save Registers
- Caller-Save Registers
- Parameter Passing
- Windows Exploit(s) Buffer Overflow
- Buffer overflow
- Buffer overflow (2)
- Buffer overflow (3)
- Buffer overflow (4)
- Nested Procedures
- Example Nested Procedures
- Nested Procedures (2)
- Nested Procedures (3)
- Nested Procedures (4)
- Lexical Pointers
- What is a Compiler (2)
- Procedure Q Calls R
- Activation Records Remarks
- Non-Local goto in C syntax
- Non-local gotos in C
- Non-Local Transfer of Control in C
- Stack Frames
- The Frame Pointer
- Variable Length Frame Size
- Limitations
- Frame-Resident Variables
- The Frames in Different Architectures
- Limitations of Stack Frames
- Compiler Implementation
- Basic Compiler Phases
- Hidden in the frame ADT
- Invocations to Frame
- Activation Records Summary
- The End
88
Stack Frames
• Allocate a separate space for every procedure incarnation
• Relative addresses
• Provide a simple means to achieve modularity
• Support separate code generation of procedures
• Naturally support recursion
• Efficient memory allocation policy
  – Low overhead
  – Hardware support may be available
• LIFO policy
• Not a pure stack
  – Non-local references
  – Updated using arithmetic
89
The Frame Pointer
• The caller
  – the calling routine
• The callee
  – the called routine
• Caller responsibilities
  – Calculate arguments and save them on the stack
  – Store lexical pointer
• Call instruction: M[--SP] = RA; PC = callee
• Callee responsibilities
  – FP = SP
  – SP = SP - frame-size
• Why use both SP and FP?
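The call sequence above can be traced on a toy machine. The sketch below is a simulation under assumptions, not real calling-convention code: memory is an array of words, addresses are word indices, and the names `Machine`, `push`, `call`, and `prologue` are made up for illustration. Like the slide, it does not show saving the caller's FP.

```c
#include <assert.h>

enum { STACK_WORDS = 64 };

typedef struct {
    int M[STACK_WORDS];   /* simulated memory, one int per word */
    int SP;               /* index of the top-of-stack word; stack grows down */
    int FP;               /* frame pointer of the running procedure */
    int PC;               /* simulated program counter */
} Machine;

void machine_init(Machine *m) {
    m->SP = STACK_WORDS;  /* empty stack */
    m->FP = STACK_WORDS;
    m->PC = 0;
}

/* Caller: save one computed argument on the stack. */
void push(Machine *m, int value) {
    assert(m->SP > 0);
    m->M[--m->SP] = value;
}

/* Call instruction: M[--SP] = RA; PC = callee. */
void call(Machine *m, int return_address, int callee) {
    push(m, return_address);
    m->PC = callee;
}

/* Callee prologue: FP = SP; SP = SP - frame-size. */
void prologue(Machine *m, int frame_size) {
    assert(m->SP >= frame_size);
    m->FP = m->SP;
    m->SP = m->SP - frame_size;
}
```

After `push`, `call`, and `prologue`, the return address sits at `M[FP]` and the argument at `M[FP + 1]`, so the callee can address both at fixed offsets from FP.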
90
Variable Length Frame Size
• C allows allocating objects of unbounded size on the stack
  void p() { int i; char *p; scanf("%d", &i); p = (char *) alloca(i * sizeof(int)); }
• Some versions of Pascal allow conformant array value parameters
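One way to see why a variable-length frame motivates a separate frame pointer is to model the two registers directly. The following is a hedged sketch, not the real `alloca`: `Regs`, `enter`, `sim_alloca`, and `local_addr_fp` are hypothetical names, and addresses are counted in words.

```c
#include <assert.h>

typedef struct {
    int SP;   /* stack pointer in words; the stack grows toward lower addresses */
    int FP;   /* frame pointer, fixed for the lifetime of the activation */
} Regs;

/* Enter a procedure whose fixed-size part needs fixed_words words. */
void enter(Regs *r, int fixed_words) {
    r->FP = r->SP;
    r->SP -= fixed_words;
}

/* Model of alloca: carve n more words out of the current frame and
   return the lowest address of the new block. */
int sim_alloca(Regs *r, int n) {
    assert(n >= 0 && n <= r->SP);
    r->SP -= n;
    return r->SP;
}

/* Address of the k-th local (k = 1, 2, ...), computed relative to FP. */
int local_addr_fp(const Regs *r, int k) {
    return r->FP - k;
}
```

A local's offset from FP is known at compile time and stays valid after `sim_alloca`, whereas its offset from SP grows by the run-time value `n`.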
91
Limitations
• The compiler may be forced to store a value on the stack instead of in registers
• The stack may not suffice to handle some language features
92
Frame-Resident Variables
• A variable x cannot be stored in a register when
  – x is passed by reference
  – Address of x is taken (&x)
  – x is addressed via pointer arithmetic on the stack frame (C varargs)
  – x is accessed from a nested procedure
  – The value is too big to fit into a single register
  – The variable is an array
  – The register of x is needed for other purposes
  – Too many local variables
• An escape variable
  – Passed by reference
  – Address is taken
  – Addressed via pointer arithmetic on the stack frame
  – Accessed from a nested procedure
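The conditions above can be written as two predicates over facts a front end might record per variable. This is a sketch under assumptions: the `VarInfo` fields and both function names are hypothetical, and the two register-pressure conditions (register needed elsewhere, too many locals) are ignored because they depend on the register allocator rather than on the variable itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Facts recorded about one local variable (hypothetical layout). */
typedef struct {
    bool passed_by_reference;
    bool address_taken;
    bool addressed_via_frame_arithmetic;   /* e.g. C varargs */
    bool accessed_from_nested_procedure;
    bool is_array;
    int size_in_words;
} VarInfo;

/* The four "escape" conditions listed on the slide. */
bool escapes(const VarInfo *v) {
    return v->passed_by_reference
        || v->address_taken
        || v->addressed_via_frame_arithmetic
        || v->accessed_from_nested_procedure;
}

/* True when the variable cannot be kept in a single register,
   leaving register pressure aside. */
bool must_be_frame_resident(const VarInfo *v) {
    assert(v->size_in_words > 0);
    return escapes(v) || v->is_array || v->size_in_words > 1;
}
```

Every escaping variable is frame resident, but not the other way around: a two-word value that never escapes still has to live in the frame.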
93
The Frames in Different Architectures
Pentium MIPS Sparc
InFrame(8) InFrame(0) InFrame(68)
InFrame(12) InReg(X157) InReg(X157)
InFrame(16) InReg(X158) InReg(X158)
M[sp+0]fpfp spsp sp-K
sp sp-KM[sp+K+0] r2
X157 r4X158 r5
save sp -K spM[fp+68]i0
X157i1
X158i2
g(x y z) where x escapes
x
y
z
View
Change
94
Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its
duration exceeds the duration of Pbull Example 1 Static variables in C
(own variables in Algol)void p(int x) static int y = 6 y += x
bull Example 2 Features of the C languageint f() int x return ampx
bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))
95
Compiler Implementation
bull Hide machine dependent partsbull Hide language dependent partbull Use special modules
96
Basic Compiler Phases
Source program (string)
EXE
lexical analysis
syntax analysis
semantic analysis
Code generation
AssemblerLinker
Tokens
Abstract syntax tree
Assembly
Frame managerControl Flow Graph
97
Hidden in the frame ADT
bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-
of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts
98
Invocations to Frame
• “Allocate” a new frame
• “Allocate” a new local variable
• Return the L-value of a local variable
• Generate code for procedure invocation
• Generate prologue/epilogue
• Generate code for procedure return
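The first two invocations, together with the state the frame ADT hides, might look roughly like this in C. Everything here is an assumption made for illustration: the names `Frame`, `frame_new` and `frame_alloc_local`, the 4-byte word size, locals growing downward from the frame pointer, and temporaries numbered from 100. Code generation for calls, prologue and epilogue is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

enum { WORD_SIZE = 4 }; /* assumed word size, hidden from clients */

/* Where a local lives: a frame offset or a temporary number. */
typedef struct {
    bool in_frame;
    int value;
} Access;

/* State hidden behind the ADT (hypothetical layout). */
typedef struct {
    const char *label;   /* label at which the machine code starts */
    int locals_in_frame; /* frame-resident locals allocated so far */
    int next_temp;       /* next unused temporary number */
} Frame;

/* "Allocate" a new frame; returns NULL if out of memory. */
Frame *frame_new(const char *label) {
    Frame *f = malloc(sizeof *f);
    if (f != NULL) {
        f->label = label;
        f->locals_in_frame = 0;
        f->next_temp = 100;
    }
    return f;
}

/* "Allocate" a new local: escaping locals get a frame slot below the
   frame pointer, the rest get a fresh temporary. */
Access frame_alloc_local(Frame *f, bool escapes) {
    assert(f != NULL);
    Access a;
    a.in_frame = escapes;
    if (escapes) {
        f->locals_in_frame++;
        a.value = -WORD_SIZE * f->locals_in_frame;
    } else {
        a.value = f->next_temp++;
    }
    return a;
}
```

Clients see only `Access` values, so the word size and the slot layout can change per target without touching the code that requests locals.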
99
Activation Records Summary
• Compile-time memory management for procedure data
• Works well for data with well-scoped lifetime
  – Deallocation when the procedure returns
100
The End