compilation 0368-3133 (semester a, 2013/14)

100
Compilation 0368-3133 (Semester A, 2013/14) Lecture 8: Activation Records + Register Allocation Noam Rinetzky Slides credit: Roman Manevich, Mooly Sagiv and Eran Yahav 1

Upload: meris

Post on 10-Feb-2016

29 views

Category:

Documents


0 download

DESCRIPTION

Compilation 0368-3133 (Semester A, 2013/14). Lecture 8: Activation Records + Register Allocation Noam Rinetzky. Slides credit: Roman Manevich , Mooly Sagiv and Eran Yahav. What is a Compiler?. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Compilation   0368-3133 (Semester A, 2013/14)

1

Compilation 0368-3133 (Semester A 201314)

Lecture 8 Activation Records + Register Allocation

Noam Rinetzky

Slides credit Roman Manevich Mooly Sagiv and Eran Yahav

2

What is a Compiler

ldquoA compiler is a computer program that transforms source code written in a programming language (source language) into another language (target language)

The most common reason for wanting to transform source code is to create an executable programrdquo

--Wikipedia

3

Conceptual Structure of a Compiler

Executable code

exe

Sourcetext

txtSemantic

RepresentationBackend

CompilerFrontend

LexicalAnalysi

s

Syntax Analysi

sParsing

Semantic

Analysis

IntermediateRepresentati

on(IR)

CodeGeneration

4

From scanning to parsing((23 + 7) x)

) x ) 7 + 23 ( (RP Id OP RP Num OP Num LP LP

Lexical Analyzer

program text

token stream

ParserGrammar E | Id Id lsquoarsquo | | lsquozrsquo

Op()

Id(b)

Num(23) Num(7)

Op(+)

Abstract Syntax Tree

validsyntaxerror

5

Context Analysis

Op()

Id(b)

Num(23) Num(7)

Op(+)

Abstract Syntax Tree

E1 int E2 int

E1 + E2 int

Type rules

Semantic Error Valid + Symbol Table

6

Code Generation

Op()

Id(b)

Num(23) Num(7)

Op(+)

Valid Abstract Syntax TreeSymbol Tablehellip

Verification (possible runtime)ErrorsWarnings

Intermediate Representation (IR)

Executable Codeinput output

7

Code Generation IR

bull Translating from abstract syntax (AST) to intermediate representation (IR)ndash Three-Address Code

bull Primitive statements control flow procedure calls

Source code

(program)

LexicalAnalysis

Syntax Analysis

Parsing

AST SymbolTableetc

InterRep(IR)

CodeGeneration

Source code

(executable)

8

C++ IR

Pentium

Java bytecode

SparcPyhton

Javaoptimize

Lowering Code Gen

Intermediate representationbull A language that is between the source language and

the target language ndash not specific to any machinebull Goal 1 retargeting compiler components for

different source languagestarget machinesbull Goal 2 machine-independent optimizer

ndash Narrow interface small number of node types (instructions)

9

Three-Address Code IR

bull A popular form of IRbull High-level assembly where instructions

have at most three operands

Chapter 8

10

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

11

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

12

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

13

TAC generation for expressions

bull cgen(atomic_expr) directly generates TAC for atomic expressionsndash Constants identifiershellip

bull Cgen(compund_expr) recursively generates TAC for compound expressions ndash binary operators procedure calls hellipndash use temporary variables (registers) to store

values of intermediate expressions

14

cgen examplecgen(5 + x) = Choose a new temporary t Let t1 = cgen(5) Let t2 = cgen(x) Emit( t = t1 + t2 ) Return t

15

Naiumlve cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =

Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc

16

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

17

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

18

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

19

Naive cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =

Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc

bull Observation temporaries in cgen(e1) can be reused in cgen(e2)

20

Improving cgen for expressionsbull Observation ndash naiumlve translation needlessly generates temporaries for

leaf expressionsbull Observation ndash temporaries used exactly once

ndash Once a temporary has been read it can be reused for another sub-expressionbull cgen(e1 op e2) =

Let _t1 = cgen(e1) Let _t2 = cgen(e2) Emit( _t =_t1 op _t2 ) Return t

bull Temporaries cgen(e1) can be reused in cgen(e2)

21

Sethi-Ullman translation

bull Algorithm by Ravi Sethi and Jeffrey D Ullman to emit optimal TACndash Minimizes number of temporaries

bull Main data structure in algorithm is a stack of temporariesndash Stack corresponds to recursive invocations of _t = cgen(e)ndash All the temporaries on the stack are live

bull Live = contain a value that is needed later on

22

Weighted register allocationbull Can save registers by re-ordering subtree computationsbull Label each node with its weight

ndash Weight = number of registers neededndash Leaf weight knownndash Internal node weight

bull w(left) gt w(right) then w = leftbull w(right) gt w(left) then w = rightbull w(right) = w(left) then w = left + 1

bull Choose heavier child as first to be translatedbull WARNING have to check that no side-effects exist before

attempting to apply this optimization(pre-pass on the tree)

23

Weighted reg alloc example

b

5 c

array access

+

a w=0

w=0 w=0

w=1 w=0

w=1

w=1

Phase 1 - check absence of side-effects in expression tree - assign weight to each AST node

_t0 = cgen( a+b[5c] )

base index

24

Weighted reg alloc example

b

5 c

array access

+

abase index

w=0

w=0 w=0

w=1 w=0

w=1

w=1

_t0 = cgen( a+b[5c] )Phase 2 - use weights to decide on order of translation

_t0

_t0

_t0Heavier sub-tree

Heavier sub-tree

_t0 = _t1 _t0

_t0 = _t1[_t0]

_t0 = _t1 + _t0

_t0_t1

_t1

_t1_t0 = c_t1 = 5

_t1 = b

_t1 = a

25

TAC Generation forMemory Access Instructions

bull Copy instruction a = bbull Loadstore instructions

a = b a = bbull Address of instruction a=ampbbull Array accesses

a = b[i] a[i] = bbull Field accesses

a = b[f] a[f] = bbull Memory allocation instruction a = alloc(size)

ndash Sometimes left out (eg malloc is a procedure in C)

26

Array operations

t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]

t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer

x = y[i]

x[i] = y

27

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

28

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

29

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

30

Control-flow example ndash loopsint xint y

while (x lt y) x = x 2

y = x

_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1

y = x

31

cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label

Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )

32

Optimization points

sourcecode

Frontend IR Code

generatortargetcode

Userprofile program

change algorithm

Compilerintraprocedural IRInterprocedural IRIR optimizations

Compilerregister allocation

instruction selectionpeephole transformations

today

33

Interprocedural IR

bull Compile time generation of code for procedure invocations

bull Activation Records (aka Stack Frames)

34

Supporting Procedures

bull New computing environment ndash at least temporary memory for local variables

bull Passing information into the new environmentndash Parameters

bull Transfer of control tofrom procedurebull Handling return values

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 2: Compilation   0368-3133 (Semester A, 2013/14)

2

What is a Compiler

ldquoA compiler is a computer program that transforms source code written in a programming language (source language) into another language (target language)

The most common reason for wanting to transform source code is to create an executable programrdquo

--Wikipedia

3

Conceptual Structure of a Compiler

Executable code

exe

Sourcetext

txtSemantic

RepresentationBackend

CompilerFrontend

LexicalAnalysi

s

Syntax Analysi

sParsing

Semantic

Analysis

IntermediateRepresentati

on(IR)

CodeGeneration

4

From scanning to parsing((23 + 7) x)

) x ) 7 + 23 ( (RP Id OP RP Num OP Num LP LP

Lexical Analyzer

program text

token stream

ParserGrammar E | Id Id lsquoarsquo | | lsquozrsquo

Op()

Id(b)

Num(23) Num(7)

Op(+)

Abstract Syntax Tree

validsyntaxerror

5

Context Analysis

Op()

Id(b)

Num(23) Num(7)

Op(+)

Abstract Syntax Tree

E1 int E2 int

E1 + E2 int

Type rules

Semantic Error Valid + Symbol Table

6

Code Generation

Op()

Id(b)

Num(23) Num(7)

Op(+)

Valid Abstract Syntax TreeSymbol Tablehellip

Verification (possible runtime)ErrorsWarnings

Intermediate Representation (IR)

Executable Codeinput output

7

Code Generation IR

bull Translating from abstract syntax (AST) to intermediate representation (IR)ndash Three-Address Code

bull Primitive statements control flow procedure calls

Source code

(program)

LexicalAnalysis

Syntax Analysis

Parsing

AST SymbolTableetc

InterRep(IR)

CodeGeneration

Source code

(executable)

8

C++ IR

Pentium

Java bytecode

SparcPyhton

Javaoptimize

Lowering Code Gen

Intermediate representationbull A language that is between the source language and

the target language ndash not specific to any machinebull Goal 1 retargeting compiler components for

different source languagestarget machinesbull Goal 2 machine-independent optimizer

ndash Narrow interface small number of node types (instructions)

9

Three-Address Code IR

bull A popular form of IRbull High-level assembly where instructions

have at most three operands

Chapter 8

10

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

11

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

12

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

13

TAC generation for expressions

bull cgen(atomic_expr) directly generates TAC for atomic expressionsndash Constants identifiershellip

bull Cgen(compund_expr) recursively generates TAC for compound expressions ndash binary operators procedure calls hellipndash use temporary variables (registers) to store

values of intermediate expressions

14

cgen examplecgen(5 + x) = Choose a new temporary t Let t1 = cgen(5) Let t2 = cgen(x) Emit( t = t1 + t2 ) Return t

15

Naiumlve cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =

Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc

16

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

17

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

18

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

19

Naive cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =

Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc

bull Observation temporaries in cgen(e1) can be reused in cgen(e2)

20

Improving cgen for expressionsbull Observation ndash naiumlve translation needlessly generates temporaries for

leaf expressionsbull Observation ndash temporaries used exactly once

ndash Once a temporary has been read it can be reused for another sub-expressionbull cgen(e1 op e2) =

Let _t1 = cgen(e1) Let _t2 = cgen(e2) Emit( _t =_t1 op _t2 ) Return t

bull Temporaries cgen(e1) can be reused in cgen(e2)

21

Sethi-Ullman translation

bull Algorithm by Ravi Sethi and Jeffrey D Ullman to emit optimal TACndash Minimizes number of temporaries

bull Main data structure in algorithm is a stack of temporariesndash Stack corresponds to recursive invocations of _t = cgen(e)ndash All the temporaries on the stack are live

bull Live = contain a value that is needed later on

22

Weighted register allocationbull Can save registers by re-ordering subtree computationsbull Label each node with its weight

ndash Weight = number of registers neededndash Leaf weight knownndash Internal node weight

bull w(left) gt w(right) then w = leftbull w(right) gt w(left) then w = rightbull w(right) = w(left) then w = left + 1

bull Choose heavier child as first to be translatedbull WARNING have to check that no side-effects exist before

attempting to apply this optimization(pre-pass on the tree)

23

Weighted reg alloc example

b

5 c

array access

+

a w=0

w=0 w=0

w=1 w=0

w=1

w=1

Phase 1 - check absence of side-effects in expression tree - assign weight to each AST node

_t0 = cgen( a+b[5c] )

base index

24

Weighted reg alloc example

b

5 c

array access

+

abase index

w=0

w=0 w=0

w=1 w=0

w=1

w=1

_t0 = cgen( a+b[5c] )Phase 2 - use weights to decide on order of translation

_t0

_t0

_t0Heavier sub-tree

Heavier sub-tree

_t0 = _t1 _t0

_t0 = _t1[_t0]

_t0 = _t1 + _t0

_t0_t1

_t1

_t1_t0 = c_t1 = 5

_t1 = b

_t1 = a

25

TAC Generation forMemory Access Instructions

bull Copy instruction a = bbull Loadstore instructions

a = b a = bbull Address of instruction a=ampbbull Array accesses

a = b[i] a[i] = bbull Field accesses

a = b[f] a[f] = bbull Memory allocation instruction a = alloc(size)

ndash Sometimes left out (eg malloc is a procedure in C)

26

Array operations

t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]

t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer

x = y[i]

x[i] = y

27

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

28

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

29

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

30

Control-flow example ndash loopsint xint y

while (x lt y) x = x 2

y = x

_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1

y = x

31

cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label

Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )

32

Optimization points

sourcecode

Frontend IR Code

generatortargetcode

Userprofile program

change algorithm

Compilerintraprocedural IRInterprocedural IRIR optimizations

Compilerregister allocation

instruction selectionpeephole transformations

today

33

Interprocedural IR

bull Compile time generation of code for procedure invocations

bull Activation Records (aka Stack Frames)

34

Supporting Procedures

bull New computing environment ndash at least temporary memory for local variables

bull Passing information into the new environmentndash Parameters

bull Transfer of control tofrom procedurebull Handling return values

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 3: Compilation   0368-3133 (Semester A, 2013/14)

3

Conceptual Structure of a Compiler

Executable code

exe

Sourcetext

txtSemantic

RepresentationBackend

CompilerFrontend

LexicalAnalysi

s

Syntax Analysi

sParsing

Semantic

Analysis

IntermediateRepresentati

on(IR)

CodeGeneration

4

From scanning to parsing((23 + 7) x)

) x ) 7 + 23 ( (RP Id OP RP Num OP Num LP LP

Lexical Analyzer

program text

token stream

ParserGrammar E | Id Id lsquoarsquo | | lsquozrsquo

Op()

Id(b)

Num(23) Num(7)

Op(+)

Abstract Syntax Tree

validsyntaxerror

5

Context Analysis

Op()

Id(b)

Num(23) Num(7)

Op(+)

Abstract Syntax Tree

E1 int E2 int

E1 + E2 int

Type rules

Semantic Error Valid + Symbol Table

6

Code Generation

Op()

Id(b)

Num(23) Num(7)

Op(+)

Valid Abstract Syntax TreeSymbol Tablehellip

Verification (possible runtime)ErrorsWarnings

Intermediate Representation (IR)

Executable Codeinput output

7

Code Generation IR

bull Translating from abstract syntax (AST) to intermediate representation (IR)ndash Three-Address Code

bull Primitive statements control flow procedure calls

Source code

(program)

LexicalAnalysis

Syntax Analysis

Parsing

AST SymbolTableetc

InterRep(IR)

CodeGeneration

Source code

(executable)

8

C++ IR

Pentium

Java bytecode

SparcPyhton

Javaoptimize

Lowering Code Gen

Intermediate representationbull A language that is between the source language and

the target language ndash not specific to any machinebull Goal 1 retargeting compiler components for

different source languagestarget machinesbull Goal 2 machine-independent optimizer

ndash Narrow interface small number of node types (instructions)

9

Three-Address Code IR

bull A popular form of IRbull High-level assembly where instructions

have at most three operands

Chapter 8

10

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

11

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

12

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

13

TAC generation for expressions

bull cgen(atomic_expr) directly generates TAC for atomic expressionsndash Constants identifiershellip

bull Cgen(compund_expr) recursively generates TAC for compound expressions ndash binary operators procedure calls hellipndash use temporary variables (registers) to store

values of intermediate expressions

14

cgen examplecgen(5 + x) = Choose a new temporary t Let t1 = cgen(5) Let t2 = cgen(x) Emit( t = t1 + t2 ) Return t

15

Naiumlve cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =

Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc

16

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

17

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

18

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

19

Naive cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =

Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc

bull Observation temporaries in cgen(e1) can be reused in cgen(e2)

20

Improving cgen for expressionsbull Observation ndash naiumlve translation needlessly generates temporaries for

leaf expressionsbull Observation ndash temporaries used exactly once

ndash Once a temporary has been read it can be reused for another sub-expressionbull cgen(e1 op e2) =

Let _t1 = cgen(e1) Let _t2 = cgen(e2) Emit( _t =_t1 op _t2 ) Return t

bull Temporaries cgen(e1) can be reused in cgen(e2)

21

Sethi-Ullman translation

bull Algorithm by Ravi Sethi and Jeffrey D Ullman to emit optimal TACndash Minimizes number of temporaries

bull Main data structure in algorithm is a stack of temporariesndash Stack corresponds to recursive invocations of _t = cgen(e)ndash All the temporaries on the stack are live

bull Live = contain a value that is needed later on

22

Weighted register allocationbull Can save registers by re-ordering subtree computationsbull Label each node with its weight

ndash Weight = number of registers neededndash Leaf weight knownndash Internal node weight

bull w(left) gt w(right) then w = leftbull w(right) gt w(left) then w = rightbull w(right) = w(left) then w = left + 1

bull Choose heavier child as first to be translatedbull WARNING have to check that no side-effects exist before

attempting to apply this optimization(pre-pass on the tree)

23

Weighted reg alloc example

b

5 c

array access

+

a w=0

w=0 w=0

w=1 w=0

w=1

w=1

Phase 1 - check absence of side-effects in expression tree - assign weight to each AST node

_t0 = cgen( a+b[5c] )

base index

24

Weighted reg alloc example

b

5 c

array access

+

abase index

w=0

w=0 w=0

w=1 w=0

w=1

w=1

_t0 = cgen( a+b[5c] )Phase 2 - use weights to decide on order of translation

_t0

_t0

_t0Heavier sub-tree

Heavier sub-tree

_t0 = _t1 _t0

_t0 = _t1[_t0]

_t0 = _t1 + _t0

_t0_t1

_t1

_t1_t0 = c_t1 = 5

_t1 = b

_t1 = a

25

TAC Generation forMemory Access Instructions

bull Copy instruction a = bbull Loadstore instructions

a = b a = bbull Address of instruction a=ampbbull Array accesses

a = b[i] a[i] = bbull Field accesses

a = b[f] a[f] = bbull Memory allocation instruction a = alloc(size)

ndash Sometimes left out (eg malloc is a procedure in C)

26

Array operations

t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]

t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer

x = y[i]

x[i] = y

27

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

28

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

29

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

30

Control-flow example ndash loopsint xint y

while (x lt y) x = x 2

y = x

_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1

y = x

31

cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label

Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )

32

Optimization points

sourcecode

Frontend IR Code

generatortargetcode

Userprofile program

change algorithm

Compilerintraprocedural IRInterprocedural IRIR optimizations

Compilerregister allocation

instruction selectionpeephole transformations

today

33

Interprocedural IR

bull Compile time generation of code for procedure invocations

bull Activation Records (aka Stack Frames)

34

Supporting Procedures

bull New computing environment ndash at least temporary memory for local variables

bull Passing information into the new environmentndash Parameters

bull Transfer of control tofrom procedurebull Handling return values

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 4: Compilation   0368-3133 (Semester A, 2013/14)

4

From scanning to parsing((23 + 7) x)

) x ) 7 + 23 ( (RP Id OP RP Num OP Num LP LP

Lexical Analyzer

program text

token stream

ParserGrammar E | Id Id lsquoarsquo | | lsquozrsquo

Op()

Id(b)

Num(23) Num(7)

Op(+)

Abstract Syntax Tree

validsyntaxerror

5

Context Analysis

Op()

Id(b)

Num(23) Num(7)

Op(+)

Abstract Syntax Tree

E1 int E2 int

E1 + E2 int

Type rules

Semantic Error Valid + Symbol Table

6

Code Generation

Op()

Id(b)

Num(23) Num(7)

Op(+)

Valid Abstract Syntax TreeSymbol Tablehellip

Verification (possible runtime)ErrorsWarnings

Intermediate Representation (IR)

Executable Codeinput output

7

Code Generation IR

bull Translating from abstract syntax (AST) to intermediate representation (IR)ndash Three-Address Code

bull Primitive statements control flow procedure calls

Source code

(program)

LexicalAnalysis

Syntax Analysis

Parsing

AST SymbolTableetc

InterRep(IR)

CodeGeneration

Source code

(executable)

8

C++ IR

Pentium

Java bytecode

SparcPyhton

Javaoptimize

Lowering Code Gen

Intermediate representationbull A language that is between the source language and

the target language ndash not specific to any machinebull Goal 1 retargeting compiler components for

different source languagestarget machinesbull Goal 2 machine-independent optimizer

ndash Narrow interface small number of node types (instructions)

9

Three-Address Code IR

bull A popular form of IRbull High-level assembly where instructions

have at most three operands

Chapter 8

10

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

11

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

12

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

13

TAC generation for expressions

bull cgen(atomic_expr) directly generates TAC for atomic expressionsndash Constants identifiershellip

bull Cgen(compund_expr) recursively generates TAC for compound expressions ndash binary operators procedure calls hellipndash use temporary variables (registers) to store

values of intermediate expressions

14

cgen examplecgen(5 + x) = Choose a new temporary t Let t1 = cgen(5) Let t2 = cgen(x) Emit( t = t1 + t2 ) Return t

15

Naiumlve cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =

Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc

16

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

17

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

18

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

19

Naive cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =

Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc

bull Observation temporaries in cgen(e1) can be reused in cgen(e2)

20

Improving cgen for expressionsbull Observation ndash naiumlve translation needlessly generates temporaries for

leaf expressionsbull Observation ndash temporaries used exactly once

ndash Once a temporary has been read it can be reused for another sub-expressionbull cgen(e1 op e2) =

Let _t1 = cgen(e1) Let _t2 = cgen(e2) Emit( _t =_t1 op _t2 ) Return t

bull Temporaries cgen(e1) can be reused in cgen(e2)

21

Sethi-Ullman translation

bull Algorithm by Ravi Sethi and Jeffrey D Ullman to emit optimal TACndash Minimizes number of temporaries

bull Main data structure in algorithm is a stack of temporariesndash Stack corresponds to recursive invocations of _t = cgen(e)ndash All the temporaries on the stack are live

bull Live = contain a value that is needed later on

22

Weighted register allocationbull Can save registers by re-ordering subtree computationsbull Label each node with its weight

ndash Weight = number of registers neededndash Leaf weight knownndash Internal node weight

bull w(left) gt w(right) then w = leftbull w(right) gt w(left) then w = rightbull w(right) = w(left) then w = left + 1

bull Choose heavier child as first to be translatedbull WARNING have to check that no side-effects exist before

attempting to apply this optimization(pre-pass on the tree)

23

Weighted reg alloc example

b

5 c

array access

+

a w=0

w=0 w=0

w=1 w=0

w=1

w=1

Phase 1 - check absence of side-effects in expression tree - assign weight to each AST node

_t0 = cgen( a+b[5c] )

base index

24

Weighted reg alloc example

b

5 c

array access

+

abase index

w=0

w=0 w=0

w=1 w=0

w=1

w=1

_t0 = cgen( a+b[5c] )Phase 2 - use weights to decide on order of translation

_t0

_t0

_t0Heavier sub-tree

Heavier sub-tree

_t0 = _t1 _t0

_t0 = _t1[_t0]

_t0 = _t1 + _t0

_t0_t1

_t1

_t1_t0 = c_t1 = 5

_t1 = b

_t1 = a

25

TAC Generation forMemory Access Instructions

bull Copy instruction a = bbull Loadstore instructions

a = b a = bbull Address of instruction a=ampbbull Array accesses

a = b[i] a[i] = bbull Field accesses

a = b[f] a[f] = bbull Memory allocation instruction a = alloc(size)

ndash Sometimes left out (eg malloc is a procedure in C)

26

Array operations

t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]

t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer

x = y[i]

x[i] = y

27

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

28

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

29

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

30

Control-flow example ndash loopsint xint y

while (x lt y) x = x 2

y = x

_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1

y = x

31

cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label

Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )

32

Optimization points

sourcecode

Frontend IR Code

generatortargetcode

Userprofile program

change algorithm

Compilerintraprocedural IRInterprocedural IRIR optimizations

Compilerregister allocation

instruction selectionpeephole transformations

today

33

Interprocedural IR

bull Compile time generation of code for procedure invocations

bull Activation Records (aka Stack Frames)

34

Supporting Procedures

bull New computing environment ndash at least temporary memory for local variables

bull Passing information into the new environmentndash Parameters

bull Transfer of control tofrom procedurebull Handling return values

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 5: Compilation   0368-3133 (Semester A, 2013/14)

5

Context Analysis

Op()

Id(b)

Num(23) Num(7)

Op(+)

Abstract Syntax Tree

E1 int E2 int

E1 + E2 int

Type rules

Semantic Error Valid + Symbol Table

6

Code Generation

Op()

Id(b)

Num(23) Num(7)

Op(+)

Valid Abstract Syntax TreeSymbol Tablehellip

Verification (possible runtime)ErrorsWarnings

Intermediate Representation (IR)

Executable Codeinput output

7

Code Generation IR

bull Translating from abstract syntax (AST) to intermediate representation (IR)ndash Three-Address Code

bull Primitive statements control flow procedure calls

Source code

(program)

LexicalAnalysis

Syntax Analysis

Parsing

AST SymbolTableetc

InterRep(IR)

CodeGeneration

Source code

(executable)

8

C++ IR

Pentium

Java bytecode

SparcPyhton

Javaoptimize

Lowering Code Gen

Intermediate representationbull A language that is between the source language and

the target language ndash not specific to any machinebull Goal 1 retargeting compiler components for

different source languagestarget machinesbull Goal 2 machine-independent optimizer

ndash Narrow interface small number of node types (instructions)

9

Three-Address Code IR

bull A popular form of IRbull High-level assembly where instructions

have at most three operands

Chapter 8

10

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

11

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

12

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

13

TAC generation for expressions

bull cgen(atomic_expr) directly generates TAC for atomic expressionsndash Constants identifiershellip

bull Cgen(compund_expr) recursively generates TAC for compound expressions ndash binary operators procedure calls hellipndash use temporary variables (registers) to store

values of intermediate expressions

14

cgen examplecgen(5 + x) = Choose a new temporary t Let t1 = cgen(5) Let t2 = cgen(x) Emit( t = t1 + t2 ) Return t

15

Naiumlve cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =

Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc

16

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

17

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

18

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

19

Naive cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =

Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc

bull Observation temporaries in cgen(e1) can be reused in cgen(e2)

20

Improving cgen for expressionsbull Observation ndash naiumlve translation needlessly generates temporaries for

leaf expressionsbull Observation ndash temporaries used exactly once

ndash Once a temporary has been read it can be reused for another sub-expressionbull cgen(e1 op e2) =

Let _t1 = cgen(e1) Let _t2 = cgen(e2) Emit( _t =_t1 op _t2 ) Return t

bull Temporaries cgen(e1) can be reused in cgen(e2)

21

Sethi-Ullman translation

bull Algorithm by Ravi Sethi and Jeffrey D Ullman to emit optimal TACndash Minimizes number of temporaries

bull Main data structure in algorithm is a stack of temporariesndash Stack corresponds to recursive invocations of _t = cgen(e)ndash All the temporaries on the stack are live

bull Live = contain a value that is needed later on

22

Weighted register allocationbull Can save registers by re-ordering subtree computationsbull Label each node with its weight

ndash Weight = number of registers neededndash Leaf weight knownndash Internal node weight

bull w(left) gt w(right) then w = leftbull w(right) gt w(left) then w = rightbull w(right) = w(left) then w = left + 1

bull Choose heavier child as first to be translatedbull WARNING have to check that no side-effects exist before

attempting to apply this optimization(pre-pass on the tree)

23

Weighted reg alloc example

b

5 c

array access

+

a w=0

w=0 w=0

w=1 w=0

w=1

w=1

Phase 1 - check absence of side-effects in expression tree - assign weight to each AST node

_t0 = cgen( a+b[5c] )

base index

24

Weighted reg alloc example

b

5 c

array access

+

abase index

w=0

w=0 w=0

w=1 w=0

w=1

w=1

_t0 = cgen( a+b[5c] )Phase 2 - use weights to decide on order of translation

_t0

_t0

_t0Heavier sub-tree

Heavier sub-tree

_t0 = _t1 _t0

_t0 = _t1[_t0]

_t0 = _t1 + _t0

_t0_t1

_t1

_t1_t0 = c_t1 = 5

_t1 = b

_t1 = a

25

TAC Generation forMemory Access Instructions

bull Copy instruction a = bbull Loadstore instructions

a = b a = bbull Address of instruction a=ampbbull Array accesses

a = b[i] a[i] = bbull Field accesses

a = b[f] a[f] = bbull Memory allocation instruction a = alloc(size)

ndash Sometimes left out (eg malloc is a procedure in C)

26

Array operations

t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]

t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer

x = y[i]

x[i] = y

27

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

28

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

29

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

30

Control-flow example ndash loopsint xint y

while (x lt y) x = x 2

y = x

_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1

y = x

31

cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label

Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )

32

Optimization points

sourcecode

Frontend IR Code

generatortargetcode

Userprofile program

change algorithm

Compilerintraprocedural IRInterprocedural IRIR optimizations

Compilerregister allocation

instruction selectionpeephole transformations

today

33

Interprocedural IR

bull Compile time generation of code for procedure invocations

bull Activation Records (aka Stack Frames)

34

Supporting Procedures

bull New computing environment ndash at least temporary memory for local variables

bull Passing information into the new environmentndash Parameters

bull Transfer of control tofrom procedurebull Handling return values

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 6: Compilation   0368-3133 (Semester A, 2013/14)

6

Code Generation

Op()

Id(b)

Num(23) Num(7)

Op(+)

Valid Abstract Syntax TreeSymbol Tablehellip

Verification (possible runtime)ErrorsWarnings

Intermediate Representation (IR)

Executable Codeinput output

7

Code Generation IR

bull Translating from abstract syntax (AST) to intermediate representation (IR)ndash Three-Address Code

bull Primitive statements control flow procedure calls

Source code

(program)

LexicalAnalysis

Syntax Analysis

Parsing

AST SymbolTableetc

InterRep(IR)

CodeGeneration

Source code

(executable)

8

C++ IR

Pentium

Java bytecode

SparcPyhton

Javaoptimize

Lowering Code Gen

Intermediate representationbull A language that is between the source language and

the target language ndash not specific to any machinebull Goal 1 retargeting compiler components for

different source languagestarget machinesbull Goal 2 machine-independent optimizer

ndash Narrow interface small number of node types (instructions)

9

Three-Address Code IR

bull A popular form of IRbull High-level assembly where instructions

have at most three operands

Chapter 8

10

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

11

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

12

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

13

TAC generation for expressions

bull cgen(atomic_expr) directly generates TAC for atomic expressionsndash Constants identifiershellip

bull Cgen(compund_expr) recursively generates TAC for compound expressions ndash binary operators procedure calls hellipndash use temporary variables (registers) to store

values of intermediate expressions

14

cgen examplecgen(5 + x) = Choose a new temporary t Let t1 = cgen(5) Let t2 = cgen(x) Emit( t = t1 + t2 ) Return t

15

Naiumlve cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =

Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc

16

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

17

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

18

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

19

Naive cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =

Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc

bull Observation temporaries in cgen(e1) can be reused in cgen(e2)

20

Improving cgen for expressionsbull Observation ndash naiumlve translation needlessly generates temporaries for

leaf expressionsbull Observation ndash temporaries used exactly once

ndash Once a temporary has been read it can be reused for another sub-expressionbull cgen(e1 op e2) =

Let _t1 = cgen(e1) Let _t2 = cgen(e2) Emit( _t =_t1 op _t2 ) Return t

bull Temporaries cgen(e1) can be reused in cgen(e2)

21

Sethi-Ullman translation

bull Algorithm by Ravi Sethi and Jeffrey D Ullman to emit optimal TACndash Minimizes number of temporaries

bull Main data structure in algorithm is a stack of temporariesndash Stack corresponds to recursive invocations of _t = cgen(e)ndash All the temporaries on the stack are live

bull Live = contain a value that is needed later on

22

Weighted register allocationbull Can save registers by re-ordering subtree computationsbull Label each node with its weight

ndash Weight = number of registers neededndash Leaf weight knownndash Internal node weight

bull w(left) gt w(right) then w = leftbull w(right) gt w(left) then w = rightbull w(right) = w(left) then w = left + 1

bull Choose heavier child as first to be translatedbull WARNING have to check that no side-effects exist before

attempting to apply this optimization(pre-pass on the tree)

23

Weighted reg alloc example

b

5 c

array access

+

a w=0

w=0 w=0

w=1 w=0

w=1

w=1

Phase 1 - check absence of side-effects in expression tree - assign weight to each AST node

_t0 = cgen( a+b[5c] )

base index

24

Weighted reg alloc example

b

5 c

array access

+

abase index

w=0

w=0 w=0

w=1 w=0

w=1

w=1

_t0 = cgen( a+b[5c] )Phase 2 - use weights to decide on order of translation

_t0

_t0

_t0Heavier sub-tree

Heavier sub-tree

_t0 = _t1 _t0

_t0 = _t1[_t0]

_t0 = _t1 + _t0

_t0_t1

_t1

_t1_t0 = c_t1 = 5

_t1 = b

_t1 = a

25

TAC Generation forMemory Access Instructions

bull Copy instruction a = bbull Loadstore instructions

a = b a = bbull Address of instruction a=ampbbull Array accesses

a = b[i] a[i] = bbull Field accesses

a = b[f] a[f] = bbull Memory allocation instruction a = alloc(size)

ndash Sometimes left out (eg malloc is a procedure in C)

26

Array operations

t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]

t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer

x = y[i]

x[i] = y

27

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

28

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

29

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

30

Control-flow example ndash loopsint xint y

while (x lt y) x = x 2

y = x

_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1

y = x

31

cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label

Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )

32

Optimization points

sourcecode

Frontend IR Code

generatortargetcode

Userprofile program

change algorithm

Compilerintraprocedural IRInterprocedural IRIR optimizations

Compilerregister allocation

instruction selectionpeephole transformations

today

33

Interprocedural IR

bull Compile time generation of code for procedure invocations

bull Activation Records (aka Stack Frames)

34

Supporting Procedures

bull New computing environment ndash at least temporary memory for local variables

bull Passing information into the new environmentndash Parameters

bull Transfer of control tofrom procedurebull Handling return values

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 7: Compilation   0368-3133 (Semester A, 2013/14)

7

Code Generation IR

bull Translating from abstract syntax (AST) to intermediate representation (IR)ndash Three-Address Code

bull Primitive statements control flow procedure calls

Source code

(program)

LexicalAnalysis

Syntax Analysis

Parsing

AST SymbolTableetc

InterRep(IR)

CodeGeneration

Source code

(executable)

8

C++ IR

Pentium

Java bytecode

SparcPyhton

Javaoptimize

Lowering Code Gen

Intermediate representationbull A language that is between the source language and

the target language ndash not specific to any machinebull Goal 1 retargeting compiler components for

different source languagestarget machinesbull Goal 2 machine-independent optimizer

ndash Narrow interface small number of node types (instructions)

9

Three-Address Code IR

bull A popular form of IRbull High-level assembly where instructions

have at most three operands

Chapter 8

10

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

11

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

12

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

13

TAC generation for expressions

bull cgen(atomic_expr) directly generates TAC for atomic expressionsndash Constants identifiershellip

bull Cgen(compund_expr) recursively generates TAC for compound expressions ndash binary operators procedure calls hellipndash use temporary variables (registers) to store

values of intermediate expressions

14

cgen examplecgen(5 + x) = Choose a new temporary t Let t1 = cgen(5) Let t2 = cgen(x) Emit( t = t1 + t2 ) Return t

15

Naiumlve cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =

Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc

16

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

17

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

18

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

19

Naive cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =

Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc

bull Observation temporaries in cgen(e1) can be reused in cgen(e2)

20

Improving cgen for expressionsbull Observation ndash naiumlve translation needlessly generates temporaries for

leaf expressionsbull Observation ndash temporaries used exactly once

ndash Once a temporary has been read it can be reused for another sub-expressionbull cgen(e1 op e2) =

Let _t1 = cgen(e1) Let _t2 = cgen(e2) Emit( _t =_t1 op _t2 ) Return t

bull Temporaries cgen(e1) can be reused in cgen(e2)

21

Sethi-Ullman translation

bull Algorithm by Ravi Sethi and Jeffrey D Ullman to emit optimal TACndash Minimizes number of temporaries

bull Main data structure in algorithm is a stack of temporariesndash Stack corresponds to recursive invocations of _t = cgen(e)ndash All the temporaries on the stack are live

bull Live = contain a value that is needed later on

22

Weighted register allocationbull Can save registers by re-ordering subtree computationsbull Label each node with its weight

ndash Weight = number of registers neededndash Leaf weight knownndash Internal node weight

bull w(left) gt w(right) then w = leftbull w(right) gt w(left) then w = rightbull w(right) = w(left) then w = left + 1

bull Choose heavier child as first to be translatedbull WARNING have to check that no side-effects exist before

attempting to apply this optimization(pre-pass on the tree)

23

Weighted reg alloc example

b

5 c

array access

+

a w=0

w=0 w=0

w=1 w=0

w=1

w=1

Phase 1 - check absence of side-effects in expression tree - assign weight to each AST node

_t0 = cgen( a+b[5c] )

base index

24

Weighted reg alloc example

b

5 c

array access

+

abase index

w=0

w=0 w=0

w=1 w=0

w=1

w=1

_t0 = cgen( a+b[5c] )Phase 2 - use weights to decide on order of translation

_t0

_t0

_t0Heavier sub-tree

Heavier sub-tree

_t0 = _t1 _t0

_t0 = _t1[_t0]

_t0 = _t1 + _t0

_t0_t1

_t1

_t1_t0 = c_t1 = 5

_t1 = b

_t1 = a

25

TAC Generation forMemory Access Instructions

bull Copy instruction a = bbull Loadstore instructions

a = b a = bbull Address of instruction a=ampbbull Array accesses

a = b[i] a[i] = bbull Field accesses

a = b[f] a[f] = bbull Memory allocation instruction a = alloc(size)

ndash Sometimes left out (eg malloc is a procedure in C)

26

Array operations

t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]

t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer

x = y[i]

x[i] = y

27

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

28

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

29

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

30

Control-flow example ndash loopsint xint y

while (x lt y) x = x 2

y = x

_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1

y = x

31

cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label

Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )

32

Optimization points

sourcecode

Frontend IR Code

generatortargetcode

Userprofile program

change algorithm

Compilerintraprocedural IRInterprocedural IRIR optimizations

Compilerregister allocation

instruction selectionpeephole transformations

today

33

Interprocedural IR

bull Compile time generation of code for procedure invocations

bull Activation Records (aka Stack Frames)

34

Supporting Procedures

bull New computing environment ndash at least temporary memory for local variables

bull Passing information into the new environmentndash Parameters

bull Transfer of control tofrom procedurebull Handling return values

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 8: Compilation   0368-3133 (Semester A, 2013/14)

8

C++ IR

Pentium

Java bytecode

SparcPyhton

Javaoptimize

Lowering Code Gen

Intermediate representationbull A language that is between the source language and

the target language ndash not specific to any machinebull Goal 1 retargeting compiler components for

different source languagestarget machinesbull Goal 2 machine-independent optimizer

ndash Narrow interface small number of node types (instructions)

9

Three-Address Code IR

bull A popular form of IRbull High-level assembly where instructions

have at most three operands

Chapter 8

10

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

11

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

12

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

13

TAC generation for expressions

bull cgen(atomic_expr) directly generates TAC for atomic expressionsndash Constants identifiershellip

bull Cgen(compund_expr) recursively generates TAC for compound expressions ndash binary operators procedure calls hellipndash use temporary variables (registers) to store

values of intermediate expressions

14

cgen examplecgen(5 + x) = Choose a new temporary t Let t1 = cgen(5) Let t2 = cgen(x) Emit( t = t1 + t2 ) Return t

15

Naiumlve cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =

Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc

16

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

17

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

18

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

19

Naive cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =

Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc

bull Observation temporaries in cgen(e1) can be reused in cgen(e2)

20

Improving cgen for expressionsbull Observation ndash naiumlve translation needlessly generates temporaries for

leaf expressionsbull Observation ndash temporaries used exactly once

ndash Once a temporary has been read it can be reused for another sub-expressionbull cgen(e1 op e2) =

Let _t1 = cgen(e1) Let _t2 = cgen(e2) Emit( _t =_t1 op _t2 ) Return t

bull Temporaries cgen(e1) can be reused in cgen(e2)

21

Sethi-Ullman translation

bull Algorithm by Ravi Sethi and Jeffrey D Ullman to emit optimal TACndash Minimizes number of temporaries

bull Main data structure in algorithm is a stack of temporariesndash Stack corresponds to recursive invocations of _t = cgen(e)ndash All the temporaries on the stack are live

bull Live = contain a value that is needed later on

22

Weighted register allocationbull Can save registers by re-ordering subtree computationsbull Label each node with its weight

ndash Weight = number of registers neededndash Leaf weight knownndash Internal node weight

bull w(left) gt w(right) then w = leftbull w(right) gt w(left) then w = rightbull w(right) = w(left) then w = left + 1

bull Choose heavier child as first to be translatedbull WARNING have to check that no side-effects exist before

attempting to apply this optimization(pre-pass on the tree)

23

Weighted reg alloc example

b

5 c

array access

+

a w=0

w=0 w=0

w=1 w=0

w=1

w=1

Phase 1 - check absence of side-effects in expression tree - assign weight to each AST node

_t0 = cgen( a+b[5c] )

base index

24

Weighted reg alloc example

b

5 c

array access

+

abase index

w=0

w=0 w=0

w=1 w=0

w=1

w=1

_t0 = cgen( a+b[5c] )Phase 2 - use weights to decide on order of translation

_t0

_t0

_t0Heavier sub-tree

Heavier sub-tree

_t0 = _t1 _t0

_t0 = _t1[_t0]

_t0 = _t1 + _t0

_t0_t1

_t1

_t1_t0 = c_t1 = 5

_t1 = b

_t1 = a

25

TAC Generation forMemory Access Instructions

bull Copy instruction a = bbull Loadstore instructions

a = b a = bbull Address of instruction a=ampbbull Array accesses

a = b[i] a[i] = bbull Field accesses

a = b[f] a[f] = bbull Memory allocation instruction a = alloc(size)

ndash Sometimes left out (eg malloc is a procedure in C)

26

Array operations

t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]

t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer

x = y[i]

x[i] = y

27

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

28

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

29

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

30

Control-flow example ndash loopsint xint y

while (x lt y) x = x 2

y = x

_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1

y = x

31

cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label

Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )

32

Optimization points

sourcecode

Frontend IR Code

generatortargetcode

Userprofile program

change algorithm

Compilerintraprocedural IRInterprocedural IRIR optimizations

Compilerregister allocation

instruction selectionpeephole transformations

today

33

Interprocedural IR

bull Compile time generation of code for procedure invocations

bull Activation Records (aka Stack Frames)

34

Supporting Procedures

bull New computing environment ndash at least temporary memory for local variables

bull Passing information into the new environmentndash Parameters

bull Transfer of control tofrom procedurebull Handling return values

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 9: Compilation   0368-3133 (Semester A, 2013/14)

9

Three-Address Code IR

bull A popular form of IRbull High-level assembly where instructions

have at most three operands

Chapter 8

10

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

11

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

12

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

13

TAC generation for expressions

bull cgen(atomic_expr) directly generates TAC for atomic expressionsndash Constants identifiershellip

bull Cgen(compund_expr) recursively generates TAC for compound expressions ndash binary operators procedure calls hellipndash use temporary variables (registers) to store

values of intermediate expressions

14

cgen examplecgen(5 + x) = Choose a new temporary t Let t1 = cgen(5) Let t2 = cgen(x) Emit( t = t1 + t2 ) Return t

15

Naiumlve cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =

Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc

16

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

17

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

18

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

19

Naive cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =

Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc

bull Observation temporaries in cgen(e1) can be reused in cgen(e2)

20

Improving cgen for expressionsbull Observation ndash naiumlve translation needlessly generates temporaries for

leaf expressionsbull Observation ndash temporaries used exactly once

ndash Once a temporary has been read it can be reused for another sub-expressionbull cgen(e1 op e2) =

Let _t1 = cgen(e1) Let _t2 = cgen(e2) Emit( _t =_t1 op _t2 ) Return t

bull Temporaries cgen(e1) can be reused in cgen(e2)

21

Sethi-Ullman translation

bull Algorithm by Ravi Sethi and Jeffrey D Ullman to emit optimal TACndash Minimizes number of temporaries

bull Main data structure in algorithm is a stack of temporariesndash Stack corresponds to recursive invocations of _t = cgen(e)ndash All the temporaries on the stack are live

bull Live = contain a value that is needed later on

22

Weighted register allocationbull Can save registers by re-ordering subtree computationsbull Label each node with its weight

ndash Weight = number of registers neededndash Leaf weight knownndash Internal node weight

bull w(left) gt w(right) then w = leftbull w(right) gt w(left) then w = rightbull w(right) = w(left) then w = left + 1

bull Choose heavier child as first to be translatedbull WARNING have to check that no side-effects exist before

attempting to apply this optimization(pre-pass on the tree)

23

Weighted reg alloc example

b

5 c

array access

+

a w=0

w=0 w=0

w=1 w=0

w=1

w=1

Phase 1 - check absence of side-effects in expression tree - assign weight to each AST node

_t0 = cgen( a+b[5c] )

base index

24

Weighted reg alloc example

b

5 c

array access

+

abase index

w=0

w=0 w=0

w=1 w=0

w=1

w=1

_t0 = cgen( a+b[5c] )Phase 2 - use weights to decide on order of translation

_t0

_t0

_t0Heavier sub-tree

Heavier sub-tree

_t0 = _t1 _t0

_t0 = _t1[_t0]

_t0 = _t1 + _t0

_t0_t1

_t1

_t1_t0 = c_t1 = 5

_t1 = b

_t1 = a

25

TAC Generation forMemory Access Instructions

bull Copy instruction a = bbull Loadstore instructions

a = b a = bbull Address of instruction a=ampbbull Array accesses

a = b[i] a[i] = bbull Field accesses

a = b[f] a[f] = bbull Memory allocation instruction a = alloc(size)

ndash Sometimes left out (eg malloc is a procedure in C)

26

Array operations

t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]

t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer

x = y[i]

x[i] = y

27

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

28

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

29

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

30

Control-flow example ndash loopsint xint y

while (x lt y) x = x 2

y = x

_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1

y = x

31

cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label

Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )

32

Optimization points

sourcecode

Frontend IR Code

generatortargetcode

Userprofile program

change algorithm

Compilerintraprocedural IRInterprocedural IRIR optimizations

Compilerregister allocation

instruction selectionpeephole transformations

today

33

Interprocedural IR

bull Compile time generation of code for procedure invocations

bull Activation Records (aka Stack Frames)

34

Supporting Procedures

bull New computing environment ndash at least temporary memory for local variables

bull Passing information into the new environmentndash Parameters

bull Transfer of control tofrom procedurebull Handling return values

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 10: Compilation   0368-3133 (Semester A, 2013/14)

10

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

11

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

12

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

13

TAC generation for expressions

bull cgen(atomic_expr) directly generates TAC for atomic expressionsndash Constants identifiershellip

bull Cgen(compund_expr) recursively generates TAC for compound expressions ndash binary operators procedure calls hellipndash use temporary variables (registers) to store

values of intermediate expressions

14

cgen examplecgen(5 + x) = Choose a new temporary t Let t1 = cgen(5) Let t2 = cgen(x) Emit( t = t1 + t2 ) Return t

15

Naiumlve cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =

Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc

16

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

17

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

18

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

19

Naive cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =

Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc

bull Observation temporaries in cgen(e1) can be reused in cgen(e2)

20

Improving cgen for expressionsbull Observation ndash naiumlve translation needlessly generates temporaries for

leaf expressionsbull Observation ndash temporaries used exactly once

ndash Once a temporary has been read it can be reused for another sub-expressionbull cgen(e1 op e2) =

Let _t1 = cgen(e1) Let _t2 = cgen(e2) Emit( _t =_t1 op _t2 ) Return t

bull Temporaries cgen(e1) can be reused in cgen(e2)

21

Sethi-Ullman translation

bull Algorithm by Ravi Sethi and Jeffrey D Ullman to emit optimal TACndash Minimizes number of temporaries

bull Main data structure in algorithm is a stack of temporariesndash Stack corresponds to recursive invocations of _t = cgen(e)ndash All the temporaries on the stack are live

bull Live = contain a value that is needed later on

22

Weighted register allocationbull Can save registers by re-ordering subtree computationsbull Label each node with its weight

ndash Weight = number of registers neededndash Leaf weight knownndash Internal node weight

bull w(left) gt w(right) then w = leftbull w(right) gt w(left) then w = rightbull w(right) = w(left) then w = left + 1

bull Choose heavier child as first to be translatedbull WARNING have to check that no side-effects exist before

attempting to apply this optimization(pre-pass on the tree)

23

Weighted reg alloc example

b

5 c

array access

+

a w=0

w=0 w=0

w=1 w=0

w=1

w=1

Phase 1 - check absence of side-effects in expression tree - assign weight to each AST node

_t0 = cgen( a+b[5c] )

base index

24

Weighted reg alloc example

b

5 c

array access

+

abase index

w=0

w=0 w=0

w=1 w=0

w=1

w=1

_t0 = cgen( a+b[5c] )Phase 2 - use weights to decide on order of translation

_t0

_t0

_t0Heavier sub-tree

Heavier sub-tree

_t0 = _t1 _t0

_t0 = _t1[_t0]

_t0 = _t1 + _t0

_t0_t1

_t1

_t1_t0 = c_t1 = 5

_t1 = b

_t1 = a

25

TAC Generation forMemory Access Instructions

bull Copy instruction a = bbull Loadstore instructions

a = b a = bbull Address of instruction a=ampbbull Array accesses

a = b[i] a[i] = bbull Field accesses

a = b[f] a[f] = bbull Memory allocation instruction a = alloc(size)

ndash Sometimes left out (eg malloc is a procedure in C)

26

Array operations

t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]

t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer

x = y[i]

x[i] = y

27

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

28

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

29

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

30

Control-flow example ndash loopsint xint y

while (x lt y) x = x 2

y = x

_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1

y = x

31

cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label

Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )

32

Optimization points

sourcecode

Frontend IR Code

generatortargetcode

Userprofile program

change algorithm

Compilerintraprocedural IRInterprocedural IRIR optimizations

Compilerregister allocation

instruction selectionpeephole transformations

today

33

Interprocedural IR

bull Compile time generation of code for procedure invocations

bull Activation Records (aka Stack Frames)

34

Supporting Procedures

bull New computing environment ndash at least temporary memory for local variables

bull Passing information into the new environmentndash Parameters

bull Transfer of control tofrom procedurebull Handling return values

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 11: Compilation   0368-3133 (Semester A, 2013/14)

11

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

12

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

13

TAC generation for expressions

bull cgen(atomic_expr) directly generates TAC for atomic expressionsndash Constants identifiershellip

bull Cgen(compund_expr) recursively generates TAC for compound expressions ndash binary operators procedure calls hellipndash use temporary variables (registers) to store

values of intermediate expressions

14

cgen examplecgen(5 + x) = Choose a new temporary t Let t1 = cgen(5) Let t2 = cgen(x) Emit( t = t1 + t2 ) Return t

15

Naiumlve cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =

Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc

16

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

17

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

18

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

19

Naive cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =

Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc

bull Observation temporaries in cgen(e1) can be reused in cgen(e2)

20

Improving cgen for expressionsbull Observation ndash naiumlve translation needlessly generates temporaries for

leaf expressionsbull Observation ndash temporaries used exactly once

ndash Once a temporary has been read it can be reused for another sub-expressionbull cgen(e1 op e2) =

Let _t1 = cgen(e1) Let _t2 = cgen(e2) Emit( _t =_t1 op _t2 ) Return t

bull Temporaries cgen(e1) can be reused in cgen(e2)

21

Sethi-Ullman translation

bull Algorithm by Ravi Sethi and Jeffrey D Ullman to emit optimal TACndash Minimizes number of temporaries

bull Main data structure in algorithm is a stack of temporariesndash Stack corresponds to recursive invocations of _t = cgen(e)ndash All the temporaries on the stack are live

bull Live = contain a value that is needed later on

22

Weighted register allocationbull Can save registers by re-ordering subtree computationsbull Label each node with its weight

ndash Weight = number of registers neededndash Leaf weight knownndash Internal node weight

bull w(left) gt w(right) then w = leftbull w(right) gt w(left) then w = rightbull w(right) = w(left) then w = left + 1

bull Choose heavier child as first to be translatedbull WARNING have to check that no side-effects exist before

attempting to apply this optimization(pre-pass on the tree)

23

Weighted reg alloc example

b

5 c

array access

+

a w=0

w=0 w=0

w=1 w=0

w=1

w=1

Phase 1 - check absence of side-effects in expression tree - assign weight to each AST node

_t0 = cgen( a+b[5c] )

base index

24

Weighted reg alloc example

b

5 c

array access

+

abase index

w=0

w=0 w=0

w=1 w=0

w=1

w=1

_t0 = cgen( a+b[5c] )Phase 2 - use weights to decide on order of translation

_t0

_t0

_t0Heavier sub-tree

Heavier sub-tree

_t0 = _t1 _t0

_t0 = _t1[_t0]

_t0 = _t1 + _t0

_t0_t1

_t1

_t1_t0 = c_t1 = 5

_t1 = b

_t1 = a

25

TAC Generation forMemory Access Instructions

bull Copy instruction a = bbull Loadstore instructions

a = b a = bbull Address of instruction a=ampbbull Array accesses

a = b[i] a[i] = bbull Field accesses

a = b[f] a[f] = bbull Memory allocation instruction a = alloc(size)

ndash Sometimes left out (eg malloc is a procedure in C)

26

Array operations

t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]

t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer

x = y[i]

x[i] = y

27

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

28

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

29

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

30

Control-flow example ndash loopsint xint y

while (x lt y) x = x 2

y = x

_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1

y = x

31

cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label

Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )

32

Optimization points

sourcecode

Frontend IR Code

generatortargetcode

Userprofile program

change algorithm

Compilerintraprocedural IRInterprocedural IRIR optimizations

Compilerregister allocation

instruction selectionpeephole transformations

today

33

Interprocedural IR

bull Compile time generation of code for procedure invocations

bull Activation Records (aka Stack Frames)

34

Supporting Procedures

bull New computing environment ndash at least temporary memory for local variables

bull Passing information into the new environmentndash Parameters

bull Transfer of control tofrom procedurebull Handling return values

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 12: Compilation   0368-3133 (Semester A, 2013/14)

12

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

13

TAC generation for expressions

bull cgen(atomic_expr) directly generates TAC for atomic expressionsndash Constants identifiershellip

bull Cgen(compund_expr) recursively generates TAC for compound expressions ndash binary operators procedure calls hellipndash use temporary variables (registers) to store

values of intermediate expressions

14

cgen examplecgen(5 + x) = Choose a new temporary t Let t1 = cgen(5) Let t2 = cgen(x) Emit( t = t1 + t2 ) Return t

15

Naiumlve cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =

Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc

16

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

17

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

18

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

19

Naive cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =

Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc

bull Observation temporaries in cgen(e1) can be reused in cgen(e2)

20

Improving cgen for expressionsbull Observation ndash naiumlve translation needlessly generates temporaries for

leaf expressionsbull Observation ndash temporaries used exactly once

ndash Once a temporary has been read it can be reused for another sub-expressionbull cgen(e1 op e2) =

Let _t1 = cgen(e1) Let _t2 = cgen(e2) Emit( _t =_t1 op _t2 ) Return t

bull Temporaries cgen(e1) can be reused in cgen(e2)

21

Sethi-Ullman translation

bull Algorithm by Ravi Sethi and Jeffrey D Ullman to emit optimal TACndash Minimizes number of temporaries

bull Main data structure in algorithm is a stack of temporariesndash Stack corresponds to recursive invocations of _t = cgen(e)ndash All the temporaries on the stack are live

bull Live = contain a value that is needed later on

22

Weighted register allocationbull Can save registers by re-ordering subtree computationsbull Label each node with its weight

ndash Weight = number of registers neededndash Leaf weight knownndash Internal node weight

bull w(left) gt w(right) then w = leftbull w(right) gt w(left) then w = rightbull w(right) = w(left) then w = left + 1

bull Choose heavier child as first to be translatedbull WARNING have to check that no side-effects exist before

attempting to apply this optimization(pre-pass on the tree)

23

Weighted reg alloc example

b

5 c

array access

+

a w=0

w=0 w=0

w=1 w=0

w=1

w=1

Phase 1 - check absence of side-effects in expression tree - assign weight to each AST node

_t0 = cgen( a+b[5c] )

base index

24

Weighted reg alloc example

b

5 c

array access

+

abase index

w=0

w=0 w=0

w=1 w=0

w=1

w=1

_t0 = cgen( a+b[5c] )Phase 2 - use weights to decide on order of translation

_t0

_t0

_t0Heavier sub-tree

Heavier sub-tree

_t0 = _t1 _t0

_t0 = _t1[_t0]

_t0 = _t1 + _t0

_t0_t1

_t1

_t1_t0 = c_t1 = 5

_t1 = b

_t1 = a

25

TAC Generation forMemory Access Instructions

bull Copy instruction a = bbull Loadstore instructions

a = b a = bbull Address of instruction a=ampbbull Array accesses

a = b[i] a[i] = bbull Field accesses

a = b[f] a[f] = bbull Memory allocation instruction a = alloc(size)

ndash Sometimes left out (eg malloc is a procedure in C)

26

Array operations

t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]

t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer

x = y[i]

x[i] = y

27

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

28

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

29

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

30

Control-flow example ndash loopsint xint y

while (x lt y) x = x 2

y = x

_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1

y = x

31

cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label

Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )

32

Optimization points

sourcecode

Frontend IR Code

generatortargetcode

Userprofile program

change algorithm

Compilerintraprocedural IRInterprocedural IRIR optimizations

Compilerregister allocation

instruction selectionpeephole transformations

today

33

Interprocedural IR

bull Compile time generation of code for procedure invocations

bull Activation Records (aka Stack Frames)

34

Supporting Procedures

bull New computing environment ndash at least temporary memory for local variables

bull Passing information into the new environmentndash Parameters

bull Transfer of control tofrom procedurebull Handling return values

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 13: Compilation   0368-3133 (Semester A, 2013/14)

13

TAC generation for expressions

bull cgen(atomic_expr) directly generates TAC for atomic expressionsndash Constants identifiershellip

bull Cgen(compund_expr) recursively generates TAC for compound expressions ndash binary operators procedure calls hellipndash use temporary variables (registers) to store

values of intermediate expressions

14

cgen examplecgen(5 + x) = Choose a new temporary t Let t1 = cgen(5) Let t2 = cgen(x) Emit( t = t1 + t2 ) Return t

15

Naiumlve cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =

Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc

16

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

17

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

18

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

19

Naive cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =

Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc

bull Observation temporaries in cgen(e1) can be reused in cgen(e2)

20

Improving cgen for expressionsbull Observation ndash naiumlve translation needlessly generates temporaries for

leaf expressionsbull Observation ndash temporaries used exactly once

ndash Once a temporary has been read it can be reused for another sub-expressionbull cgen(e1 op e2) =

Let _t1 = cgen(e1) Let _t2 = cgen(e2) Emit( _t =_t1 op _t2 ) Return t

bull Temporaries cgen(e1) can be reused in cgen(e2)

21

Sethi-Ullman translation

bull Algorithm by Ravi Sethi and Jeffrey D Ullman to emit optimal TACndash Minimizes number of temporaries

bull Main data structure in algorithm is a stack of temporariesndash Stack corresponds to recursive invocations of _t = cgen(e)ndash All the temporaries on the stack are live

bull Live = contain a value that is needed later on

22

Weighted register allocationbull Can save registers by re-ordering subtree computationsbull Label each node with its weight

ndash Weight = number of registers neededndash Leaf weight knownndash Internal node weight

bull w(left) gt w(right) then w = leftbull w(right) gt w(left) then w = rightbull w(right) = w(left) then w = left + 1

bull Choose heavier child as first to be translatedbull WARNING have to check that no side-effects exist before

attempting to apply this optimization(pre-pass on the tree)

23

Weighted reg alloc example

b

5 c

array access

+

a w=0

w=0 w=0

w=1 w=0

w=1

w=1

Phase 1 - check absence of side-effects in expression tree - assign weight to each AST node

_t0 = cgen( a+b[5c] )

base index

24

Weighted reg alloc example

b

5 c

array access

+

abase index

w=0

w=0 w=0

w=1 w=0

w=1

w=1

_t0 = cgen( a+b[5c] )Phase 2 - use weights to decide on order of translation

_t0

_t0

_t0Heavier sub-tree

Heavier sub-tree

_t0 = _t1 _t0

_t0 = _t1[_t0]

_t0 = _t1 + _t0

_t0_t1

_t1

_t1_t0 = c_t1 = 5

_t1 = b

_t1 = a

25

TAC Generation forMemory Access Instructions

bull Copy instruction a = bbull Loadstore instructions

a = b a = bbull Address of instruction a=ampbbull Array accesses

a = b[i] a[i] = bbull Field accesses

a = b[f] a[f] = bbull Memory allocation instruction a = alloc(size)

ndash Sometimes left out (eg malloc is a procedure in C)

26

Array operations

t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]

t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer

x = y[i]

x[i] = y

27

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

28

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

29

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

30

Control-flow example ndash loopsint xint y

while (x lt y) x = x 2

y = x

_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1

y = x

31

cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label

Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )

32

Optimization points

sourcecode

Frontend IR Code

generatortargetcode

Userprofile program

change algorithm

Compilerintraprocedural IRInterprocedural IRIR optimizations

Compilerregister allocation

instruction selectionpeephole transformations

today

33

Interprocedural IR

bull Compile time generation of code for procedure invocations

bull Activation Records (aka Stack Frames)

34

Supporting Procedures

bull New computing environment ndash at least temporary memory for local variables

bull Passing information into the new environmentndash Parameters

bull Transfer of control tofrom procedurebull Handling return values

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 14: Compilation   0368-3133 (Semester A, 2013/14)

14

cgen examplecgen(5 + x) = Choose a new temporary t Let t1 = cgen(5) Let t2 = cgen(x) Emit( t = t1 + t2 ) Return t

15

Naiumlve cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =

Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc

16

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

17

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

18

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

19

Naive cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =

Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc

bull Observation temporaries in cgen(e1) can be reused in cgen(e2)

20

Improving cgen for expressionsbull Observation ndash naiumlve translation needlessly generates temporaries for

leaf expressionsbull Observation ndash temporaries used exactly once

ndash Once a temporary has been read it can be reused for another sub-expressionbull cgen(e1 op e2) =

Let _t1 = cgen(e1) Let _t2 = cgen(e2) Emit( _t =_t1 op _t2 ) Return t

bull Temporaries cgen(e1) can be reused in cgen(e2)

21

Sethi-Ullman translation

bull Algorithm by Ravi Sethi and Jeffrey D Ullman to emit optimal TACndash Minimizes number of temporaries

bull Main data structure in algorithm is a stack of temporariesndash Stack corresponds to recursive invocations of _t = cgen(e)ndash All the temporaries on the stack are live

bull Live = contain a value that is needed later on

22

Weighted register allocationbull Can save registers by re-ordering subtree computationsbull Label each node with its weight

ndash Weight = number of registers neededndash Leaf weight knownndash Internal node weight

bull w(left) gt w(right) then w = leftbull w(right) gt w(left) then w = rightbull w(right) = w(left) then w = left + 1

bull Choose heavier child as first to be translatedbull WARNING have to check that no side-effects exist before

attempting to apply this optimization(pre-pass on the tree)

23

Weighted reg alloc example

b

5 c

array access

+

a w=0

w=0 w=0

w=1 w=0

w=1

w=1

Phase 1 - check absence of side-effects in expression tree - assign weight to each AST node

_t0 = cgen( a+b[5c] )

base index

24

Weighted reg alloc example

b

5 c

array access

+

abase index

w=0

w=0 w=0

w=1 w=0

w=1

w=1

_t0 = cgen( a+b[5c] )Phase 2 - use weights to decide on order of translation

_t0

_t0

_t0Heavier sub-tree

Heavier sub-tree

_t0 = _t1 _t0

_t0 = _t1[_t0]

_t0 = _t1 + _t0

_t0_t1

_t1

_t1_t0 = c_t1 = 5

_t1 = b

_t1 = a

25

TAC Generation forMemory Access Instructions

bull Copy instruction a = bbull Loadstore instructions

a = b a = bbull Address of instruction a=ampbbull Array accesses

a = b[i] a[i] = bbull Field accesses

a = b[f] a[f] = bbull Memory allocation instruction a = alloc(size)

ndash Sometimes left out (eg malloc is a procedure in C)

26

Array operations

t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]

t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer

x = y[i]

x[i] = y

27

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

28

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

29

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

30

Control-flow example ndash loopsint xint y

while (x lt y) x = x 2

y = x

_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1

y = x

31

cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label

Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )

32

Optimization points

sourcecode

Frontend IR Code

generatortargetcode

Userprofile program

change algorithm

Compilerintraprocedural IRInterprocedural IRIR optimizations

Compilerregister allocation

instruction selectionpeephole transformations

today

33

Interprocedural IR

bull Compile time generation of code for procedure invocations

bull Activation Records (aka Stack Frames)

34

Supporting Procedures

bull New computing environment ndash at least temporary memory for local variables

bull Passing information into the new environmentndash Parameters

bull Transfer of control tofrom procedurebull Handling return values

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 15: Compilation   0368-3133 (Semester A, 2013/14)

15

Naiumlve cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =

Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc

16

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

17

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

18

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

19

Naive cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =

Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc

bull Observation temporaries in cgen(e1) can be reused in cgen(e2)

20

Improving cgen for expressionsbull Observation ndash naiumlve translation needlessly generates temporaries for

leaf expressionsbull Observation ndash temporaries used exactly once

ndash Once a temporary has been read it can be reused for another sub-expressionbull cgen(e1 op e2) =

Let _t1 = cgen(e1) Let _t2 = cgen(e2) Emit( _t =_t1 op _t2 ) Return t

bull Temporaries cgen(e1) can be reused in cgen(e2)

21

Sethi-Ullman translation

bull Algorithm by Ravi Sethi and Jeffrey D Ullman to emit optimal TACndash Minimizes number of temporaries

bull Main data structure in algorithm is a stack of temporariesndash Stack corresponds to recursive invocations of _t = cgen(e)ndash All the temporaries on the stack are live

bull Live = contain a value that is needed later on

22

Weighted register allocationbull Can save registers by re-ordering subtree computationsbull Label each node with its weight

ndash Weight = number of registers neededndash Leaf weight knownndash Internal node weight

bull w(left) gt w(right) then w = leftbull w(right) gt w(left) then w = rightbull w(right) = w(left) then w = left + 1

bull Choose heavier child as first to be translatedbull WARNING have to check that no side-effects exist before

attempting to apply this optimization(pre-pass on the tree)

23

Weighted reg alloc example

b

5 c

array access

+

a w=0

w=0 w=0

w=1 w=0

w=1

w=1

Phase 1 - check absence of side-effects in expression tree - assign weight to each AST node

_t0 = cgen( a+b[5c] )

base index

24

Weighted reg alloc example

b

5 c

array access

+

abase index

w=0

w=0 w=0

w=1 w=0

w=1

w=1

_t0 = cgen( a+b[5c] )Phase 2 - use weights to decide on order of translation

_t0

_t0

_t0Heavier sub-tree

Heavier sub-tree

_t0 = _t1 _t0

_t0 = _t1[_t0]

_t0 = _t1 + _t0

_t0_t1

_t1

_t1_t0 = c_t1 = 5

_t1 = b

_t1 = a

25

TAC Generation forMemory Access Instructions

bull Copy instruction a = bbull Loadstore instructions

a = b a = bbull Address of instruction a=ampbbull Array accesses

a = b[i] a[i] = bbull Field accesses

a = b[f] a[f] = bbull Memory allocation instruction a = alloc(size)

ndash Sometimes left out (eg malloc is a procedure in C)

26

Array operations

t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]

t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer

x = y[i]

x[i] = y

27

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

28

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

29

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

30

Control-flow example ndash loopsint xint y

while (x lt y) x = x 2

y = x

_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1

y = x

31

cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label

Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )

32

Optimization points

sourcecode

Frontend IR Code

generatortargetcode

Userprofile program

change algorithm

Compilerintraprocedural IRInterprocedural IRIR optimizations

Compilerregister allocation

instruction selectionpeephole transformations

today

33

Interprocedural IR

bull Compile time generation of code for procedure invocations

bull Activation Records (aka Stack Frames)

34

Supporting Procedures

bull New computing environment ndash at least temporary memory for local variables

bull Passing information into the new environmentndash Parameters

bull Transfer of control tofrom procedurebull Handling return values

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 16: Compilation   0368-3133 (Semester A, 2013/14)

16

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

17

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

18

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

19

Naive cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =

Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc

bull Observation temporaries in cgen(e1) can be reused in cgen(e2)

20

Improving cgen for expressionsbull Observation ndash naiumlve translation needlessly generates temporaries for

leaf expressionsbull Observation ndash temporaries used exactly once

ndash Once a temporary has been read it can be reused for another sub-expressionbull cgen(e1 op e2) =

Let _t1 = cgen(e1) Let _t2 = cgen(e2) Emit( _t =_t1 op _t2 ) Return t

bull Temporaries cgen(e1) can be reused in cgen(e2)

21

Sethi-Ullman translation

bull Algorithm by Ravi Sethi and Jeffrey D Ullman to emit optimal TACndash Minimizes number of temporaries

bull Main data structure in algorithm is a stack of temporariesndash Stack corresponds to recursive invocations of _t = cgen(e)ndash All the temporaries on the stack are live

bull Live = contain a value that is needed later on

22

Weighted register allocationbull Can save registers by re-ordering subtree computationsbull Label each node with its weight

ndash Weight = number of registers neededndash Leaf weight knownndash Internal node weight

bull w(left) gt w(right) then w = leftbull w(right) gt w(left) then w = rightbull w(right) = w(left) then w = left + 1

bull Choose heavier child as first to be translatedbull WARNING have to check that no side-effects exist before

attempting to apply this optimization(pre-pass on the tree)

23

Weighted reg alloc example

b

5 c

array access

+

a w=0

w=0 w=0

w=1 w=0

w=1

w=1

Phase 1 - check absence of side-effects in expression tree - assign weight to each AST node

_t0 = cgen( a+b[5c] )

base index

24

Weighted reg alloc example

b

5 c

array access

+

abase index

w=0

w=0 w=0

w=1 w=0

w=1

w=1

_t0 = cgen( a+b[5c] )Phase 2 - use weights to decide on order of translation

_t0

_t0

_t0Heavier sub-tree

Heavier sub-tree

_t0 = _t1 _t0

_t0 = _t1[_t0]

_t0 = _t1 + _t0

_t0_t1

_t1

_t1_t0 = c_t1 = 5

_t1 = b

_t1 = a

25

TAC Generation forMemory Access Instructions

bull Copy instruction a = bbull Loadstore instructions

a = b a = bbull Address of instruction a=ampbbull Array accesses

a = b[i] a[i] = bbull Field accesses

a = b[f] a[f] = bbull Memory allocation instruction a = alloc(size)

ndash Sometimes left out (eg malloc is a procedure in C)

26

Array operations

t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]

t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer

x = y[i]

x[i] = y

27

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

28

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

29

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

30

Control-flow example ndash loopsint xint y

while (x lt y) x = x 2

y = x

_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1

y = x

31

cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label

Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )

32

Optimization points

sourcecode

Frontend IR Code

generatortargetcode

Userprofile program

change algorithm

Compilerintraprocedural IRInterprocedural IRIR optimizations

Compilerregister allocation

instruction selectionpeephole transformations

today

33

Interprocedural IR

bull Compile time generation of code for procedure invocations

bull Activation Records (aka Stack Frames)

34

Supporting Procedures

bull New computing environment ndash at least temporary memory for local variables

bull Passing information into the new environmentndash Parameters

bull Transfer of control tofrom procedurebull Handling return values

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 17: Compilation   0368-3133 (Semester A, 2013/14)

17

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

18

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

19

Naive cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =

Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc

bull Observation temporaries in cgen(e1) can be reused in cgen(e2)

20

Improving cgen for expressionsbull Observation ndash naiumlve translation needlessly generates temporaries for

leaf expressionsbull Observation ndash temporaries used exactly once

ndash Once a temporary has been read it can be reused for another sub-expressionbull cgen(e1 op e2) =

Let _t1 = cgen(e1) Let _t2 = cgen(e2) Emit( _t =_t1 op _t2 ) Return t

bull Temporaries cgen(e1) can be reused in cgen(e2)

21

Sethi-Ullman translation

bull Algorithm by Ravi Sethi and Jeffrey D Ullman to emit optimal TACndash Minimizes number of temporaries

bull Main data structure in algorithm is a stack of temporariesndash Stack corresponds to recursive invocations of _t = cgen(e)ndash All the temporaries on the stack are live

bull Live = contain a value that is needed later on

22

Weighted register allocationbull Can save registers by re-ordering subtree computationsbull Label each node with its weight

ndash Weight = number of registers neededndash Leaf weight knownndash Internal node weight

bull w(left) gt w(right) then w = leftbull w(right) gt w(left) then w = rightbull w(right) = w(left) then w = left + 1

bull Choose heavier child as first to be translatedbull WARNING have to check that no side-effects exist before

attempting to apply this optimization(pre-pass on the tree)

23

Weighted reg alloc example

b

5 c

array access

+

a w=0

w=0 w=0

w=1 w=0

w=1

w=1

Phase 1 - check absence of side-effects in expression tree - assign weight to each AST node

_t0 = cgen( a+b[5c] )

base index

24

Weighted reg alloc example

b

5 c

array access

+

abase index

w=0

w=0 w=0

w=1 w=0

w=1

w=1

_t0 = cgen( a+b[5c] )Phase 2 - use weights to decide on order of translation

_t0

_t0

_t0Heavier sub-tree

Heavier sub-tree

_t0 = _t1 _t0

_t0 = _t1[_t0]

_t0 = _t1 + _t0

_t0_t1

_t1

_t1_t0 = c_t1 = 5

_t1 = b

_t1 = a

25

TAC Generation forMemory Access Instructions

bull Copy instruction a = bbull Loadstore instructions

a = b a = bbull Address of instruction a=ampbbull Array accesses

a = b[i] a[i] = bbull Field accesses

a = b[f] a[f] = bbull Memory allocation instruction a = alloc(size)

ndash Sometimes left out (eg malloc is a procedure in C)

26

Array operations

t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]

t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer

x = y[i]

x[i] = y

27

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

28

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

29

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

30

Control-flow example ndash loopsint xint y

while (x lt y) x = x 2

y = x

_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1

y = x

31

cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label

Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )

32

Optimization points

sourcecode

Frontend IR Code

generatortargetcode

Userprofile program

change algorithm

Compilerintraprocedural IRInterprocedural IRIR optimizations

Compilerregister allocation

instruction selectionpeephole transformations

today

33

Interprocedural IR

bull Compile time generation of code for procedure invocations

bull Activation Records (aka Stack Frames)

34

Supporting Procedures

bull New computing environment ndash at least temporary memory for local variables

bull Passing information into the new environmentndash Parameters

bull Transfer of control tofrom procedurebull Handling return values

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 18: Compilation   0368-3133 (Semester A, 2013/14)

18

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

19

Naive cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =

Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc

bull Observation temporaries in cgen(e1) can be reused in cgen(e2)

20

Improving cgen for expressionsbull Observation ndash naiumlve translation needlessly generates temporaries for

leaf expressionsbull Observation ndash temporaries used exactly once

ndash Once a temporary has been read it can be reused for another sub-expressionbull cgen(e1 op e2) =

Let _t1 = cgen(e1) Let _t2 = cgen(e2) Emit( _t =_t1 op _t2 ) Return t

bull Temporaries cgen(e1) can be reused in cgen(e2)

21

Sethi-Ullman translation

bull Algorithm by Ravi Sethi and Jeffrey D Ullman to emit optimal TACndash Minimizes number of temporaries

bull Main data structure in algorithm is a stack of temporariesndash Stack corresponds to recursive invocations of _t = cgen(e)ndash All the temporaries on the stack are live

bull Live = contain a value that is needed later on

22

Weighted register allocationbull Can save registers by re-ordering subtree computationsbull Label each node with its weight

ndash Weight = number of registers neededndash Leaf weight knownndash Internal node weight

bull w(left) gt w(right) then w = leftbull w(right) gt w(left) then w = rightbull w(right) = w(left) then w = left + 1

bull Choose heavier child as first to be translatedbull WARNING have to check that no side-effects exist before

attempting to apply this optimization(pre-pass on the tree)

23

Weighted reg alloc example

b

5 c

array access

+

a w=0

w=0 w=0

w=1 w=0

w=1

w=1

Phase 1 - check absence of side-effects in expression tree - assign weight to each AST node

_t0 = cgen( a+b[5c] )

base index

24

Weighted reg alloc example

b

5 c

array access

+

abase index

w=0

w=0 w=0

w=1 w=0

w=1

w=1

_t0 = cgen( a+b[5c] )Phase 2 - use weights to decide on order of translation

_t0

_t0

_t0Heavier sub-tree

Heavier sub-tree

_t0 = _t1 _t0

_t0 = _t1[_t0]

_t0 = _t1 + _t0

_t0_t1

_t1

_t1_t0 = c_t1 = 5

_t1 = b

_t1 = a

25

TAC Generation forMemory Access Instructions

bull Copy instruction a = bbull Loadstore instructions

a = b a = bbull Address of instruction a=ampbbull Array accesses

a = b[i] a[i] = bbull Field accesses

a = b[f] a[f] = bbull Memory allocation instruction a = alloc(size)

ndash Sometimes left out (eg malloc is a procedure in C)

26

Array operations

t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]

t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer

x = y[i]

x[i] = y

27

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

28

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

29

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

30

Control-flow example ndash loopsint xint y

while (x lt y) x = x 2

y = x

_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1

y = x

31

cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label

Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )

32

Optimization points

sourcecode

Frontend IR Code

generatortargetcode

Userprofile program

change algorithm

Compilerintraprocedural IRInterprocedural IRIR optimizations

Compilerregister allocation

instruction selectionpeephole transformations

today

33

Interprocedural IR

bull Compile time generation of code for procedure invocations

bull Activation Records (aka Stack Frames)

34

Supporting Procedures

bull New computing environment ndash at least temporary memory for local variables

bull Passing information into the new environmentndash Parameters

bull Transfer of control tofrom procedurebull Handling return values

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 19: Compilation   0368-3133 (Semester A, 2013/14)

19

Naive cgen for expressionsbull Maintain a counter for temporaries in cbull Initially c = 0bull cgen(e1 op e2) =

Let A = cgen(e1) c = c + 1 Let B = cgen(e2) c = c + 1 Emit( _tc = A op B ) Return _tc

bull Observation temporaries in cgen(e1) can be reused in cgen(e2)

20

Improving cgen for expressionsbull Observation ndash naiumlve translation needlessly generates temporaries for

leaf expressionsbull Observation ndash temporaries used exactly once

ndash Once a temporary has been read it can be reused for another sub-expressionbull cgen(e1 op e2) =

Let _t1 = cgen(e1) Let _t2 = cgen(e2) Emit( _t =_t1 op _t2 ) Return t

bull Temporaries cgen(e1) can be reused in cgen(e2)

21

Sethi-Ullman translation

bull Algorithm by Ravi Sethi and Jeffrey D Ullman to emit optimal TACndash Minimizes number of temporaries

bull Main data structure in algorithm is a stack of temporariesndash Stack corresponds to recursive invocations of _t = cgen(e)ndash All the temporaries on the stack are live

bull Live = contain a value that is needed later on

22

Weighted register allocationbull Can save registers by re-ordering subtree computationsbull Label each node with its weight

ndash Weight = number of registers neededndash Leaf weight knownndash Internal node weight

bull w(left) gt w(right) then w = leftbull w(right) gt w(left) then w = rightbull w(right) = w(left) then w = left + 1

bull Choose heavier child as first to be translatedbull WARNING have to check that no side-effects exist before

attempting to apply this optimization(pre-pass on the tree)

23

Weighted reg alloc example

b

5 c

array access

+

a w=0

w=0 w=0

w=1 w=0

w=1

w=1

Phase 1 - check absence of side-effects in expression tree - assign weight to each AST node

_t0 = cgen( a+b[5c] )

base index

24

Weighted reg alloc example

b

5 c

array access

+

abase index

w=0

w=0 w=0

w=1 w=0

w=1

w=1

_t0 = cgen( a+b[5c] )Phase 2 - use weights to decide on order of translation

_t0

_t0

_t0Heavier sub-tree

Heavier sub-tree

_t0 = _t1 _t0

_t0 = _t1[_t0]

_t0 = _t1 + _t0

_t0_t1

_t1

_t1_t0 = c_t1 = 5

_t1 = b

_t1 = a

25

TAC Generation forMemory Access Instructions

bull Copy instruction a = bbull Loadstore instructions

a = b a = bbull Address of instruction a=ampbbull Array accesses

a = b[i] a[i] = bbull Field accesses

a = b[f] a[f] = bbull Memory allocation instruction a = alloc(size)

ndash Sometimes left out (eg malloc is a procedure in C)

26

Array operations

t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]

t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer

x = y[i]

x[i] = y

27

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

28

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

29

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

30

Control-flow example ndash loopsint xint y

while (x lt y) x = x 2

y = x

_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1

y = x

31

cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label

Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )

32

Optimization points

sourcecode

Frontend IR Code

generatortargetcode

Userprofile program

change algorithm

Compilerintraprocedural IRInterprocedural IRIR optimizations

Compilerregister allocation

instruction selectionpeephole transformations

today

33

Interprocedural IR

bull Compile time generation of code for procedure invocations

bull Activation Records (aka Stack Frames)

34

Supporting Procedures

bull New computing environment ndash at least temporary memory for local variables

bull Passing information into the new environmentndash Parameters

bull Transfer of control tofrom procedurebull Handling return values

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 20: Compilation   0368-3133 (Semester A, 2013/14)

20

Improving cgen for expressionsbull Observation ndash naiumlve translation needlessly generates temporaries for

leaf expressionsbull Observation ndash temporaries used exactly once

ndash Once a temporary has been read it can be reused for another sub-expressionbull cgen(e1 op e2) =

Let _t1 = cgen(e1) Let _t2 = cgen(e2) Emit( _t =_t1 op _t2 ) Return t

bull Temporaries cgen(e1) can be reused in cgen(e2)

21

Sethi-Ullman translation

bull Algorithm by Ravi Sethi and Jeffrey D Ullman to emit optimal TACndash Minimizes number of temporaries

bull Main data structure in algorithm is a stack of temporariesndash Stack corresponds to recursive invocations of _t = cgen(e)ndash All the temporaries on the stack are live

bull Live = contain a value that is needed later on

22

Weighted register allocationbull Can save registers by re-ordering subtree computationsbull Label each node with its weight

ndash Weight = number of registers neededndash Leaf weight knownndash Internal node weight

bull w(left) gt w(right) then w = leftbull w(right) gt w(left) then w = rightbull w(right) = w(left) then w = left + 1

bull Choose heavier child as first to be translatedbull WARNING have to check that no side-effects exist before

attempting to apply this optimization(pre-pass on the tree)

23

Weighted reg alloc example

b

5 c

array access

+

a w=0

w=0 w=0

w=1 w=0

w=1

w=1

Phase 1 - check absence of side-effects in expression tree - assign weight to each AST node

_t0 = cgen( a+b[5c] )

base index

24

Weighted reg alloc example

b

5 c

array access

+

abase index

w=0

w=0 w=0

w=1 w=0

w=1

w=1

_t0 = cgen( a+b[5c] )Phase 2 - use weights to decide on order of translation

_t0

_t0

_t0Heavier sub-tree

Heavier sub-tree

_t0 = _t1 _t0

_t0 = _t1[_t0]

_t0 = _t1 + _t0

_t0_t1

_t1

_t1_t0 = c_t1 = 5

_t1 = b

_t1 = a

25

TAC Generation forMemory Access Instructions

bull Copy instruction a = bbull Loadstore instructions

a = b a = bbull Address of instruction a=ampbbull Array accesses

a = b[i] a[i] = bbull Field accesses

a = b[f] a[f] = bbull Memory allocation instruction a = alloc(size)

ndash Sometimes left out (eg malloc is a procedure in C)

26

Array operations

t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]

t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer

x = y[i]

x[i] = y

27

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

28

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

29

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

30

Control-flow example ndash loopsint xint y

while (x lt y) x = x 2

y = x

_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1

y = x

31

cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label

Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )

32

Optimization points

sourcecode

Frontend IR Code

generatortargetcode

Userprofile program

change algorithm

Compilerintraprocedural IRInterprocedural IRIR optimizations

Compilerregister allocation

instruction selectionpeephole transformations

today

33

Interprocedural IR

bull Compile time generation of code for procedure invocations

bull Activation Records (aka Stack Frames)

34

Supporting Procedures

bull New computing environment ndash at least temporary memory for local variables

bull Passing information into the new environmentndash Parameters

bull Transfer of control tofrom procedurebull Handling return values

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 21: Compilation   0368-3133 (Semester A, 2013/14)

21

Sethi-Ullman translation

bull Algorithm by Ravi Sethi and Jeffrey D Ullman to emit optimal TACndash Minimizes number of temporaries

bull Main data structure in algorithm is a stack of temporariesndash Stack corresponds to recursive invocations of _t = cgen(e)ndash All the temporaries on the stack are live

bull Live = contain a value that is needed later on

22

Weighted register allocationbull Can save registers by re-ordering subtree computationsbull Label each node with its weight

ndash Weight = number of registers neededndash Leaf weight knownndash Internal node weight

bull w(left) gt w(right) then w = leftbull w(right) gt w(left) then w = rightbull w(right) = w(left) then w = left + 1

bull Choose heavier child as first to be translatedbull WARNING have to check that no side-effects exist before

attempting to apply this optimization(pre-pass on the tree)

23

Weighted reg alloc example

b

5 c

array access

+

a w=0

w=0 w=0

w=1 w=0

w=1

w=1

Phase 1 - check absence of side-effects in expression tree - assign weight to each AST node

_t0 = cgen( a+b[5c] )

base index

24

Weighted reg alloc example

b

5 c

array access

+

abase index

w=0

w=0 w=0

w=1 w=0

w=1

w=1

_t0 = cgen( a+b[5c] )Phase 2 - use weights to decide on order of translation

_t0

_t0

_t0Heavier sub-tree

Heavier sub-tree

_t0 = _t1 _t0

_t0 = _t1[_t0]

_t0 = _t1 + _t0

_t0_t1

_t1

_t1_t0 = c_t1 = 5

_t1 = b

_t1 = a

25

TAC Generation forMemory Access Instructions

bull Copy instruction a = bbull Loadstore instructions

a = b a = bbull Address of instruction a=ampbbull Array accesses

a = b[i] a[i] = bbull Field accesses

a = b[f] a[f] = bbull Memory allocation instruction a = alloc(size)

ndash Sometimes left out (eg malloc is a procedure in C)

26

Array operations

t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]

t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer

x = y[i]

x[i] = y

27

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

28

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

29

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

30

Control-flow example ndash loopsint xint y

while (x lt y) x = x 2

y = x

_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1

y = x

31

cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label

Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )

32

Optimization points

sourcecode

Frontend IR Code

generatortargetcode

Userprofile program

change algorithm

Compilerintraprocedural IRInterprocedural IRIR optimizations

Compilerregister allocation

instruction selectionpeephole transformations

today

33

Interprocedural IR

bull Compile time generation of code for procedure invocations

bull Activation Records (aka Stack Frames)

34

Supporting Procedures

bull New computing environment ndash at least temporary memory for local variables

bull Passing information into the new environmentndash Parameters

bull Transfer of control tofrom procedurebull Handling return values

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 22: Compilation   0368-3133 (Semester A, 2013/14)

22

Weighted register allocationbull Can save registers by re-ordering subtree computationsbull Label each node with its weight

ndash Weight = number of registers neededndash Leaf weight knownndash Internal node weight

bull w(left) gt w(right) then w = leftbull w(right) gt w(left) then w = rightbull w(right) = w(left) then w = left + 1

bull Choose heavier child as first to be translatedbull WARNING have to check that no side-effects exist before

attempting to apply this optimization(pre-pass on the tree)

23

Weighted reg alloc example

b

5 c

array access

+

a w=0

w=0 w=0

w=1 w=0

w=1

w=1

Phase 1 - check absence of side-effects in expression tree - assign weight to each AST node

_t0 = cgen( a+b[5c] )

base index

24

Weighted reg alloc example

b

5 c

array access

+

abase index

w=0

w=0 w=0

w=1 w=0

w=1

w=1

_t0 = cgen( a+b[5c] )Phase 2 - use weights to decide on order of translation

_t0

_t0

_t0Heavier sub-tree

Heavier sub-tree

_t0 = _t1 _t0

_t0 = _t1[_t0]

_t0 = _t1 + _t0

_t0_t1

_t1

_t1_t0 = c_t1 = 5

_t1 = b

_t1 = a

25

TAC Generation forMemory Access Instructions

bull Copy instruction a = bbull Loadstore instructions

a = b a = bbull Address of instruction a=ampbbull Array accesses

a = b[i] a[i] = bbull Field accesses

a = b[f] a[f] = bbull Memory allocation instruction a = alloc(size)

ndash Sometimes left out (eg malloc is a procedure in C)

26

Array operations

t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]

t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer

x = y[i]

x[i] = y

27

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

28

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

29

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

30

Control-flow example ndash loopsint xint y

while (x lt y) x = x 2

y = x

_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1

y = x

31

cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label

Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )

32

Optimization points

sourcecode

Frontend IR Code

generatortargetcode

Userprofile program

change algorithm

Compilerintraprocedural IRInterprocedural IRIR optimizations

Compilerregister allocation

instruction selectionpeephole transformations

today

33

Interprocedural IR

bull Compile time generation of code for procedure invocations

bull Activation Records (aka Stack Frames)

34

Supporting Procedures

bull New computing environment ndash at least temporary memory for local variables

bull Passing information into the new environmentndash Parameters

bull Transfer of control tofrom procedurebull Handling return values

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 23: Compilation   0368-3133 (Semester A, 2013/14)

23

Weighted reg alloc example

b

5 c

array access

+

a w=0

w=0 w=0

w=1 w=0

w=1

w=1

Phase 1 - check absence of side-effects in expression tree - assign weight to each AST node

_t0 = cgen( a+b[5c] )

base index

24

Weighted reg alloc example

b

5 c

array access

+

abase index

w=0

w=0 w=0

w=1 w=0

w=1

w=1

_t0 = cgen( a+b[5c] )Phase 2 - use weights to decide on order of translation

_t0

_t0

_t0Heavier sub-tree

Heavier sub-tree

_t0 = _t1 _t0

_t0 = _t1[_t0]

_t0 = _t1 + _t0

_t0_t1

_t1

_t1_t0 = c_t1 = 5

_t1 = b

_t1 = a

25

TAC Generation forMemory Access Instructions

bull Copy instruction a = bbull Loadstore instructions

a = b a = bbull Address of instruction a=ampbbull Array accesses

a = b[i] a[i] = bbull Field accesses

a = b[f] a[f] = bbull Memory allocation instruction a = alloc(size)

ndash Sometimes left out (eg malloc is a procedure in C)

26

Array operations

t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]

t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer

x = y[i]

x[i] = y

27

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

28

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

29

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

30

Control-flow example ndash loopsint xint y

while (x lt y) x = x 2

y = x

_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1

y = x

31

cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label

Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )

32

Optimization points

sourcecode

Frontend IR Code

generatortargetcode

Userprofile program

change algorithm

Compilerintraprocedural IRInterprocedural IRIR optimizations

Compilerregister allocation

instruction selectionpeephole transformations

today

33

Interprocedural IR

bull Compile time generation of code for procedure invocations

bull Activation Records (aka Stack Frames)

34

Supporting Procedures

bull New computing environment ndash at least temporary memory for local variables

bull Passing information into the new environmentndash Parameters

bull Transfer of control tofrom procedurebull Handling return values

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 24: Compilation   0368-3133 (Semester A, 2013/14)

24

Weighted reg alloc example

b

5 c

array access

+

abase index

w=0

w=0 w=0

w=1 w=0

w=1

w=1

_t0 = cgen( a+b[5c] )Phase 2 - use weights to decide on order of translation

_t0

_t0

_t0Heavier sub-tree

Heavier sub-tree

_t0 = _t1 _t0

_t0 = _t1[_t0]

_t0 = _t1 + _t0

_t0_t1

_t1

_t1_t0 = c_t1 = 5

_t1 = b

_t1 = a

25

TAC Generation forMemory Access Instructions

bull Copy instruction a = bbull Loadstore instructions

a = b a = bbull Address of instruction a=ampbbull Array accesses

a = b[i] a[i] = bbull Field accesses

a = b[f] a[f] = bbull Memory allocation instruction a = alloc(size)

ndash Sometimes left out (eg malloc is a procedure in C)

26

Array operations

t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]

t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer

x = y[i]

x[i] = y

27

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

28

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

29

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

30

Control-flow example ndash loopsint xint y

while (x lt y) x = x 2

y = x

_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1

y = x

31

cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label

Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )

32

Optimization points

sourcecode

Frontend IR Code

generatortargetcode

Userprofile program

change algorithm

Compilerintraprocedural IRInterprocedural IRIR optimizations

Compilerregister allocation

instruction selectionpeephole transformations

today

33

Interprocedural IR

bull Compile time generation of code for procedure invocations

bull Activation Records (aka Stack Frames)

34

Supporting Procedures

bull New computing environment ndash at least temporary memory for local variables

bull Passing information into the new environmentndash Parameters

bull Transfer of control tofrom procedurebull Handling return values

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 25: Compilation   0368-3133 (Semester A, 2013/14)

25

TAC Generation forMemory Access Instructions

bull Copy instruction a = bbull Loadstore instructions

a = b a = bbull Address of instruction a=ampbbull Array accesses

a = b[i] a[i] = bbull Field accesses

a = b[f] a[f] = bbull Memory allocation instruction a = alloc(size)

ndash Sometimes left out (eg malloc is a procedure in C)

26

Array operations

t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]

t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer

x = y[i]

x[i] = y

27

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

28

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

29

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

30

Control-flow example ndash loopsint xint y

while (x lt y) x = x 2

y = x

_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1

y = x

31

cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label

Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )

32

Optimization points

sourcecode

Frontend IR Code

generatortargetcode

Userprofile program

change algorithm

Compilerintraprocedural IRInterprocedural IRIR optimizations

Compilerregister allocation

instruction selectionpeephole transformations

today

33

Interprocedural IR

bull Compile time generation of code for procedure invocations

bull Activation Records (aka Stack Frames)

34

Supporting Procedures

bull New computing environment ndash at least temporary memory for local variables

bull Passing information into the new environmentndash Parameters

bull Transfer of control tofrom procedurebull Handling return values

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 26: Compilation   0368-3133 (Semester A, 2013/14)

26

Array operations

t1 = ampy t1 = address-of yt2 = t1 + i t2 = address of y[i]x = t2 value stored at y[i]

t1 = ampx t1 = address-of xt2 = t1 + i t2 = address of x[i]t2 = y store through pointer

x = y[i]

x[i] = y

27

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

28

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

29

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

30

Control-flow example ndash loopsint xint y

while (x lt y) x = x 2

y = x

_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1

y = x

31

cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label

Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )

32

Optimization points

sourcecode

Frontend IR Code

generatortargetcode

Userprofile program

change algorithm

Compilerintraprocedural IRInterprocedural IRIR optimizations

Compilerregister allocation

instruction selectionpeephole transformations

today

33

Interprocedural IR

bull Compile time generation of code for procedure invocations

bull Activation Records (aka Stack Frames)

34

Supporting Procedures

bull New computing environment ndash at least temporary memory for local variables

bull Passing information into the new environmentndash Parameters

bull Transfer of control tofrom procedurebull Handling return values

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 27: Compilation   0368-3133 (Semester A, 2013/14)

27

TAC Generation forControl Flow Statements

bull Label introduction_label_name

Indicates a point in the code that can be jumped tobull Unconditional jump go to instruction following label L

Goto Lbull Conditional jump test condition variable t

if 0 jump to label LIfZ t Goto L

bull Similarly test condition variable tif 1 jump to label L

IfNZ t Goto L

28

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

29

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

30

Control-flow example ndash loopsint xint y

while (x lt y) x = x 2

y = x

_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1

y = x

31

cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label

Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )

32

Optimization points

sourcecode

Frontend IR Code

generatortargetcode

Userprofile program

change algorithm

Compilerintraprocedural IRInterprocedural IRIR optimizations

Compilerregister allocation

instruction selectionpeephole transformations

today

33

Interprocedural IR

bull Compile time generation of code for procedure invocations

bull Activation Records (aka Stack Frames)

34

Supporting Procedures

bull New computing environment ndash at least temporary memory for local variables

bull Passing information into the new environmentndash Parameters

bull Transfer of control tofrom procedurebull Handling return values

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 28: Compilation   0368-3133 (Semester A, 2013/14)

28

Control-flow example ndash conditionsint xint yint z

if (x lt y) z = xelse z = yz = z z

_t0 = x lt yIfZ _t0 Goto _L0z = xGoto _L1_L0z = y_L1z = z z

29

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

30

Control-flow example ndash loopsint xint y

while (x lt y) x = x 2

y = x

_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1

y = x

31

cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label

Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )

32

Optimization points

sourcecode

Frontend IR Code

generatortargetcode

Userprofile program

change algorithm

Compilerintraprocedural IRInterprocedural IRIR optimizations

Compilerregister allocation

instruction selectionpeephole transformations

today

33

Interprocedural IR

bull Compile time generation of code for procedure invocations

bull Activation Records (aka Stack Frames)

34

Supporting Procedures

bull New computing environment ndash at least temporary memory for local variables

bull Passing information into the new environmentndash Parameters

bull Transfer of control tofrom procedurebull Handling return values

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 29: Compilation   0368-3133 (Semester A, 2013/14)

29

cgen for if-then-elsecgen(if (e) s1 else s2) Let _t = cgen(e)

Let Lfalse be a new labelLet Lafter be a new labelEmit( IfZ _t Goto Lfalse )cgen(s1)Emit( Goto Lafter )Emit( Lfalse )cgen(s2)Emit( Goto Lafter)Emit( Lafter )

30

Control-flow example ndash loopsint xint y

while (x lt y) x = x 2

y = x

_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1

y = x

31

cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label

Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )

32

Optimization points

sourcecode

Frontend IR Code

generatortargetcode

Userprofile program

change algorithm

Compilerintraprocedural IRInterprocedural IRIR optimizations

Compilerregister allocation

instruction selectionpeephole transformations

today

33

Interprocedural IR

bull Compile time generation of code for procedure invocations

bull Activation Records (aka Stack Frames)

34

Supporting Procedures

bull New computing environment ndash at least temporary memory for local variables

bull Passing information into the new environmentndash Parameters

bull Transfer of control tofrom procedurebull Handling return values

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 30: Compilation   0368-3133 (Semester A, 2013/14)

30

Control-flow example ndash loopsint xint y

while (x lt y) x = x 2

y = x

_L0_t0 = x lt yIfZ _t0 Goto _L1x = x 2Goto _L0_L1

y = x

31

cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label

Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )

32

Optimization points

sourcecode

Frontend IR Code

generatortargetcode

Userprofile program

change algorithm

Compilerintraprocedural IRInterprocedural IRIR optimizations

Compilerregister allocation

instruction selectionpeephole transformations

today

33

Interprocedural IR

bull Compile time generation of code for procedure invocations

bull Activation Records (aka Stack Frames)

34

Supporting Procedures

bull New computing environment ndash at least temporary memory for local variables

bull Passing information into the new environmentndash Parameters

bull Transfer of control tofrom procedurebull Handling return values

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 31: Compilation   0368-3133 (Semester A, 2013/14)

31

cgen for while loopscgen(while (expr) stmt) Let Lbefore be a new label

Let Lafter be a new labelEmit( Lbefore )Let t = cgen(expr)Emit( IfZ t Goto Lafter )cgen(stmt)Emit( Goto Lbefore )Emit( Lafter )

32

Optimization points

sourcecode

Frontend IR Code

generatortargetcode

Userprofile program

change algorithm

Compilerintraprocedural IRInterprocedural IRIR optimizations

Compilerregister allocation

instruction selectionpeephole transformations

today

33

Interprocedural IR

bull Compile time generation of code for procedure invocations

bull Activation Records (aka Stack Frames)

34

Supporting Procedures

bull New computing environment ndash at least temporary memory for local variables

bull Passing information into the new environmentndash Parameters

bull Transfer of control tofrom procedurebull Handling return values

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 32: Compilation   0368-3133 (Semester A, 2013/14)

32

Optimization points

sourcecode

Frontend IR Code

generatortargetcode

Userprofile program

change algorithm

Compilerintraprocedural IRInterprocedural IRIR optimizations

Compilerregister allocation

instruction selectionpeephole transformations

today

33

Interprocedural IR

bull Compile time generation of code for procedure invocations

bull Activation Records (aka Stack Frames)

34

Supporting Procedures

bull New computing environment ndash at least temporary memory for local variables

bull Passing information into the new environmentndash Parameters

bull Transfer of control tofrom procedurebull Handling return values

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 33: Compilation   0368-3133 (Semester A, 2013/14)

33

Interprocedural IR

bull Compile time generation of code for procedure invocations

bull Activation Records (aka Stack Frames)

34

Supporting Procedures

bull New computing environment ndash at least temporary memory for local variables

bull Passing information into the new environmentndash Parameters

bull Transfer of control tofrom procedurebull Handling return values

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 34: Compilation   0368-3133 (Semester A, 2013/14)

34

Supporting Procedures

bull New computing environment ndash at least temporary memory for local variables

bull Passing information into the new environmentndash Parameters

bull Transfer of control tofrom procedurebull Handling return values

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 35: Compilation   0368-3133 (Semester A, 2013/14)

35

Abstract Register Machine(High Level View)

Code

Data

Highaddresses

Lowaddresses

CPU

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 36: Compilation   0368-3133 (Semester A, 2013/14)

36

Heap

Abstract Register Machine(High Level View)

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 37: Compilation   0368-3133 (Semester A, 2013/14)

37

Abstract Activation Record Stack

Stack frame for procedure

Prock+1(a1hellipaN)

Prock

Prock+2

hellip

hellip

Prock+1

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 38: Compilation   0368-3133 (Semester A, 2013/14)

38

Abstract Stack Frame

Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Prock

Prock+2

hellip

hellip

Stack frame for procedure

Prock+1(a1hellipaN)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 39: Compilation   0368-3133 (Semester A, 2013/14)

39

Handling Proceduresbull Store local variablestemporaries in a stackbull A function call instruction pushes arguments to

stack and jumps to the function labelA statement x=f(a1hellipan) looks like

Push a1 hellip Push anCall fPop x copy returned value

bull Returning a value is done by pushing it to the stack (return x)

Push xbull Return control to caller (and roll up stack)

Return

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 40: Compilation   0368-3133 (Semester A, 2013/14)

40

Heap

Abstract Register Machine

Global Variables

Stack

Lowaddresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

Code

Highaddresses

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 41: Compilation   0368-3133 (Semester A, 2013/14)

41

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 42: Compilation   0368-3133 (Semester A, 2013/14)

42

Intro Functions Exampleint SimpleFn(int z)

int x yx = x y zreturn x

void main() int ww = SimpleFunction(137)

_SimpleFn_t0 = x y_t1 = _t0 zx = _t1Push xReturn

main_t0 = 137Push _t0Call _SimpleFnPop w

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 43: Compilation   0368-3133 (Semester A, 2013/14)

43

What Can We Do with Procedures

bull Declarations amp Definitionsbull Call amp Returnbull Jumping out of proceduresbull Passing amp Returning procedures as

parameters

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 44: Compilation   0368-3133 (Semester A, 2013/14)

44

Design Decisions

bull Scoping rulesndash Static scoping vs dynamic scoping

bull Callercallee conventionsndash Parametersndash Who saves register values

bull Allocating space for local variables

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 45: Compilation   0368-3133 (Semester A, 2013/14)

45

Static (lexical) Scopingmain ( )

int a = 0 int b = 0

int b = 1

int a = 2 printf (ldquod dnrdquo a b)

int b = 3 printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

printf (ldquod dnrdquo a b)

B0B1

B3B3

B2

Declaration Scopes

a=0 B0B1B3

b=0 B0

b=1 B1B2

a=2 B2

b=3 B3

a name refers to its (closest)

enclosing scope

known at compile time

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 46: Compilation   0368-3133 (Semester A, 2013/14)

46

Dynamic Scoping

bull Each identifier is associated with a global stack of bindings

bull When entering scope where identifier is declaredndash push declaration on identifier stack

bull When exiting scope where identifier is declaredndash pop identifier stack

bull Evaluating the identifier in any context binds to the current top of stack

bull Determined at runtime

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 47: Compilation   0368-3133 (Semester A, 2013/14)

47

Example

bull What value is returned from mainbull Static scopingbull Dynamic scoping

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 48: Compilation   0368-3133 (Semester A, 2013/14)

48

Why do we care

bull We need to generate code to access variables

bull Static scopingndash Identifier binding is known at compile timendash Address of the variable is known at compile timendash Assigning addresses to variables is part of code

generationndash No runtime errors of ldquoaccess to undefined variablerdquondash Can check types of variables

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 49: Compilation   0368-3133 (Semester A, 2013/14)

49

Variable addresses for static scoping first attempt

int x = 42

int f() return x int g() int x = 1 return f() int main() return g()

identifier address

x (global) 0x42

x (inside g) 0x73

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 50: Compilation   0368-3133 (Semester A, 2013/14)

50

Variable addresses for static scoping first attempt

int a [11]

void quicksort(int m int n) int i if (n gt m) i = partition(m n) quicksort (m i-1) quicksort (i+1 n)

main() quicksort (1 9)

what is the address of the variable ldquoirdquo in

the procedure quicksort

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 51: Compilation   0368-3133 (Semester A, 2013/14)

51

Compile-Time Information on Variablesbull Namebull Typebull Scope

ndash when is it recognizedbull Duration

ndash Until when does its value existbull Size

ndash How many bytes are required at runtime bull Address

ndash Fixedndash Relativendash Dynamic

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 52: Compilation   0368-3133 (Semester A, 2013/14)

52

Activation Record (Stack Frames)bull separate space for each procedure invocation

bull managed at runtimendash code for managing it generated by the compiler

bull desired properties ndash efficient allocation and deallocation

bull procedures are called frequentlyndash variable size

bull different procedures may require different memory sizes

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 53: Compilation   0368-3133 (Semester A, 2013/14)

53

Semi-Abstract Register Machine

Global Variables

Stack

Heap

High addresses

Low addresses

CPU Main Memory

Register 00

Register 01

Register xx

hellip

Register PC

Cont

rol

regi

ster

s (d

ata)

regi

ster

sGe

nera

l pur

pose

hellip

ebp

esphellipre

gist

ers

Stac

k

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 54: Compilation   0368-3133 (Semester A, 2013/14)

54

A Logical Stack Frame (Simplified)Param NParam N-1

hellipParam 1_t0

hellip_tkx

hellipy

Parameters(actual

arguments)

Locals and temporaries

Stack frame for function f(a1hellipaN)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 55: Compilation   0368-3133 (Semester A, 2013/14)

55

Runtime Stack

bull Stack of activation recordsbull Call = push new activation recordbull Return = pop activation recordbull Only one ldquoactiverdquo activation record ndash top of

stackbull How do we handle recursion

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 56: Compilation   0368-3133 (Semester A, 2013/14)

56

Activation Record (frame)parameter k

parameter 1

lexical pointer

return information

dynamic link

registers amp misc

local variablestemporaries

next frame would be here

hellip

administrativepart

high addresses

lowaddresses

framepointer

stackpointer

incoming parameters

stack grows down

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 57: Compilation   0368-3133 (Semester A, 2013/14)

57

Runtime Stackbull SP ndash stack pointer

ndash top of current framebull FP ndash frame pointer

ndash base of current framendash Sometimes called BP

(base pointer)

Current frame

hellip hellip

Previous frame

SP

FPstack grows down

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 58: Compilation   0368-3133 (Semester A, 2013/14)

58

Code Blocks

bull Programming language provide code blocks void foo() int x = 8 y=91 int x = y y 2 int x = y 7 3 x = y + 1

adminstrativex1

y1

x2

x3

hellip

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 59: Compilation   0368-3133 (Semester A, 2013/14)

59

L-Values of Local Variables

bull The offset in the stack is known at compile time

bull L-val(x) = FP+offset(x)bull x = 5 Load_Constant 5 R3

Store R3 offset(x)(FP)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 60: Compilation   0368-3133 (Semester A, 2013/14)

60

Pentium Runtime Stack

Pentium stack registers

Pentium stack and callret instructions

Register Usage

ESP Stack pointer

EBP Base pointer

Instruction Usage

push pushahellip push on runtime stack

poppopahellip Base pointer

call transfer control to called routine

return transfer control back to caller

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 61: Compilation   0368-3133 (Semester A, 2013/14)

61

Accessing Stack Variables

bull Use offset from FP (ebp)bull Remember ndash

stack grows downwardsbull Above FP = parametersbull Below FP = localsbull Examples

ndash ebp + 4 = return addressndash ebp + 8 = first parameterndash ebp ndash 4 = first local

hellip hellip

SP

FPReturn address

Return address

Param nhellip

param1

Local 1hellip

Local n

Previous fp

Param nhellip

param1FP+8

FP-4

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 62: Compilation   0368-3133 (Semester A, 2013/14)

62

Factorial ndash fact(int n)factpushl ebp save ebpmovl espebp ebp=esppushl ebx save ebxmovl 8(ebp)ebx ebx = ncmpl $1ebx n = 1 jle lresult then doneleal -1(ebx)eax eax = n-1pushl eax call fact fact(n-1)imull ebxeax eax=retvnjmp lreturn lresultmovl $1eax retvlreturnmovl -4(ebp)ebx restore ebxmovl ebpesp restore esppopl ebp restore ebp

ESP

EBPReturn address

Return address

old ebx

Previous fp

nEBP+8

EBP-4 old ebp

old ebx

(stack in intermediate point)

(disclaimer real compiler can do better than that)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 63: Compilation   0368-3133 (Semester A, 2013/14)

63

Call Sequences

bull The processor does not save the content of registers on procedure calls

bull So who will ndash Caller saves and restores registersndash Callee saves and restores registersndash But can also have both saverestore some

registers

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 64: Compilation   0368-3133 (Semester A, 2013/14)

64

Call Sequences

call

caller

callee

return

caller

Caller push code

Callee push code

(prologue)

Callee pop code

(epilogue)

Caller pop code

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop parametersPop caller-save registers

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 65: Compilation   0368-3133 (Semester A, 2013/14)

65

Call Sequences ndash Foo(4221)

call

caller

callee

return

caller

push ecx

push $21

push $42

call _foo

push ebp

mov esp ebp

sub 8 esp

push ebx

pop ebx

mov ebp esp

pop ebp

ret

add $8 esp

pop ecx

Push caller-save registersPush actual parameters (in reverse order)

push return addressJump to call address

Push current base-pointerbp = sp

Push local variablesPush callee-save registers

Pop callee-save registersPop callee activation record

Pop old base-pointer

pop return addressJump to address

Pop caller-save registersPop parameters

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 66: Compilation   0368-3133 (Semester A, 2013/14)

66

ldquoTo Callee-save or to Caller-saverdquo

bull Callee-saved registers need only be saved when callee modifies their value

bull some heuristics and conventions are followed

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 67: Compilation   0368-3133 (Semester A, 2013/14)

67

Caller-Save and Callee-Save Registersbull Callee-Save Registers

ndash Saved by the callee before modificationndash Values are automatically preserved across calls

bull Caller-Save Registers ndash Saved (if needed) by the caller before callsndash Values are not automatically preserved across calls

bull Usually the architecture defines caller-save and callee-save registers

bull Separate compilationbull Interoperability between code produced by different

compilerslanguages bull But compiler writers decide when to use calllercallee

registers

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 68: Compilation   0368-3133 (Semester A, 2013/14)

68

Callee-Save Registersbull Saved by the callee before modificationbull Usually at procedure prologbull Restored at procedure epilogbull Hardware support may be available bull Values are automatically preserved across calls

global _foo

Add_Constant -K SP allocate space for foo

Store_Local R5 -14(FP) save R5

Load_Reg R5 R0 Add_Constant R5 1

JSR f1 JSR g1

Add_Constant R5 2 Load_Reg R5 R0

Load_Local -14(FP) R5 restore R5

Add_Constant K SP RTS deallocate

int foo(int a) int b=a+1 f1()g1(b)

return(b+2)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 69: Compilation   0368-3133 (Semester A, 2013/14)

69

Caller-Save Registersbull Saved by the caller before calls when

neededbull Values are not automatically preserved

across calls

global _bar

Add_Constant -K SP allocate space for bar

Add_Constant R0 1

JSR f2

Load_Constant 2 R0 JSR g2

Load_Constant 8 R0 JSR g2

Add_Constant K SP deallocate space for bar

RTS

void bar (int y) int x=y+1f2(x)

g2(2) g2(8)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 70: Compilation   0368-3133 (Semester A, 2013/14)

70

Parameter Passingbull 1960s

ndash In memory bull No recursion is allowed

bull 1970sndash In stack

bull 1980sndash In registersndash First k parameters are passed in registers (k=4 or k=6)ndash Where is time saved

bull Most procedures are leaf proceduresbull Interprocedural register allocationbull Many of the registers may be dead before another invocationbull Register windows are allocated in some architectures per call (eg sun Sparc)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 71: Compilation   0368-3133 (Semester A, 2013/14)

71

Windows Exploit(s) Buffer Overflow

void foo (char x) char buf[2] strcpy(buf x) int main (int argc char argv[]) foo(argv[1])

aout abracadabraSegmentation fault

Stack grows this way

Memory addresses

Previous frameReturn

addressSaved FPchar xbuf[2]

hellip

abracadabr

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 72: Compilation   0368-3133 (Semester A, 2013/14)

72

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 73: Compilation   0368-3133 (Semester A, 2013/14)

73

Buffer overflow int check_authentication(char password) int auth_flag = 0 char password_buffer[16]

strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 74: Compilation   0368-3133 (Semester A, 2013/14)

74

Buffer overflow int check_authentication(char password) char password_buffer[16] int auth_flag = 0 strcpy(password_buffer password) if(strcmp(password_buffer brillig) == 0)

auth_flag = 1 if(strcmp(password_buffer outgrabe) == 0)

auth_flag = 1 return auth_flagint main(int argc char argv[]) if(argc lt 2) printf(Usage s ltpasswordgtn argv[0]) exit(0) if(check_authentication(argv[1])) printf(n-=-=-=-=-=-=-=-=-=-=-=-=-=-n) printf( Access Grantedn) printf(-=-=-=-=-=-=-=-=-=-=-=-=-=-n) else printf(nAccess Deniedn)

(source ldquohacking ndash the art of exploitation 2nd Edrdquo)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 75: Compilation   0368-3133 (Semester A, 2013/14)

75

Buffer overflow

0x08048529 lt+69gt movl $0x8048647(esp) 0x08048530 lt+76gt call 0x8048394 ltputspltgt 0x08048535 lt+81gt movl $0x8048664(esp) 0x0804853c lt+88gt call 0x8048394 ltputspltgt 0x08048541 lt+93gt movl $0x804867a(esp) 0x08048548 lt+100gt call 0x8048394 ltputspltgt 0x0804854d lt+105gt jmp 0x804855b ltmain+119gt 0x0804854f lt+107gt movl $0x8048696(esp) 0x08048556 lt+114gt call 0x8048394 ltputspltgt

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 76: Compilation   0368-3133 (Semester A, 2013/14)

76

Nested Procedures

bull For example ndash Pascalbull Any routine can have sub-routinesbull Any sub-routine can access anything that is

defined in its containing scope or inside the sub-routine itselfndash ldquonon-localrdquo variables

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 77: Compilation   0368-3133 (Semester A, 2013/14)

77

program p() int x procedure a() int y procedure b() hellip c() hellip

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

Example Nested Procedures

Possible call sequencep a a c b c d

what are the addresses of variables ldquoxrdquo ldquoyrdquo and

ldquozrdquo in procedure d

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 78: Compilation   0368-3133 (Semester A, 2013/14)

78

Nested Proceduresbull can call a sibling ancestorbull when ldquocrdquo uses (non-local)

variables from ldquoardquo which instance of ldquoardquo is it

bull how do you find the right activation record at runtime a

b

P

c c

d

a

Possible call sequencep a a c b c d

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 79: Compilation   0368-3133 (Semester A, 2013/14)

79

Nested Proceduresbull goal find the closest routine in

the stack from a given nesting level

bull if we reached the same routine in a sequence of callsndash routine of level k uses variables of

the same nesting level it uses its own variables

ndash if it uses variables of nesting level j lt k then it must be the last routine called at level j

bull If a procedure is last at level j on the stack then it must be ancestor of the current routine

Possible call sequencep a a c b c d

a

b

P

c c

d

a

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 80: Compilation   0368-3133 (Semester A, 2013/14)

80

Nested Proceduresbull problem a routine may need to access variables of

another routine that contains it staticallybull solution lexical pointer (aka access link) in the

activation recordbull lexical pointer points to the last activation record of the

nesting level above itndash in our example lexical pointer of d points to activation records

of cbull lexical pointers created at runtimebull number of links to be traversed is known at compile time

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 81: Compilation   0368-3133 (Semester A, 2013/14)

81

Lexical Pointers

a

a

c

b

c

d

y

y

z

z

Possible call sequencep a a c b c d

a

b

P

c c

d

a

program p() int x procedure a() int y procedure b() c()

procedure c() int z procedure d()

y = x + z

hellip b() hellip d() hellip hellip a() hellip c() hellip a()

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 82: Compilation   0368-3133 (Semester A, 2013/14)

82

What is a Compiler

>

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 83: Compilation   0368-3133 (Semester A, 2013/14)

83

Procedure Q Calls R

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 84: Compilation   0368-3133 (Semester A, 2013/14)

84

Activation Records Remarks

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 85: Compilation   0368-3133 (Semester A, 2013/14)

85

Non-Local goto in C syntax

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 86: Compilation   0368-3133 (Semester A, 2013/14)

86

Non-local gotos in C

bull setjmp remembers the current location and the stack frame

bull longjmp jumps to the current location (popping many activation records)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 87: Compilation   0368-3133 (Semester A, 2013/14)

87

Non-Local Transfer of Control in C

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 88: Compilation   0368-3133 (Semester A, 2013/14)

88

Stack Framesbull Allocate a separate space for every procedure incarnationbull Relative addressesbull Provide a simple mean to achieve modularitybull Supports separate code generation of proceduresbull Naturally supports recursionbull Efficient memory allocation policy

ndash Low overheadndash Hardware support may be available

bull LIFO policybull Not a pure stack

ndash Non local referencesndash Updated using arithmetic

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 89: Compilation   0368-3133 (Semester A, 2013/14)

89

The Frame Pointerbull The caller

ndash the calling routinebull The callee

ndash the called routinebull caller responsibilities

ndash Calculate arguments and save in the stackndash Store lexical pointer

bull call instruction M[--SP] = RA PC = callee

bull callee responsibilitiesndash FP = SPndash SP = SP - frame-size

bull Why use both SP and FP

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 90: Compilation   0368-3133 (Semester A, 2013/14)

90

Variable Length Frame Sizebull C allows allocating objects of unbounded

size in the stack void p() int i char p scanf(ldquodrdquo ampi) p = (char ) alloca(isizeof(int))

bull Some versions of Pascal allows conformant array value parameters

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 91: Compilation   0368-3133 (Semester A, 2013/14)

91

Limitations

bull The compiler may be forced to store a value on a stack instead of registers

bull The stack may not suffice to handle some language features

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 92: Compilation   0368-3133 (Semester A, 2013/14)

92

Frame-Resident Variablesbull A variable x cannot be stored in register when

ndash x is passed by referencendash Address of x is taken (ampx)ndash is addressed via pointer arithmetic on the stack-frame

(C varags)ndash x is accessed from a nested procedurendash The value is too big to fit into a single registerndash The variable is an arrayndash The register of x is needed for other purposesndash Too many local variables

bull An escape variablendash Passed by referencendash Address is takenndash Addressed via pointer arithmetic on the stack-framendash Accessed from a nested procedure

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 93: Compilation   0368-3133 (Semester A, 2013/14)

93

The Frames in Different Architectures

Pentium MIPS Sparc

InFrame(8) InFrame(0) InFrame(68)

InFrame(12) InReg(X157) InReg(X157)

InFrame(16) InReg(X158) InReg(X158)

M[sp+0]fpfp spsp sp-K

sp sp-KM[sp+K+0] r2

X157 r4X158 r5

save sp -K spM[fp+68]i0

X157i1

X158i2

g(x y z) where x escapes

x

y

z

View

Change

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 94: Compilation   0368-3133 (Semester A, 2013/14)

94

Limitations of Stack Framesbull A local variable of P cannot be stored in the activation record of P if its

duration exceeds the duration of Pbull Example 1 Static variables in C

(own variables in Algol)void p(int x) static int y = 6 y += x

bull Example 2 Features of the C languageint f() int x return ampx

bull Example 3 Dynamic allocationint f() return (int ) malloc(sizeof(int))

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 95: Compilation   0368-3133 (Semester A, 2013/14)

95

Compiler Implementation

bull Hide machine dependent partsbull Hide language dependent partbull Use special modules

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 96: Compilation   0368-3133 (Semester A, 2013/14)

96

Basic Compiler Phases

Source program (string)

EXE

lexical analysis

syntax analysis

semantic analysis

Code generation

AssemblerLinker

Tokens

Abstract syntax tree

Assembly

Frame managerControl Flow Graph

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 97: Compilation   0368-3133 (Semester A, 2013/14)

97

Hidden in the frame ADT

bull Word sizebull The location of the formalsbull Frame resident variablesbull Machine instructions to implement ldquoshift-

of-viewrdquo (prologueepilogue)bull The number of locals ldquoallocatedrdquo so farbull The label in which the machine code starts

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 98: Compilation   0368-3133 (Semester A, 2013/14)

98

Invocations to Frame

bull ldquoAllocaterdquo a new frame bull ldquoAllocaterdquo new local variablebull Return the L-value of local variablebull Generate code for procedure invocationbull Generate prologueepiloguebull Generate code for procedure return

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 99: Compilation   0368-3133 (Semester A, 2013/14)

99

Activation Records Summary

bull compile time memory management for procedure data

bull works well for data with well-scoped lifetimendash deallocation when procedure returns

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End
Page 100: Compilation   0368-3133 (Semester A, 2013/14)

100

The End

  • Compilation 0368-3133 (Semester A 201314)
  • What is a Compiler
  • Conceptual Structure of a Compiler
  • From scanning to parsing
  • Context Analysis
  • Code Generation
  • Code Generation IR
  • Slide 8
  • Three-Address Code IR
  • Abstract Register Machine
  • Abstract Activation Record Stack
  • Abstract Stack Frame
  • TAC generation for expressions
  • cgen example
  • Naiumlve cgen for expressions
  • TAC Generation for Control Flow Statements
  • Control-flow example ndash conditions
  • cgen for if-then-else
  • Naive cgen for expressions
  • Improving cgen for expressions
  • Sethi-Ullman translation
  • Weighted register allocation
  • Weighted reg alloc example
  • Weighted reg alloc example (2)
  • TAC Generation for Memory Access Instructions
  • Array operations
  • TAC Generation for Control Flow Statements (2)
  • Control-flow example ndash conditions (2)
  • cgen for if-then-else (2)
  • Control-flow example ndash loops
  • cgen for while loops
  • Optimization points
  • Interprocedural IR
  • Supporting Procedures
  • Abstract Register Machine (High Level View)
  • Abstract Register Machine (High Level View) (2)
  • Abstract Activation Record Stack (2)
  • Abstract Stack Frame (2)
  • Handling Procedures
  • Abstract Register Machine (2)
  • Semi-Abstract Register Machine
  • Intro Functions Example
  • What Can We Do with Procedures
  • Design Decisions
  • Static (lexical) Scoping
  • Dynamic Scoping
  • Example
  • Why do we care
  • Variable addresses for static scoping first attempt
  • Variable addresses for static scoping first attempt (2)
  • Compile-Time Information on Variables
  • Activation Record (Stack Frames)
  • Semi-Abstract Register Machine (2)
  • A Logical Stack Frame (Simplified)
  • Runtime Stack
  • Activation Record (frame)
  • Runtime Stack (2)
  • Code Blocks
  • L-Values of Local Variables
  • Pentium Runtime Stack
  • Accessing Stack Variables
  • Factorial ndash fact(int n)
  • Call Sequences
  • Call Sequences (2)
  • Call Sequences ndash Foo(4221)
  • ldquoTo Callee-save or to Caller-saverdquo
  • Caller-Save and Callee-Save Registers
  • Callee-Save Registers
  • Caller-Save Registers
  • Parameter Passing
  • Windows Exploit(s) Buffer Overflow
  • Buffer overflow
  • Buffer overflow (2)
  • Buffer overflow (3)
  • Buffer overflow (4)
  • Nested Procedures
  • Example Nested Procedures
  • Nested Procedures (2)
  • Nested Procedures (3)
  • Nested Procedures (4)
  • Lexical Pointers
  • What is a Compiler (2)
  • Procedure Q Calls R
  • Activation Records Remarks
  • Non-Local goto in C syntax
  • Non-local gotos in C
  • Non-Local Transfer of Control in C
  • Stack Frames
  • The Frame Pointer
  • Variable Length Frame Size
  • Limitations
  • Frame-Resident Variables
  • The Frames in Different Architectures
  • Limitations of Stack Frames
  • Compiler Implementation
  • Basic Compiler Phases
  • Hidden in the frame ADT
  • Invocations to Frame
  • Activation Records Summary
  • The End