PSU CS322 HM 1 Languages and Compiler Design II Runtime System Material provided by Prof. Jingke Li Stolen with pride and modified by Herb Mayer PSU Spring 2010 rev.: 4/27/2010


Page 1: Languages and Compiler Design II Runtime Systemweb.cecs.pdx.edu/~herb/cs322s10/cs322_11_Runtime_System.pdf · Languages and Compiler Design II Runtime System ... location for the

PSU CS322 HM 1

Languages and Compiler Design II
Runtime System

Material provided by Prof. Jingke Li
Stolen with pride and modified by Herb Mayer

PSU Spring 2010
rev.: 4/27/2010


Agenda

• Runtime Storage Organization
• Static Storage
• Runtime Stack
• System Heap
• Functions and Activations
• Activation Records
• Function Call
• Register Saving
• Scopes
• Function Parameters


Runtime Storage Organization

Multiple memory uses on a computer:
• OS memory needs; e.g. ½ for Windows
• Program code
• User program data
• Function invocations
• Temporaries
• I/O buffers
• Etc.

Different requirements, caused by differences in lifetime, size, and access rights. Result: static space, stack, and heap

[Figure: memory layout with the regions Stack, Heap, Static Data, Code, Reserved Space]


Static Storage

Space for static data objects is allocated in a fixed location for the whole lifetime of a program

• Possible when the sizes of the objects are known at compile time
• Static objects can be bound to absolute addresses; not necessarily desirable
• Static allocation requires no runtime management, hence simple to handle
• Space is wasted if objects are not needed for the complete program lifetime
• Mostly used for global variables, code, and constants
• Early Fortran and Cobol were designed to use only static storage
• Such early languages do not support recursive functions, nor do they allow dynamic arrays
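The difference between a statically allocated object and a per-activation object can be sketched in C. This is only an illustration; the names next_id and fresh_local are hypothetical and do not come from the slides.

```c
/* Hypothetical helper names, chosen for this sketch only. */

/* A static local lives in static storage: one fixed location for the
   whole program run, so its value survives from call to call. */
int next_id(void) {
    static int last_id = 0;
    last_id += 1;
    return last_id;
}

/* An automatic local is created anew for every activation, so nothing
   carries over between calls. */
int fresh_local(void) {
    int counter = 0;
    counter += 1;
    return counter;
}
```

Calling next_id() twice yields 1 and then 2, while fresh_local() yields 1 on every call.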


Runtime Stack

A stack is needed for data that are pushed and popped dynamically, following a last-in, first-out pattern

• Space is needed at the moment of function call, and freed at the moment of return
• Allocation and de-allocation can be implemented cheaply, by adjusting the stack pointer; though “old” data remain in memory
• More efficient use of space than static allocation
• Most newer imperative languages use stack storage for data associated with activations; became popular with Algol 60
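A minimal sketch of why stack allocation is cheap: both operations only adjust a stack pointer. The array stack_mem, the index sp, and the upward growth direction are assumptions of this sketch, not a description of any particular machine.

```c
#include <stddef.h>

#define STACK_BYTES 256

unsigned char stack_mem[STACK_BYTES]; /* simulated runtime stack (assumed size) */
size_t sp = 0;                        /* simulated stack pointer: first free byte */

/* Allocate n bytes by bumping sp; returns NULL if the stack would overflow. */
unsigned char *stack_push(size_t n) {
    if (n > STACK_BYTES - sp) return NULL;
    unsigned char *p = &stack_mem[sp];
    sp += n;
    return p;
}

/* Free the most recent n bytes by moving sp back. The memory itself is
   not cleared, so the "old" data remain until they are overwritten. */
void stack_pop(size_t n) {
    sp = (n > sp) ? 0 : sp - n;
}
```

After a push, a write, and a matching pop, sp is back at its old value while the written byte is still present in stack_mem.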


System Heap

Space for heap data objects can be allocated and freed at any time during program execution. It is the most flexible, most expensive, and most dangerous method of storage allocation (e.g. memory leaks). Typical heap operations are:

• Allocation: Acquire free storage for the program. Typically triggered by explicit or implicit user commands, e.g.
    (C)    struct node *root = (struct node *) malloc( sizeof( struct node ) );
    (Java) TreeNode root = new TreeNode( val );
• De-allocation: Reclaims (AKA frees) no-longer-needed storage for reuse
• Languages such as C and Pascal contain commands for storage reclaiming
    – e.g. free( root )
• Compaction: Construct larger blocks of free storage from smaller pieces
    – Can be triggered by a failed allocation request; often performed as part of garbage collection
• Lisp, ML and some interpreted languages need the heap for activation records
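The allocation and de-allocation steps can be sketched as a complete C fragment. The struct layout and the helper names node_push and list_free are assumptions for illustration; only malloc and free are taken from the slide.

```c
#include <stdlib.h>

/* Assumed node layout; the slide only names "struct node". */
struct node {
    int val;
    struct node *next;
};

/* Allocation: acquire heap storage for one node and link it in front
   of head. Returns NULL if malloc fails. */
struct node *node_push(struct node *head, int val) {
    struct node *n = malloc(sizeof *n);
    if (n == NULL) return NULL;
    n->val = val;
    n->next = head;
    return n;
}

/* De-allocation: hand every node back to the heap.
   Returns the number of nodes freed. */
int list_free(struct node *head) {
    int freed = 0;
    while (head != NULL) {
        struct node *next = head->next; /* read before free(): head is invalid afterwards */
        free(head);
        head = next;
        freed++;
    }
    return freed;
}
```

Forgetting the call to list_free is exactly the kind of memory leak the slide warns about.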


Functions and Activations
• Functions, procedures (methods), and classes constitute a form of programming abstraction
    – the focus here is on functions, not classes
• Allow a program to be divided into named components with hidden internals
    – returning a result to the place of invocation
• Permit code re-use
• Each function invocation at runtime is called an activation
• Each activation has its own data: formals (bound to actuals) and locals
• Storage for these data is called an Activation Record (AR), AKA Stack Frame
• Many activations of the same function can exist at one moment in time, due to recursive calls
• Data associated with one activation are independent from all others
• Normally, an activation record is created when a function is invoked and is destroyed when the function returns
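A small C sketch can make the independence of activations visible: every recursive activation records the address of its own local. The names descend, local_addr, and MAX_DEPTH are hypothetical.

```c
#include <stdint.h>

#define MAX_DEPTH 8

/* local_addr[d] records where the local of the activation at depth d lives. */
uintptr_t local_addr[MAX_DEPTH];

/* Recurses from depth up to (but not including) limit. Each activation
   owns a private copy of `mine`; deeper activations cannot change it. */
int descend(int depth, int limit) {
    if (depth >= MAX_DEPTH) return 0;      /* stay inside local_addr */
    volatile int mine = depth * 10;        /* private to this activation */
    local_addr[depth] = (uintptr_t)&mine;
    int below = (depth + 1 < limit) ? descend(depth + 1, limit) : 0;
    return mine + below;                   /* mine still holds depth * 10 */
}
```

For descend(0, 3) three activations are live at once, so the three recorded addresses differ, and the result is 0 + 10 + 20.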


Activation Records
An activation record typically contains the following entries:
• Return address: the address of the instruction after the call
• Formal parameters: sequence of parameters passed to the function by the caller
    – At call: actuals. Inside function: formals. Actuals are bound to formals
• Return value: a place for storing the function return value
• Local data: storage for local variables
• Access link: AKA static link, a pointer to the next activation record in the chain for accessing non-local data
    – e.g. the lexically enclosing function’s AR, as in Algol, Ada, Pascal, PL/I
• Control link: AKA dynamic link, a pointer to the caller’s activation record
• Saved machine status: holds info about the machine (i.e. registers’ values) just before the function is called
• Temporaries: storage for compiler-allocated temporary objects (e.g. dynamic arrays)


Where Are Activation Records Stored?

Static Allocation: the number of ARs and the size of each AR must be known at compile time
• No runtime management needed
• Multiple invocations of the same function reuse the same AR
• Can’t handle recursive functions and dynamic data
• Only early Fortran uses this approach

Stack Allocation: ARs are pushed on and popped off the stack
• Works for block-structured languages: a function must return before its own caller returns
• Very efficient: hence the default choice of most programming languages
• Can’t handle “first-class” functions

Heap Allocation: an AR can be created and destroyed at any time
• Needed for implementing functional languages
• High overhead
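Why a statically allocated AR cannot handle recursion can be sketched in C by forcing all activations of a function to share one parameter slot. The function names are hypothetical, and the static variables merely imitate a single fixed AR; the C parameter n is copied into that slot and then ignored.

```c
/* Static allocation (imitated): one AR shared by every activation. */
int fact_static_ar(int n) {
    static int saved_n;  /* the single, fixed slot for the parameter */
    static int sub;      /* the single, fixed slot for a temporary */
    saved_n = n;
    if (saved_n <= 1) return 1;
    sub = fact_static_ar(saved_n - 1); /* the recursive call overwrites saved_n */
    return saved_n * sub;              /* saved_n is 1 by now, not the original n */
}

/* Stack allocation: every activation has its own n. */
int fact_stack_ar(int n) {
    if (n <= 1) return 1;
    return n * fact_stack_ar(n - 1);
}
```

With these definitions fact_stack_ar(4) gives 24, while fact_static_ar(4) gives 1, because the innermost activation has clobbered the shared slot.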


Stack-Based AR Allocation

In most languages, if function f is declared inside function g, then f can only be invoked within the scope of g.

This nesting property of function calls makes it possible to allocate ARs on a stack. It guarantees that non-local variables exist when needed.

Stack implementation is very efficient.

[Figure: ARs on the stack, with the direction of stack growth marked and bp and sp pointing into the current AR. AR fields shown: Return Value, actual 1 … actual N, control link, access link, save reg 1 … save reg N, local 1 … local N, temp 1 … temp N]


Function Call
When a method is activated (i.e. a function is called):

The Caller:
• allocates [part of] an activation record for the callee
• evaluates the actual parameters, and stores them into the AR
• stores a return address [or return slot] into the AR
• if needed, saves (some) register values into the AR
• stores the current AR pointer (AKA fp, for frame pointer; or bp, for base pointer) and updates it to point to the callee’s AR
    – But which place in the AR?
• transfers control to the callee

The Callee:
• saves (some) register values and other machine status info
• allocates and initializes its local data and begins execution
• allocates temps, if needed


Function Call, Cont’d

Upon Returning

The Callee:
• places the return value at a place the caller can access
• restores the caller’s AR pointer and other registers, using saved info in the stack marker
• returns control to the caller

The Caller:
• can copy the returned value into its own AR
• On some architectures: frees space for actuals
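The division of work between caller and callee can be imitated on a simulated stack. This is a sketch under several assumptions: the word array stack, the indices sp and bp, the upward growth direction, and the frame layout [return value][actual][control link][local] are all invented for illustration, and the return address is omitted because plain C cannot express it.

```c
#define STACK_WORDS 64

int stack[STACK_WORDS]; /* simulated stack of machine words */
int sp = 0;             /* first free word */
int bp = 0;             /* control-link slot of the current AR */

/* Callee: computes 2 * actual + 1 using one local. */
void callee_double_plus_one(void) {
    int local = sp++;                  /* callee allocates its local */
    stack[local] = 2 * stack[bp - 1];  /* the actual sits just below the control link */
    stack[bp - 2] = stack[local] + 1;  /* return value goes where the caller can read it */
    sp = bp;                           /* pop local and control link */
    bp = stack[sp];                    /* restore the caller's AR pointer */
}

/* Caller: builds its part of the AR, transfers control, cleans up.
   Returns -1 if the simulated stack is too full. */
int call_double_plus_one(int x) {
    if (sp + 4 > STACK_WORDS) return -1;
    int frame = sp;
    sp += 1;                  /* reserve the return value slot */
    stack[sp++] = x;          /* store the actual */
    stack[sp] = bp;           /* save the current AR pointer as control link */
    bp = sp++;                /* bp now points into the callee's AR */
    callee_double_plus_one(); /* transfer control */
    int result = stack[frame];
    sp = frame;               /* caller frees actual and return slot */
    return result;
}
```

In this layout bp points at the control link, which is one possible answer to the question of where in the AR the pointer should point: actuals are then reached at negative offsets and locals at positive ones.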


Register Saving

The contents of live registers must be saved in memory before the registers can be reused for a new purpose in the callee. The register-saving task can be done by the caller alone, by the callee alone, or split between the two

• Caller Saving: The caller needs to save the registers that hold live data, regardless of whether the callee is actually going to use any of these registers. This may end up being unnecessary work

• Callee Saving: The callee needs to save the registers that it is going to use, regardless of whether they contain any live contents. It may also end up doing unnecessary work

• Split Saving: Designate a set of registers as caller-save registers, and the rest as callee-save registers. The caller may use any callee-save register without saving it, while the callee may use any caller-save register without saving it


Scopes
Def: Scope is a region of program text over which a name is known; e.g. a variable binding is effective. Scopes are typically introduced by function declarations as well as by program blocks, like { … } blocks in C++, Java

    #include <stdio.h>

    int main(void) {                       //B0
        int a = 0, b = 0;
        {                                  //B1
            int b = 1;
            {                              //B2
                int a = 2;
                printf("%d %d\n", a, b);   // prints 2 1
            }                              //end B2
            {                              //B3
                int b = 3;
                printf("%d %d\n", a, b);   // prints 0 3
            }                              //end B3
            printf("%d %d\n", a, b);       // prints 0 1
        }                                  //end B1
        printf("%d %d\n", a, b);           // prints 0 0
        return 0;
    }                                      //end B0

    Name declaration    Scope
    int a = 0           B0, B1, B3
    int b = 0           B0
    int b = 1           B1, B2
    int a = 2           B2
    int b = 3           B3


Lexical Scope Rule
Under lexical scope rules, variables are identified by looking backwards through the program text to find the nearest enclosing declaration. Nearly all programming languages use lexical scope

For the program below, when f is executed, it needs to look up a value for a, which is a free variable of f. The nearest enclosing declaration in this case is the global declaration. At the time f executes, the global a has the value 5, so f returns 5+10, and 15 is printed by the program.

    program main;
      var a : int := 0;
      function f( b : int ) : int is
        return a + b;
      end f;
      function g( c : int ) : int is
        var a : int := 1;
        a := a + 2;
        return f( c );
      end g;
    begin
      a := 5;
      print( g( 10 ) );
    end main;
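Since C is lexically scoped, the same program can be transcribed almost literally. In this sketch run_main is a hypothetical stand-in for the body of main, returning the value that the original program prints.

```c
int a = 0;          /* global a: the nearest enclosing declaration for f */

int f(int b) {
    return a + b;   /* free variable a resolves to the global */
}

int g(int c) {
    int a = 1;      /* local a shadows the global only inside g */
    a = a + 2;
    (void)a;        /* g's a is never read by f under lexical scoping */
    return f(c);
}

/* Stand-in for the main body of the slide's program. */
int run_main(void) {
    a = 5;
    return g(10);
}
```

run_main() returns 15. Under dynamic scoping f would instead see the a of its caller g, which holds 3 at that point, and the result would be 13.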


Nested Scopes
Environments Associated with a Function:
• Definition Environment: the environment in which the function is defined. Needed if lexical scope is used

• Invocation Environment: the environment in which the function is invoked. Needed if dynamic scope is used

• Passing Environment: the environment in which the function is passed as a parameter. No direct use


Needed Environment
Set up an access link in the AR to point to the AR of the function’s def-env or invoc-env: either another AR on the stack or the global env:

• For static-scoped languages: The access link should point to the function’s def-env, which can be derived from the caller’s access link (see next slide). In the case of a nest of scopes, a chain of access links can be followed to reach every enclosing environment of an inner function

• For dynamic-scoped languages: The access link should point to the function’s invoc-env, which is simply the caller’s AR(!). Since the control link already points to the caller’s AR, there is no need to set up a separate access link


Setting Up Access Links
Assume f calls g, and f and g are defined at scope-levels m and n, respectively. Further assume that f’s access link is already set up:

• If m > n: For g to be visible to f, g’s definition environment must be one of the scopes that enclose f. Traverse f’s access links; the AR at scope-level n − 1 should be the target for g’s access link

• If m = n: f and g are defined in the same scope. Simply use f’s access link as g’s access link

• If m < n: f must be the definition environment of g. Let g’s access link point to f’s AR
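The three cases can be sketched as one C helper. The struct ar and both function names are hypothetical. The sketch assumes that following k access links from the AR of a function defined at level m reaches the enclosing AR k levels further out, so reaching level n − 1 takes m − n + 1 hops.

```c
#include <stddef.h>

/* Simplified AR: only the field needed here. Names are assumptions. */
struct ar {
    const char *name;       /* for illustration only */
    struct ar *access_link; /* AR of the definition environment */
};

/* Follow `hops` access links starting at `from`. */
struct ar *follow_links(struct ar *from, int hops) {
    while (hops-- > 0 && from != NULL) from = from->access_link;
    return from;
}

/* f (defined at level m, with AR `caller`) calls g (defined at level n).
   Returns the AR that g's access link should point to. */
struct ar *callee_access_link(struct ar *caller, int m, int n) {
    if (m < n) return caller;               /* f is g's definition environment */
    return follow_links(caller, m - n + 1); /* m == n: one hop; m > n: m - n + 1 hops */
}
```

With main at level 0, count at level 1, and do_intlist and check_int at level 2, a call from do_intlist to check_int (m = n = 2) takes one hop and yields count’s AR.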


Sample: Scope

program main;
| function count( i : integer; a: Intlist): integer;
| | var sum: integer := 0;
| | procedure check_int( j : integer );
| | | begin -- check_int
| | | if j = i then sum := sum + 1; end if;
| | | end check_int;
| | procedure do_intlist( a: Intlist );
| | | begin -- do_intlist
| | | while (a) loop
| | | check_int(a^.x); a := a^.next;
| | | end loop;
| | | end do_intlist;
| | begin -- count
| | | do_intlist( a ); count := sum;
| | end count;
| procedure print_int( i: integer);
| | begin -- print_int
| | | writeln( i );
| | end print_int;
| begin -- main
| var a: Intlist;
| print_int( count( 1, a ) );
end main;


Execution Scenario

main calls count
count calls do_intlist
do_intlist calls check_int
· · ·
main calls print_int

(The original slide shows a snapshot of the ARs on the stack.)

When check_int is passed as a parameter to do_intlist, its access link can be computed, since it is defined in this scope.


Function Parameters – e.g. Pascal

    program main;

      procedure do_intlist(a: Intlist; procedure f(i: integer));
      begin ... f(a^.x); ... end;

      function count(i: integer; a: Intlist): integer;
        var sum: integer := 0;
        procedure check_int(j: integer);
        begin if j = i then sum := sum + 1; end;
      begin do_intlist(a, check_int); count := sum; end;

    begin
      var a: Intlist;
      print_int(count(2,a));
    end.

Here check_int is passed as a parameter to do_intlist, and gets invoked there; it references two non-local variables, i and sum, which are not global variables either. This cannot be expressed directly in C or C++


Functions as Parameters
• The caller-callee relationship discussed previously does not hold for function parameters in languages with nested procedural scopes, like Pascal, Algol, Ada

• check_int’s definition environment may have nothing to do with do_intlist. How can we set up the access link for check_int’s AR in this case?

• Solution: The routine that passes f as a parameter to g has information about f’s definition and can set up the access link for f. It can then pass f’s access link together with f.

• Effectively, when passing a function as a parameter, we should pass a closure (function pointer plus its environment) instead of just a function pointer.
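Since C has no nested functions, the closure has to be built by hand, which makes its two parts explicit. The following sketch re-expresses the count example under stated assumptions: struct closure and struct count_env are hypothetical types, and a plain int array replaces the Intlist of the slides.

```c
#include <stddef.h>

/* The part of count's environment that check_int needs:
   its non-locals i and sum. */
struct count_env {
    int i;
    int sum;
};

/* A closure: code pointer plus the environment it must run in. */
struct closure {
    void (*code)(void *env, int j);
    void *env;
};

/* check_int reaches i and sum through the environment pointer,
   which plays the role of the access link. */
void check_int(void *env, int j) {
    struct count_env *e = env;
    if (j == e->i) e->sum += 1;
}

/* do_intlist knows nothing about count; it only applies the closure. */
void do_intlist(const int *a, size_t n, struct closure f) {
    for (size_t k = 0; k < n; k++) f.code(f.env, a[k]);
}

int count(int i, const int *a, size_t n) {
    struct count_env env = { i, 0 };        /* lives in count's AR */
    struct closure f = { check_int, &env }; /* code and access link travel together */
    do_intlist(a, n, f);
    return env.sum;
}
```

Because env lives in count’s AR, this closure is only valid while count is still active, which is the case for downward function parameters.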


Passing Global Functions
• Global functions have a unique feature: the definition environment is the global scope. There is no need to set up an access link for a global function’s invocation, since any non-local variable of such a function must be a global variable, which can be accessed directly.

• Example: In C, all functions are defined at global scope, hence there is no need to use closures to handle function parameters.

• Side Note: gcc extends C with nested function definitions, but it implements pointers to nested functions with trampolines rather than heap-allocated closures; calling such a pointer after the enclosing function has returned is not guaranteed to work(!).


Functions as Return Values
• Going one step further, suppose that function values are treated like other values, e.g., they can be returned as function results or stored into variables (the following example is in ML):

    type counter = int list -> int
    fun make_counter( i : int ) : counter =
      let fun count( a : int list ) =
            let val sum = ref 0
                fun check_int( j : int ) =
                  if j = i then sum := !sum + 1 else ()
            in do_intlist( a, check_int ); !sum end
      in count end
    val g : counter = make_counter(2);
    val c : int list = ...;
    val c2 : int = g(c);

• A scenario: main calls make_counter, which returns count; main calls count; count calls do_intlist; do_intlist calls check_int;


Functions as Return Values, Cont’d
• The scenario: main calls make_counter, which returns count; main calls count; count calls do_intlist; do_intlist calls check_int;
• Problem: check_int requires the value of the non-local variable i, which is the parameter of make_counter, but the activation of make_counter is no longer live when check_int is called!
• If i is stored in the activation record for make_counter and that activation record is stack-allocated, it will be gone at the point where check_int needs it!
• Solution: Store activation records in the heap
• Special Case: Again, if a global function is returned as a return value, there is no problem executing it later, since all its non-local variables are global variables
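The heap-based solution can be imitated in C by allocating the captured environment with malloc, so that it outlives the call that created it. This is a hand-written approximation of the ML example, not how an ML compiler is implemented; struct counter and counter_apply are hypothetical, and an int array replaces the int list.

```c
#include <stdlib.h>
#include <stddef.h>

/* Heap-allocated environment: holds make_counter's parameter i
   after make_counter itself has returned. */
struct counter {
    int i;
};

/* Returns a heap-allocated "function value", or NULL if malloc fails.
   The caller releases it with free(). */
struct counter *make_counter(int i) {
    struct counter *c = malloc(sizeof *c);
    if (c != NULL) c->i = i;
    return c;
}

/* Plays the role of calling the returned count function on a list:
   counts how many elements of a equal the captured i. */
int counter_apply(const struct counter *c, const int *a, size_t n) {
    int sum = 0;
    for (size_t k = 0; k < n; k++)
        if (a[k] == c->i) sum += 1;
    return sum;
}
```

Had make_counter returned the address of a local struct instead, the captured i would be gone by the time counter_apply runs, which is the problem described above.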


Handling Program Blocks
Nested program blocks can have their own local variables. E.g.

    if (i>j) { int x; ... }
    else { double y[100]; ... }

Where should these variables be stored?
• Solution 1: Consider a block as an in-line function without parameters, and create an AR for it. Advantage: efficient use of storage. Downside: high runtime overhead

• Solution 2: Use ARs only for true functions. If there are blocks within a function, statically collect storage requirement information from each block; then compute the maximum amount of storage needed for handling all blocks, and allocate that in the function’s AR. Advantage: no runtime overhead. Downside: may waste storage space
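The static computation behind Solution 2 can be sketched as a recursion over the block structure: a block needs room for its own variables plus the largest requirement among its nested blocks, since sibling blocks are never live at the same time. The struct block and the function block_bytes are hypothetical, and alignment padding is ignored for simplicity.

```c
#include <stddef.h>

/* Compile-time description of one block (assumed representation). */
struct block {
    size_t own_bytes;             /* variables declared directly in this block */
    const struct block *children; /* directly nested blocks */
    size_t n_children;
};

/* Storage to reserve in the function's AR for block b:
   its own variables plus the maximum over its nested blocks. */
size_t block_bytes(const struct block *b) {
    size_t max_child = 0;
    for (size_t k = 0; k < b->n_children; k++) {
        size_t need = block_bytes(&b->children[k]);
        if (need > max_child) max_child = need;
    }
    return b->own_bytes + max_child;
}
```

For the if/else example above, the two arms need sizeof(int) and sizeof(double[100]) bytes, so the AR reserves the larger amount and the space is wasted whenever only the if arm runs.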