intermediate representations cs 671 february 12, 2008

Intermediate Representations

CS 671February 12, 2008

2 CS 671 – Spring 2008

Last Time

The compiler must generate code to handle issues that arise at run time• Representation of various data types• Procedure linkage• Storage organization

3 CS 671 – Spring 2008

Activation Records

A procedure is a control abstraction• it associates a name with a chunk of code• that piece of code is regarded in terms of its

purpose and not of its implementation

A procedure creates its own name space• It can declare local variables• Local declarations may hide non-local ones• Local names cannot be seen from outside

4 CS 671 – Spring 2008

Handling Control Abstractions

Generated code must be able to • preserve current state

– save variables that cannot be saved in registers– save specific register values

• establish procedure environment on entry– map actual to formal parameters– create storage for locals

• restore previous state on exit

This can be modeled with a stack• Allocate a memory block for each activation• Maintain a stack of such blocks• This mechanism can handle recursion

5 CS 671 – Spring 2008

Activation Records

Keeps track of the bottom of the current activation record

g(…)

{

f(a1,…,an);

}

•g is the caller

•f is the callee

…Arg n

…Arg 2Arg 1

Ret Address

Local varsTemporariesSaved regs

Arg m…

Arg 1Ret Address

f’s frame

g’s frame

next frame

sp

fp

6 CS 671 – Spring 2008

Activation Records

•Handling nested procedures– We must keep a static link (vs. dynamic link)

•Registers vs. memory– Assume infinite registers (for now)

g(…) {

f(a1,…,an);

}

If g saves before call, restores after call caller-save register

If f saves before using a reg, restores after callee-save register

7 CS 671 – Spring 2008

Intermediate Code Generation

Source code

Lexical Analysis

Syntactic Analysis

Semantic Analysis

Intermediate Code Gen

lexical errors

syntax errors

semantic errors

tokens

AST

AST’

IR

8 CS 671 – Spring 2008

And Then? The Back End

IR

Trace Formation

Instruction Selection

Register Allocation

Optimizations ** phase ordering problem

9 CS 671 – Spring 2008

IR Motivation

What we have so far...• An abstract syntax tree

– With all the program information– Known to be correct

• Well-typed• Nothing missing• No ambiguities

What we need...• Something “Executable”• Closer to actual machine level of abstraction

10 CS 671 – Spring 2008

Why an Intermediate Representation?

• What is the IR used for?– Portability– Optimization– Component interface– Program understanding

• Compiler– Front end does lexical analysis, parsing, semantic

analysis, translation to IR– Back end does optimization of IR, translation to

machine instructions

• Try to keep machine dependences out of IR for as long as possible

11 CS 671 – Spring 2008

Intermediate Code

• Abstract machine code – (Intermediate Representation)

• Allows machine-independent code generation, optimization

AST IR PowerPC

Alpha

x86optimize

12 CS 671 – Spring 2008

Intermediate Representations

• Any representation between the AST and ASM– 3 address code: triples, quads (low-level)– Expression trees (high-level)

a[i];

MEM

MEM

+

a

BINOP

MUL i CONST

W

13 CS 671 – Spring 2008

What Makes a Good IR?

• Easy to translate from AST

• Easy to translate to assembly

• Narrow interface: small number of node types (instructions)

– Easy to optimize– Easy to retarget AST (>40 node types)

IR (~15 node types)

x86 (>200 opcodes)

14 CS 671 – Spring 2008

IR selection

Using an existing IR• cost savings due to reuse• it must be expressive and appropriate for the

compiler operations

Designing an IR• decide how close to machine code it should be• decide how expressive it should be• decide its structure• consider combining different kinds of IRs

15 CS 671 – Spring 2008

Optimizing Compilers

• Goal: get program closer to machine code without losing information needed to do useful optimizations

• Need multiple IR stages

MIR PowerPC (LIR)

Alpha (LIR)

x86 (LIR)optimize

AST HIR

optimize

opt

opt

opt

16 CS 671 – Spring 2008

High-Level IR (HIR)

•used early in the process•usually converted to lower form later on

• Preserves high-level language constructs– structured flow, variables, methods

• Allows high-level optimizations based on properties of source language (e.g. inlining, reuse of constant variables)•Example: AST

17 CS 671 – Spring 2008

Medium-Level IR (MIR)

• Try to reflect the range of features in the source language in a language-independent way

• Intermediate between AST and assembly

• Unstructured jumps, registers, memory locations

• Convenient for translation to high-quality machine code

• OtherMIRs:– tree IR: easy to generate, easy to do reasonable

instruction selection– quadruples: a = b OP c (easy to optimize)– stack machine based (like Java bytecode)

18 CS 671 – Spring 2008

Low-Level IR (LIR)

• Assembly code + extra pseudo instructions

• Machine dependent

• Translation to assembly code is trivial

• Allows optimization of code for low-level considerations: scheduling, memory layout

19 CS 671 – Spring 2008

IR Classification: Level

for i := op1 to op2 step op3

instructions

endfor

i := op1

if step < 0 goto L2

L1: if i > op2 goto L3

instructions

i := i + step

goto L1

L2: if i < op2 goto L3

instructions

i := i + step

goto L2

L3:

High-level

Medium-level

20 CS 671 – Spring 2008

IR Classification: Structure

Graphical• Trees, graphs• Not easy to rearrange• Large structures

Linear• Looks like pseudocode• Easy to rearrange

Hybrid• Combine graphical and linear IRs• Example:

– low-level linear IR for basic blocks, and– graph to represent flow of control

21 CS 671 – Spring 2008

Graphical IRs

Abstract syntax tree• High-level• Useful for source-level information• Retains syntactic structure• Common uses

– source-to-source translation– semantic analysis– syntax-directed editors

22 CS 671 – Spring 2008

Graphical IRs

Tree, for basic block*• root: operator• up to two children: operands• can be combined

Uses:• algebraic simplifications• may generate locally optimal code.

L1: i := 2

t1:= i+1

t2 := t1>0

if t2 goto L1

assgn, i add, t1 gt, t2

2 i 1 t1 0

2

assgn, i 1

add, t1 0

gt, t2

*straight-line code with no branches or branch targets.

23 CS 671 – Spring 2008

Graphical IRs

Directed Acyclic Graphs (DAGs)• Like compressed trees

– leaves: variables, constants available on entry– internal nodes: operators

• annotated with variable names– distinct left/right children

• Used for basic blocks (DAGs don't show control flow)• Can generate efficient code

– Note: DAGs encode common expressions• But difficult to transform • Good for analysis

24 CS 671 – Spring 2008

Graphical IRs

Control-Flow Graphs (CFGs)• Each node corresponds to a

– basic block, or – part of a basic block, or

• may need to determine facts at specific points within BB

– a single statement• more space and time

• Each edge represents flow of control

25 CS 671 – Spring 2008

Control-Flow Graphs

Graphical representation of runtime control-flow paths• Nodes of graph: basic blocks (straight-line computations)• Edges of graph: flows of controlUseful for collecting information about computation• Detect loops, remove redundant computations, register

allocation, instruction scheduling…Alternative CFG: Each node contains a single stmt

…… i = 0 while (i < 50) { t1 = b * 2; a = a + t1; i = i + 1; }….

if I < 50

…… t1 := b * 2; a := a + t1; i = i + 1;

i =0;

26 CS 671 – Spring 2008

Graphical IRs: Often Use Basic Blocks

Basic block = a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end without halt or possibility of branching except at the end

Partitioning a sequence of statements into BBs1.Determine leaders (first statements of BBs)

– The first statement is a leader– The target of a conditional is a leader– A statement following a branch is a leader

2.For each leader, its basic block consists of the leader and all the statements up to but not including the next leader.

27 CS 671 – Spring 2008

Basic Blocks

unsigned int fibonacci (unsigned int n) {unsigned int f0, f1, f2;f0 = 0;f1 = 1;if (n <= 1)

return n;for (int i=2; i<=n; i++) {

f2 = f0+f1;f0 = f1;f1 = f2;

}return f2;

}

read(n)f0 := 0f1 := 1if n<=1 goto

L0i := 2

L2:if i<=n goto L1

return f2L1:f2 := f0+f1

f0 := f1f1 := f2i := i+1go to L2

L0:return n

28 CS 671 – Spring 2008

Basic Blocks

read(n)f0 := 0f1 := 1if n<=1 goto L0i := 2

L2:if i<=n goto L1return f2

L1:f2 := f0+f1f0 := f1f1 := f2i := i+1go to L2

L0:return n

Leaders:

read(n)f0 := 0f1 := 1n <= 1

i := 2

i<=n

return f2f2 := f0+f1f0 := f1f1 := f2i := i+1

return n

exit

entry

29 CS 671 – Spring 2008

Graphical IRs

Dependence graphs : they represents constraints on the sequencing of operations

• Dependence: a relation between two statements that puts a constraint on their execution order– Control dependence: Based on the program's

control flow– Data dependence: Based on the flow of data

between statements• Nodes represent statements • Edges represent dependences

Built for specific optimizations, then discarded

30 CS 671 – Spring 2008

Graphical IRs

Dependence graphs

Example:

s1 a := b + cs2 if a>10 goto L1s3 d := b * es4 e := d + 1s5 L1: d := e / 2

control dependence: s3 and s4 are executed only when a<=10

true or flow dependence: s2 uses a value defined in s1This is read-after-write dependence

antidependence: s4 defines a value used in s3This is write-after-read dependence

output dependence: s5 defines a value also defined in s3This is write-after-write dependence

input dependence: s5 uses a value also uses in s3This is read-after-read situation. It places no constraintsin the execution order, but is used in some optimizations.

s1 s2 s3 s4 s5

31 CS 671 – Spring 2008

IR Classification: Structure

Graphical• Trees, graphs• Not easy to rearrange• Large structures

Linear• Seq of instructions that execute in order of appearance

• Control flow is represented by cond branches and jumps

Hybrid• Combine graphical and linear IRs• Example:

– low-level linear IR for basic blocks, and– graph to represent flow of control

32 CS 671 – Spring 2008

Linear IR

Lower level IR before final code generation• A linear sequence of instructions • Resemble assembly code for an abstract machine

– Explicit conditional branches and goto jumps• Reflect instruction sets of the target machine

– Stack-machine code and three-address code• Implemented as a collection (table or list) of tuples

Push 2Push yMultiply Push xsubtract

Linear IR for x – 2 * y

MOV 2 => t1 MOV y => t2MULT t2 => t1 MOV x => t4 SUB t1 => t4

Stack-machine code two-address code three-address code

t1 := 2 t2 := yt3 := t1*t2t4 := x t5 := t4-t3

33 CS 671 – Spring 2008

Linear IRs

Stack machine code• Assumes presence of operand stack• Useful for stack architectures, JVM• Operations typically pop operands and push results• Advantages

– Easy code generation– Compact form

• Disadvantages– Difficult to rearrange– Difficult to reuse expressions

34 CS 671 – Spring 2008

Stack-Machine Code

Also called one-address code• Assumes an operand stack• Operations take operands from the stack and push

results back onto the stack

Compact, simple to generate and execute– Most operands do not need names– Results are transitory unless explicitly moved to

memory

Used as IR for Smalltalk and Java

Push 2Push yMultiply Push xsubtract

Stack-machine code for x – 2 * y

35 CS 671 – Spring 2008

Linear IR: Three-Address Code

Each instruction is of the form

x := y op z• y and z can be only registers or constants• Just like assembly

Common form of intermediate code

The AST expression x + y * z is translated as t1 := y * z

t2 := x + t1

• Each subexpression has a “home”

36 CS 671 – Spring 2008

Three-Address Code

• Result, operand, operand, operator– x := y op z, where op is a binary operator and x, y,

z can be variables, constants or compiler generated temporaries (intermediate results)

• Can write this as a shorthand– <op, arg1, arg2, result > -- quadruples

• Statements– Assignment x := y op z– Copy stmts x := y– Goto L

37 CS 671 – Spring 2008

Three Address Code

• Statements (cont)– if x relop y goto L– Indexed assignments x := y[j] or s[j] := z– Address and pointer assignments (for C)

• x := &y• x := *p• *x := y

– Parm x; – call p, n; – return y;

38 CS 671 – Spring 2008

Three-Address Code

Every instruction manipulates at most two operands and one result. • Arithmetic operations: x := y op z | x := op y • Data movement: x := y [z] | x[z] := y | x := y • Control flow: if y op z goto x | goto x• Function call: param x | return y | call foo

Reasonably compact; reuse of names and values

t1 := 2 t2 := yt3 := t1*t2t4 := x t5 := t4-t3

Three-address code for x – 2 * y

39 CS 671 – Spring 2008

Storing Three-Address Code

op arg1 arg2 result

(0) Uminus c t1

(1) Mult b t1 t2

(2) Uminus c t3

(3) Mult b t3 t4

(4) Plus t2 t4 t5

(5) Assign t5 a

t1 := - ct2 := b * t1t3 := -ct4 := b * t3t5 := t2 + t4a := t5

3AC

Store all instructions in a quadruple table • Every instr has four fields: op, arg1, arg2, result• Label of instructions index of instruction in table

Quadruple entries

40 CS 671 – Spring 2008

Bigger Example

Consider the AST

t1 := - ct2 := b * t1

t3 := - ct4 := b * t3

t5 := t2 + t4

a := t5

t1 := - ct2 := b * t1

t3 := - ct4 := b * t3

t5 := t2 + t4

a := t5

41 CS 671 – Spring 2008

The IR Machine

A machine with• Infinite number of temporaries (think registers)• Simple instructions

– 3-operands– Branching– Calls with simple calling convention

• Simple code structure– Array of instructions

• Labels to define targets of branches

42 CS 671 – Spring 2008

Temporaries

The machine has an infinite number of temporaries• Call them t0, t1, t2, .... • Temporaries can hold values of any type• The type of the temporary is derived from the

generation• Temporaries go out of scope with each function

43 CS 671 – Spring 2008

Mapping Names to Variables

Variables are names for values• Names given by programmers in the input program• Names given by compilers for storing intermediate

results of computation

Reusing temp variables can save space, but mask context and prevent analysis and optimizations• Result of t1*t2 is no longer available after t1 is reused

t1 := 2 t2 := yt3 := t1*t2t4 := x t5 := t4-t3

Three-address code for x – 2 * y

t1 := 2 t2 := yt1 := t1*t2t2 := x t1 := t2-t1

Different values use distinct names

Different values reuse the same name

44 CS 671 – Spring 2008

Mapping Storage to Variables

Variables are placeholders for values• Every variable must have location to store its value

– Register, stack, heap, static storage• Values must be loaded into registers before use

t1 := 2 t2 := yt3 := t1*t2t4 := x t5 := t4-t3

Three-address code for x – 2 * y:

t1 := 2 t2 := t1*y t3 := x-t2

x and y are in registers x and y are in memory

Which vars can be kept in registers?Which vars must be stored in memory?

void A(int b, int *p) { int a, d; a = 3; d = foo(a); *p =b+d;}

45 CS 671 – Spring 2008

Summary

• IRs provide the interface between the front and back ends of the compiler

– Graphical, linear, hybrid

• Should be machine and language independent

• Should be amenable to optimization

intermediate representations cs 671 february 12, 2008

Documents