intermediate representations cs 671 february 12, 2008
TRANSCRIPT
Intermediate Representations
CS 671February 12, 2008
2 CS 671 – Spring 2008
Last Time
The compiler must generate code to handle issues that arise at run time• Representation of various data types• Procedure linkage• Storage organization
3 CS 671 – Spring 2008
Activation Records
A procedure is a control abstraction• it associates a name with a chunk of code• that piece of code is regarded in terms of its
purpose and not of its implementation
A procedure creates its own name space• It can declare local variables• Local declarations may hide non-local ones• Local names cannot be seen from outside
4 CS 671 – Spring 2008
Handling Control Abstractions
Generated code must be able to • preserve current state
– save variables that cannot be saved in registers– save specific register values
• establish procedure environment on entry– map actual to formal parameters– create storage for locals
• restore previous state on exit
This can be modeled with a stack• Allocate a memory block for each activation• Maintain a stack of such blocks• This mechanism can handle recursion
5 CS 671 – Spring 2008
Activation Records
Keeps track of the bottom of the current activation record
g(…)
{
f(a1,…,an);
}
•g is the caller
•f is the callee
…Arg n
…Arg 2Arg 1
Ret Address
Local varsTemporariesSaved regs
Arg m…
Arg 1Ret Address
f’s frame
g’s frame
next frame
sp
fp
6 CS 671 – Spring 2008
Activation Records
•Handling nested procedures– We must keep a static link (vs. dynamic link)
•Registers vs. memory– Assume infinite registers (for now)
g(…) {
f(a1,…,an);
}
If g saves before call, restores after call caller-save register
If f saves before using a reg, restores after callee-save register
7 CS 671 – Spring 2008
Intermediate Code Generation
Source code
Lexical Analysis
Syntactic Analysis
Semantic Analysis
Intermediate Code Gen
lexical errors
syntax errors
semantic errors
tokens
AST
AST’
IR
8 CS 671 – Spring 2008
And Then? The Back End
IR
Trace Formation
Instruction Selection
Register Allocation
Optimizations ** phase ordering problem
9 CS 671 – Spring 2008
IR Motivation
What we have so far...• An abstract syntax tree
– With all the program information– Known to be correct
• Well-typed• Nothing missing• No ambiguities
What we need...• Something “Executable”• Closer to actual machine level of abstraction
10 CS 671 – Spring 2008
Why an Intermediate Representation?
• What is the IR used for?– Portability– Optimization– Component interface– Program understanding
• Compiler– Front end does lexical analysis, parsing, semantic
analysis, translation to IR– Back end does optimization of IR, translation to
machine instructions
• Try to keep machine dependences out of IR for as long as possible
11 CS 671 – Spring 2008
Intermediate Code
• Abstract machine code – (Intermediate Representation)
• Allows machine-independent code generation, optimization
AST IR PowerPC
Alpha
x86optimize
12 CS 671 – Spring 2008
Intermediate Representations
• Any representation between the AST and ASM– 3 address code: triples, quads (low-level)– Expression trees (high-level)
a[i];
MEM
MEM
+
a
BINOP
MUL i CONST
W
13 CS 671 – Spring 2008
What Makes a Good IR?
• Easy to translate from AST
• Easy to translate to assembly
• Narrow interface: small number of node types (instructions)
– Easy to optimize– Easy to retarget AST (>40 node types)
IR (~15 node types)
x86 (>200 opcodes)
14 CS 671 – Spring 2008
IR selection
Using an existing IR• cost savings due to reuse• it must be expressive and appropriate for the
compiler operations
Designing an IR• decide how close to machine code it should be• decide how expressive it should be• decide its structure• consider combining different kinds of IRs
15 CS 671 – Spring 2008
Optimizing Compilers
• Goal: get program closer to machine code without losing information needed to do useful optimizations
• Need multiple IR stages
MIR PowerPC (LIR)
Alpha (LIR)
x86 (LIR)optimize
AST HIR
optimize
opt
opt
opt
16 CS 671 – Spring 2008
High-Level IR (HIR)
•used early in the process•usually converted to lower form later on
• Preserves high-level language constructs– structured flow, variables, methods
• Allows high-level optimizations based on properties of source language (e.g. inlining, reuse of constant variables)•Example: AST
17 CS 671 – Spring 2008
Medium-Level IR (MIR)
• Try to reflect the range of features in the source language in a language-independent way
• Intermediate between AST and assembly
• Unstructured jumps, registers, memory locations
• Convenient for translation to high-quality machine code
• OtherMIRs:– tree IR: easy to generate, easy to do reasonable
instruction selection– quadruples: a = b OP c (easy to optimize)– stack machine based (like Java bytecode)
18 CS 671 – Spring 2008
Low-Level IR (LIR)
• Assembly code + extra pseudo instructions
• Machine dependent
• Translation to assembly code is trivial
• Allows optimization of code for low-level considerations: scheduling, memory layout
19 CS 671 – Spring 2008
IR Classification: Level
for i := op1 to op2 step op3
instructions
endfor
i := op1
if step < 0 goto L2
L1: if i > op2 goto L3
instructions
i := i + step
goto L1
L2: if i < op2 goto L3
instructions
i := i + step
goto L2
L3:
High-level
Medium-level
20 CS 671 – Spring 2008
IR Classification: Structure
Graphical• Trees, graphs• Not easy to rearrange• Large structures
Linear• Looks like pseudocode• Easy to rearrange
Hybrid• Combine graphical and linear IRs• Example:
– low-level linear IR for basic blocks, and– graph to represent flow of control
21 CS 671 – Spring 2008
Graphical IRs
Abstract syntax tree• High-level• Useful for source-level information• Retains syntactic structure• Common uses
– source-to-source translation– semantic analysis– syntax-directed editors
22 CS 671 – Spring 2008
Graphical IRs
Tree, for basic block*• root: operator• up to two children: operands• can be combined
Uses:• algebraic simplifications• may generate locally optimal code.
L1: i := 2
t1:= i+1
t2 := t1>0
if t2 goto L1
assgn, i add, t1 gt, t2
2 i 1 t1 0
2
assgn, i 1
add, t1 0
gt, t2
*straight-line code with no branches or branch targets.
23 CS 671 – Spring 2008
Graphical IRs
Directed Acyclic Graphs (DAGs)• Like compressed trees
– leaves: variables, constants available on entry– internal nodes: operators
• annotated with variable names– distinct left/right children
• Used for basic blocks (DAGs don't show control flow)• Can generate efficient code
– Note: DAGs encode common expressions• But difficult to transform • Good for analysis
24 CS 671 – Spring 2008
Graphical IRs
Control-Flow Graphs (CFGs)• Each node corresponds to a
– basic block, or – part of a basic block, or
• may need to determine facts at specific points within BB
– a single statement• more space and time
• Each edge represents flow of control
25 CS 671 – Spring 2008
Control-Flow Graphs
Graphical representation of runtime control-flow paths• Nodes of graph: basic blocks (straight-line computations)• Edges of graph: flows of controlUseful for collecting information about computation• Detect loops, remove redundant computations, register
allocation, instruction scheduling…Alternative CFG: Each node contains a single stmt
…… i = 0 while (i < 50) { t1 = b * 2; a = a + t1; i = i + 1; }….
if I < 50
…… t1 := b * 2; a := a + t1; i = i + 1;
i =0;
26 CS 671 – Spring 2008
Graphical IRs: Often Use Basic Blocks
Basic block = a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end without halt or possibility of branching except at the end
Partitioning a sequence of statements into BBs1.Determine leaders (first statements of BBs)
– The first statement is a leader– The target of a conditional is a leader– A statement following a branch is a leader
2.For each leader, its basic block consists of the leader and all the statements up to but not including the next leader.
27 CS 671 – Spring 2008
Basic Blocks
unsigned int fibonacci (unsigned int n) {unsigned int f0, f1, f2;f0 = 0;f1 = 1;if (n <= 1)
return n;for (int i=2; i<=n; i++) {
f2 = f0+f1;f0 = f1;f1 = f2;
}return f2;
}
read(n)f0 := 0f1 := 1if n<=1 goto
L0i := 2
L2:if i<=n goto L1
return f2L1:f2 := f0+f1
f0 := f1f1 := f2i := i+1go to L2
L0:return n
28 CS 671 – Spring 2008
Basic Blocks
read(n)f0 := 0f1 := 1if n<=1 goto L0i := 2
L2:if i<=n goto L1return f2
L1:f2 := f0+f1f0 := f1f1 := f2i := i+1go to L2
L0:return n
Leaders:
read(n)f0 := 0f1 := 1n <= 1
i := 2
i<=n
return f2f2 := f0+f1f0 := f1f1 := f2i := i+1
return n
exit
entry
29 CS 671 – Spring 2008
Graphical IRs
Dependence graphs : they represents constraints on the sequencing of operations
• Dependence: a relation between two statements that puts a constraint on their execution order– Control dependence: Based on the program's
control flow– Data dependence: Based on the flow of data
between statements• Nodes represent statements • Edges represent dependences
Built for specific optimizations, then discarded
30 CS 671 – Spring 2008
Graphical IRs
Dependence graphs
Example:
s1 a := b + cs2 if a>10 goto L1s3 d := b * es4 e := d + 1s5 L1: d := e / 2
control dependence: s3 and s4 are executed only when a<=10
true or flow dependence: s2 uses a value defined in s1This is read-after-write dependence
antidependence: s4 defines a value used in s3This is write-after-read dependence
output dependence: s5 defines a value also defined in s3This is write-after-write dependence
input dependence: s5 uses a value also uses in s3This is read-after-read situation. It places no constraintsin the execution order, but is used in some optimizations.
s1 s2 s3 s4 s5
31 CS 671 – Spring 2008
IR Classification: Structure
Graphical• Trees, graphs• Not easy to rearrange• Large structures
Linear• Seq of instructions that execute in order of appearance
• Control flow is represented by cond branches and jumps
Hybrid• Combine graphical and linear IRs• Example:
– low-level linear IR for basic blocks, and– graph to represent flow of control
32 CS 671 – Spring 2008
Linear IR
Lower level IR before final code generation• A linear sequence of instructions • Resemble assembly code for an abstract machine
– Explicit conditional branches and goto jumps• Reflect instruction sets of the target machine
– Stack-machine code and three-address code• Implemented as a collection (table or list) of tuples
Push 2Push yMultiply Push xsubtract
Linear IR for x – 2 * y
MOV 2 => t1 MOV y => t2MULT t2 => t1 MOV x => t4 SUB t1 => t4
Stack-machine code two-address code three-address code
t1 := 2 t2 := yt3 := t1*t2t4 := x t5 := t4-t3
33 CS 671 – Spring 2008
Linear IRs
Stack machine code• Assumes presence of operand stack• Useful for stack architectures, JVM• Operations typically pop operands and push results• Advantages
– Easy code generation– Compact form
• Disadvantages– Difficult to rearrange– Difficult to reuse expressions
34 CS 671 – Spring 2008
Stack-Machine Code
Also called one-address code• Assumes an operand stack• Operations take operands from the stack and push
results back onto the stack
Compact, simple to generate and execute– Most operands do not need names– Results are transitory unless explicitly moved to
memory
Used as IR for Smalltalk and Java
Push 2Push yMultiply Push xsubtract
Stack-machine code for x – 2 * y
35 CS 671 – Spring 2008
Linear IR: Three-Address Code
Each instruction is of the form
x := y op z• y and z can be only registers or constants• Just like assembly
Common form of intermediate code
The AST expression x + y * z is translated as t1 := y * z
t2 := x + t1
• Each subexpression has a “home”
36 CS 671 – Spring 2008
Three-Address Code
• Result, operand, operand, operator– x := y op z, where op is a binary operator and x, y,
z can be variables, constants or compiler generated temporaries (intermediate results)
• Can write this as a shorthand– <op, arg1, arg2, result > -- quadruples
• Statements– Assignment x := y op z– Copy stmts x := y– Goto L
37 CS 671 – Spring 2008
Three Address Code
• Statements (cont)– if x relop y goto L– Indexed assignments x := y[j] or s[j] := z– Address and pointer assignments (for C)
• x := &y• x := *p• *x := y
– Parm x; – call p, n; – return y;
38 CS 671 – Spring 2008
Three-Address Code
Every instruction manipulates at most two operands and one result. • Arithmetic operations: x := y op z | x := op y • Data movement: x := y [z] | x[z] := y | x := y • Control flow: if y op z goto x | goto x• Function call: param x | return y | call foo
Reasonably compact; reuse of names and values
t1 := 2 t2 := yt3 := t1*t2t4 := x t5 := t4-t3
Three-address code for x – 2 * y
39 CS 671 – Spring 2008
Storing Three-Address Code
op arg1 arg2 result
(0) Uminus c t1
(1) Mult b t1 t2
(2) Uminus c t3
(3) Mult b t3 t4
(4) Plus t2 t4 t5
(5) Assign t5 a
t1 := - ct2 := b * t1t3 := -ct4 := b * t3t5 := t2 + t4a := t5
3AC
Store all instructions in a quadruple table • Every instr has four fields: op, arg1, arg2, result• Label of instructions index of instruction in table
Quadruple entries
40 CS 671 – Spring 2008
Bigger Example
Consider the AST
t1 := - ct2 := b * t1
t3 := - ct4 := b * t3
t5 := t2 + t4
a := t5
t1 := - ct2 := b * t1
t3 := - ct4 := b * t3
t5 := t2 + t4
a := t5
41 CS 671 – Spring 2008
The IR Machine
A machine with• Infinite number of temporaries (think registers)• Simple instructions
– 3-operands– Branching– Calls with simple calling convention
• Simple code structure– Array of instructions
• Labels to define targets of branches
42 CS 671 – Spring 2008
Temporaries
The machine has an infinite number of temporaries• Call them t0, t1, t2, .... • Temporaries can hold values of any type• The type of the temporary is derived from the
generation• Temporaries go out of scope with each function
43 CS 671 – Spring 2008
Mapping Names to Variables
Variables are names for values• Names given by programmers in the input program• Names given by compilers for storing intermediate
results of computation
Reusing temp variables can save space, but mask context and prevent analysis and optimizations• Result of t1*t2 is no longer available after t1 is reused
t1 := 2 t2 := yt3 := t1*t2t4 := x t5 := t4-t3
Three-address code for x – 2 * y
t1 := 2 t2 := yt1 := t1*t2t2 := x t1 := t2-t1
Different values use distinct names
Different values reuse the same name
44 CS 671 – Spring 2008
Mapping Storage to Variables
Variables are placeholders for values• Every variable must have location to store its value
– Register, stack, heap, static storage• Values must be loaded into registers before use
t1 := 2 t2 := yt3 := t1*t2t4 := x t5 := t4-t3
Three-address code for x – 2 * y:
t1 := 2 t2 := t1*y t3 := x-t2
x and y are in registers x and y are in memory
Which vars can be kept in registers?Which vars must be stored in memory?
void A(int b, int *p) { int a, d; a = 3; d = foo(a); *p =b+d;}
45 CS 671 – Spring 2008
Summary
• IRs provide the interface between the front and back ends of the compiler
– Graphical, linear, hybrid
• Should be machine and language independent
• Should be amenable to optimization