241-437 Compilers: IC/10 1
Compiler Structures
• Objective
  – describe intermediate code generation
  – explain a stack-based intermediate code for the expression language
241-437, Semester 1, 2011-2012
10. Intermediate Code Generation
Overview
1. Intermediate Code (IC) Generation
2. IC Examples
3. Expression Translation in SPIM
4. The Expressions Language
In this lecture
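[Figure: compiler phases. Source Program → Lexical Analyzer → Syntax Analyzer → Semantic Analyzer → Int. Code Generator → Intermediate Code → Code Optimizer → Target Code Generator → Target Lang. Prog. The analyzers are grouped as the Front End; the Code Optimizer and Target Code Generator are grouped as the Back End. This lecture covers the Int. Code Generator.]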
1. Intermediate Code (IC) Generation
• Helps with retargeting
  – e.g. can easily attach a back end for a new machine to an existing front end
• Enables machine-independent code optimization.

[Figure: Front end → Intermediate code → Back end → Target machine code]
Graphical IC Representations
• Abstract Syntax Trees (AST)
  – retains basic parse tree structure, but with unneeded nodes removed
• Directed Acyclic Graphs (DAG)
  – compacted AST to avoid duplication
  – smaller memory needs
• Control Flow Graphs (CFG)
  – used to model control flow
Linear (text-based) ICs
• Stack-based (postfix)
  – e.g. the JVM
• Three-address code
    x := y op z
• Two-address code
    x := op y    (the same as x := x op y)
2. IC Examples
• ASTs and DAGs
• Stack-based (postfix)
• Three-address Code
• SPIM
2.1. ASTs and DAGs
a := b * -c + b * -c

AST:
  assign
    a
    +
      *
        b
        -
          c
      *
        b
        -
          c

DAG (the two identical b * -c subtrees are shared):
  assign
    a
    +        (both operands point to the same * node)
      *
        b
        -
          c

Pros: easy restructuring of code and/or expressions for intermediate code optimization
Cons: memory intensive
2.2. Stack-based (postfix)
a := b * -c + b * -c

Postfix notation represents operations on a stack:
  b c uminus * b c uminus * + a assign

(e.g. JVM stack instrs)
  iload 2     // push b
  iload 3     // push c
  ineg        // uminus
  imul        // *
  iload 2     // push b
  iload 3     // push c
  ineg        // uminus
  imul        // *
  iadd        // +
  istore 1    // store a

Pro: easy to generate
Cons: stack operations are more difficult to optimize
2.3. Three-Address Code
a := b * -c + b * -c

Translated from the AST:
  t1 := - c
  t2 := b * t1
  t3 := - c
  t4 := b * t3
  t5 := t2 + t4
  a := t5

Translated from the DAG:
  t1 := - c
  t2 := b * t1
  t5 := t2 + t2
  a := t5
2.4. SPIM
• Three address code for a simulator that runs MIPS32 assembly language programs
  – http://www.cs.wisc.edu/~larus/spim.html
• Loading/Storing
  – lw register,var    - loads value into register
  – sw register,var    - stores value from register
  – many, many others
• 8 registers: $t0 - $t7
• Binary math ops (reg1 = reg2 op reg3):
  – add reg1,reg2,reg3
  – sub reg1,reg2,reg3
  – mul reg1,reg2,reg3
  – div reg1,reg2,reg3
• Unary minus (reg1 = - reg2)
  – neg reg1,reg2
"a := b * -c + b * -c" in SPIM
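[Figure: AST for the expression, with each node labelled by the register that holds its value: t0 for the leaves b and c, t1 for the unary minus nodes, t2 for the first product, and t1 for the second product and the sum]

  lw  $t0,c
  neg $t1,$t0
  lw  $t0,b
  mul $t2,$t1,$t0
  lw  $t0,c
  neg $t1,$t0
  lw  $t0,b
  mul $t1,$t1,$t0
  add $t1,$t2,$t1
  sw  $t1,a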
a := b * -c + b * -c
lw $t0,c
neg $t1,$t0
lw $t0,b
mul $t1, $t1,$t0
add $t2,$t1,$t1
sw $t2,a
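[Figure: DAG for the expression, with register labels: t0 for the leaves c and b, t1 for the unary minus and the shared product, t2 for the sum]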
3. Expression Translation in SPIM
Grammar:
  S => id := E
  E => E + E
  E => id

Input: a := b + c + d + e

As we parse, use attributes to pass information about the temporary variables up the tree. The parse tree is converted to code using bottom-up evaluation.

[Figure: parse tree for a := b + c + d + e, built up over eight slides. The E nodes are labelled 1 to 7 as code is generated. Each number corresponds to a temporary variable.]

Code generated, one instruction per step:
  1. lw  $t1,b
  2. lw  $t2,c
  3. add $t3,$t1,$t2
  4. lw  $t4,d
  5. add $t5,$t3,$t4
  6. lw  $t6,e
  7. add $t7,$t5,$t6
  8. sw  $t7,a

Pro: easy to rearrange code for global optimization
Cons: lots of temporaries
Issues when Processing Expressions
• Type checking/conversion.
• Address calculation for more complex types (arrays, records, etc.).
• Expressions in control structures, such as loops and if tests.
4. The Expressions Language
• exprParse3.c builds a parse tree for the input file (reuses code from exprParse2.c).
• An intermediate code is generated from the parse tree, and saved to an output file.
• The input file is not executed by exprParse3.c
  – that is done by a separate emulator.
Usage

test1.txt:
  let x = 2
  let y = 3 + x

> gcc -Wall -o exprParse3 exprParse3.c
> ./exprParse3 < test1.txt
> cat codeGen.txt
PUSH 2
STORE x
WRITE
PUSH 3
LOAD x
ADD
STORE y
WRITE
STOP

[Figure: test1.txt → exprParse3 → codeGen.txt; exprParse3 stores the intermediate code in codeGen.txt]
Emulator Usage

> ./emulator codeGen.txt
Reading code from codeGen.txt
== 2
== 5
Stop

[Figure: codeGen.txt → emulator; it runs the intermediate code]
4.1. The Instruction Set
• The instructions in codeGen.txt are executed by an emulator.
  – it emulates (simulates) real hardware
• The instructions refer to two data structures used in the emulator.
The Emulator's Data Structures
• The emulator's data structures:
  – a symbol table of IDs and their integer values
  – a stack of integers for evaluating the expressions

[Figure: example state, with a stack holding 2 and a symbol table holding x = 4]
The Instructions
• WRITE            // pop top element off stack and print
• STOP             // exit code emulation
• LOAD ID          // get ID value from symbol table, and push onto stack
• STORE ID         // copy stack top into symbol table for ID
• PUSH integer     // push integer onto stack
• STORE0 ID        // push 0 onto stack, and save to table as value for ID (same as PUSH 0; STORE ID)
• MULT             // pop two stack values, multiply them, push result back
• ADD, MINUS, DIV  // same for those ops
Intermediate Code Type
• Since the intermediate code uses a stack to store values rather than registers, it is a stack-based (postfix) representation.
4.2. exprParse3.c Coding
• All the parsing code in exprParse3.c is the same as exprParse2.c.
• The difference is that the parse tree is passed to a generateCode() function to convert it to intermediate code
  – see main()
main()

#define CODE_FNM "codeGen.txt"
   // where to store generated code

int main(void)
/* parse, print the tree, then generate code
   which is stored in CODE_FNM */
{
  Tree *t;
  nextToken();
  t = statements();
  match(SCANEOF);

  printTree(t, 0);
  generateCode(CODE_FNM, t);
  return 0;
}
Generating the Code

void generateCode(char *fnm, Tree *t)
/* Open the intermediate code file, fnm, and
   write to it. */
{
  FILE *fp;
  if ((fp = fopen(fnm, "w")) == NULL) {
    printf("Could not write to %s\n", fnm);
    exit(1);
  }
  else {
    printf("Writing code to %s\n", fnm);
    cgTree(fp, t);
    fprintf(fp, "STOP\n");   // last instruction in file
    fclose(fp);
  }
}  // end of generateCode()
void cgTree(FILE *fp, Tree *t)
/* Recurse over the parse tree looking for
   non-NEWLINE subtrees to convert into code.
   Each block of code generated for a non-NEWLINE
   subtree ends with a WRITE instruction, to print
   out the value of the line. */
{
  if (t == NULL)
    return;
  Token tok = TreeOper(t);
  if (tok == NEWLINE) {
    cgTree(fp, TreeLeft(t));
    cgTree(fp, TreeRight(t));
  }
  else {
    codeGen(fp, t);
    fprintf(fp, "WRITE\n");   // print value at EOL
  }
}  // end of cgTree()
void codeGen(FILE *fp, Tree *t)
/* Convert the tree nodes for ID, INT, ASSIGNOP,
   PLUSOP, MINUSOP, MULTOP, DIVOP into instructions.

   The load/store instructions:
       LOAD ID, STORE ID, STORE0 ID, PUSH integer
   The math instructions:
       MULT, ADD, MINUS, DIV
*/
{
  if (t == NULL)
    return;

  Token tok = TreeOper(t);

  if (tok == ID)
    codeGenID(fp, TreeID(t));
  else if (tok == INT)
    fprintf(fp, "PUSH %d\n", TreeValue(t));
  else if (tok == ASSIGNOP) {   // id = expr
    char *id = TreeID(TreeLeft(t));
    getIDEntry(id);   // don't use Symbol info
    codeGen(fp, TreeRight(t));
    fprintf(fp, "STORE %s\n", id);
  }
  else if (tok == PLUSOP) {
    codeGen(fp, TreeLeft(t));
    codeGen(fp, TreeRight(t));
    fprintf(fp, "ADD\n");
  }
  else if (tok == MINUSOP) {
    codeGen(fp, TreeLeft(t));
    codeGen(fp, TreeRight(t));
    fprintf(fp, "MINUS\n");
  }
  else if (tok == MULTOP) {
    codeGen(fp, TreeLeft(t));
    codeGen(fp, TreeRight(t));
    fprintf(fp, "MULT\n");
  }
  else if (tok == DIVOP) {
    codeGen(fp, TreeLeft(t));
    codeGen(fp, TreeRight(t));
    fprintf(fp, "DIV\n");
  }
}  // end of codeGen()
void codeGenID(FILE *fp, char *id)
/* An ID may already be in the symbol table, or be
   new, which is converted into a LOAD or a STORE0
   code operation. */
{
  SymbolInfo *si = NULL;

  if ((si = lookupID(id)) != NULL)   // already declared
    fprintf(fp, "LOAD %s\n", id);
  else {   // new, so add to table
    addID(id, 0);   // 0 is default value
    fprintf(fp, "STORE0 %s\n", id);
  }
}  // end of codeGenID()
From Tree to Code

Input:
  let x = 2
  let y = 3 + x

Parse tree:
  \n
    \n
      NULL
      =
        x
        2
    =
      y
      +
        3
        x

Symbol table in exprParse3.c: x = 0, y = 0

Generated code:
  PUSH 2
  STORE x
  WRITE
  PUSH 3
  LOAD x
  ADD
  STORE y
  WRITE
  STOP
4.3. The Emulator

> gcc -Wall -o emulator emulator.c
> ./emulator codeGen.txt
Reading code from codeGen.txt
== 2
== 5
Stop
Emulator Data Structures

#define MAX_SYMS 15     // max no of vars
#define STACK_SIZE 10

// stack data structure
int stack[STACK_SIZE];
int stackTop = -1;

// symbol table data structures
typedef struct SymInfo {
  char *id;
  int value;
} SymbolInfo;

int symNum = 0;   // number of symbols stored
SymbolInfo syms[MAX_SYMS];

[Figure: example state, with a stack holding 2 and a symbol table holding x = 4]
Evaluating Input Lines
void eval(FILE *fp)
/* Read in the code file a line at a time and
   process the lines.
   An instruction on a line may be a single
   command (e.g. WRITE) or an instruction name
   and an argument (e.g. LOAD x). */
{
  char buf[BUFSIZ];
  char cmd[MAX_LEN], arg[MAX_LEN];
  int no;

  while (fgets(buf, sizeof(buf), fp) != NULL) {
    no = sscanf(buf, "%s %s\n", cmd, arg);
    if ((no < 1) || (no > 2))
      printf("Unknown format: %s\n", buf);
    else
      processCmd(cmd, arg);   // process commands as they are read in
  }
}  // end of eval()
Processing an Instruction

void processCmd(char *cmd, char *arg)
{
  SymbolInfo *si;
  if (strcmp(cmd, "LOAD") == 0) {
    if ((si = lookupID(arg)) == NULL) {
      printf("Error: load cannot find %s\n", arg);
      exit(1);
    }
    push(si->value);
  }
  else if (strcmp(cmd, "STORE") == 0)
    addID(arg, topOf());
  else if (strcmp(cmd, "STORE0") == 0) {
    push(0);
    addID(arg, 0);
  }
  else if (strcmp(cmd, "PUSH") == 0)
    push( atoi(arg) );
  else if (strcmp(cmd, "MULT") == 0) {
    int v2 = pop();
    int v1 = pop();
    push( v1*v2 );
  }
  else if (strcmp(cmd, "ADD") == 0) {
    int v2 = pop();
    int v1 = pop();
    push( v1+v2 );
  }
  else if (strcmp(cmd, "MINUS") == 0) {
    int v2 = pop();
    int v1 = pop();
    push( v1-v2 );
  }
  else if (strcmp(cmd, "DIV") == 0) {
    int v2 = pop();
    if (v2 == 0) {
      printf("Error: div by 0; using 1\n");
      v2 = 1;
    }
    int v1 = pop();
    push( v1/v2 );
  }
  else if (strcmp(cmd, "WRITE") == 0)
    printf("== %d\n", pop());
  else if (strcmp(cmd, "STOP") == 0) {
    printf("Stop\n");
    exit(1);
  }
  else
    printf("Unknown instruction: %s\n", cmd);
}  // end of processCmd()
Evaluating the Code for test1.txt
test1.txt:
  let x = 2
  let y = 3 + x

codeGen.txt:
  PUSH 2
  STORE x
  WRITE
  PUSH 3
  LOAD x
  ADD
  STORE y
  WRITE
  STOP
State of the emulator after each instruction:

  Instruction   Stack (top at right)   Symbol table     Output
  PUSH 2        2                      (empty)
  STORE x       2                      x = 2
  WRITE         (empty)                x = 2            == 2
  PUSH 3        3                      x = 2
  LOAD x        3 2                    x = 2
  ADD           5   (2+3)              x = 2
  STORE y       5                      x = 2, y = 5
  WRITE         (empty)                x = 2, y = 5     == 5
  STOP                                                  Stop