241-437 Compilers: IC/10 Compiler Structures Objective describe intermediate code generation explain a stack-based intermediate code for the expression language 241-437, Semester 1, 2011-2012 10. Intermediate Code Generation

Upload: nigel-leslie-reeves

Post on 16-Jan-2016


Compiler Structures

• Objective
  – describe intermediate code generation
  – explain a stack-based intermediate code for the expression language

241-437, Semester 1, 2011-2012

10. Intermediate Code Generation


Overview

1. Intermediate Code (IC) Generation

2. IC Examples

3. Expression Translation in SPIM

4. The Expressions Language


In this lecture

Source Program
  -> Front End: Lexical Analyzer -> Syntax Analyzer -> Semantic Analyzer
  -> Int. Code Generator -> Intermediate Code
  -> Back End: Code Optimizer -> Target Code Generator
  -> Target Lang. Prog.


1. Intermediate Code (IC) Generation

• Helps with retargeting
  – e.g. can easily attach a back end for a new machine to an existing front end

• Enables machine-independent code optimization.

Front end -> Intermediate code -> Back end -> Target machine code


Graphical IC Representations

• Abstract Syntax Trees (AST)
  – retain the basic parse tree structure, but with unneeded nodes removed

• Directed Acyclic Graphs (DAG)
  – compacted AST to avoid duplication
  – smaller memory needs

• Control Flow Graphs (CFG)
  – used to model control flow


Linear (text-based) ICs

• Stack-based (postfix)
  – e.g. the JVM

• Three-address code:
      x := y op z

• Two-address code:
      x := op y
  (the same as x := x op y)


2. IC Examples

• ASTs and DAGs
• Stack-based (postfix)
• Three-address Code
• SPIM


2.1. ASTs and DAGs

a := b * -c + b * -c

AST (children are indented under their parent):

    assign
      a
      +
        *
          b
          -
            c
        *
          b
          -
            c

DAG (both operands of + are the same * node):

    assign
      a
      +
        *
          b
          -
            c

Pros: easy restructuring of code and/or expressions for intermediate code optimization

Cons: memory intensive
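One way to see how a DAG avoids duplication is to create nodes through a lookup that reuses an identical existing node. The sketch below is only an illustration under assumptions: DagNode, dagNode() and buildExample() are hypothetical names that do not appear in the lecture code, and unary minus is written as '~'.

```c
#include <assert.h>

#define MAX_NODES 32

/* Hypothetical DAG node: a leaf has left == right == -1 and op holds its
   name; unary minus is written '~' and uses only the left child. */
typedef struct { char op; int left, right; } DagNode;

static DagNode nodes[MAX_NODES];
static int nodeCount = 0;

/* Return the index of the node (op, left, right), creating it only if no
   identical node exists yet. Reusing nodes turns the AST into a DAG. */
int dagNode(char op, int left, int right)
{
    for (int i = 0; i < nodeCount; i++)
        if (nodes[i].op == op && nodes[i].left == left && nodes[i].right == right)
            return i;
    assert(nodeCount < MAX_NODES);
    nodes[nodeCount] = (DagNode){ op, left, right };
    return nodeCount++;
}

/* Build the right-hand side of  a := b * -c + b * -c  and return the
   number of distinct nodes that were created. */
int buildExample(void)
{
    nodeCount = 0;

    int b1 = dagNode('b', -1, -1);
    int c1 = dagNode('c', -1, -1);
    int n1 = dagNode('~', c1, -1);
    int m1 = dagNode('*', b1, n1);

    /* The second b * -c asks for the same nodes and gets them back. */
    int b2 = dagNode('b', -1, -1);
    int c2 = dagNode('c', -1, -1);
    int n2 = dagNode('~', c2, -1);
    int m2 = dagNode('*', b2, n2);
    assert(m1 == m2);

    dagNode('+', m1, m2);
    return nodeCount;
}
```

With this scheme the expression needs five nodes (b, c, unary minus, *, +), while the AST for the same expression holds nine.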


2.2. Stack-based (postfix)

a := b * -c + b * -c

Postfix notation represents operations on a stack:

    b c uminus * b c uminus * + a assign

The same code as stack instructions (e.g. JVM stack instrs):

    iload 2     // push b
    iload 3     // push c
    ineg        // uminus
    imul        // *
    iload 2     // push b
    iload 3     // push c
    ineg        // uminus
    imul        // *
    iadd        // +
    istore 1    // store a

Pro: easy to generate

Cons: stack operations are more difficult to optimize
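The way a postfix sequence drives a stack can be sketched in a few lines of C. This is an illustration only: the Op enum and evalPostfix() are hypothetical names, and the program array simply spells out the postfix string for b * -c + b * -c.

```c
#include <assert.h>

/* Hypothetical opcodes for this sketch; they mirror the postfix string. */
typedef enum { PUSH_B, PUSH_C, UMINUS, MUL, ADD } Op;

/* Evaluate  b c uminus * b c uminus * +  on an explicit stack and return
   the value that would be assigned to a. */
int evalPostfix(int b, int c)
{
    static const Op prog[] = { PUSH_B, PUSH_C, UMINUS, MUL,
                               PUSH_B, PUSH_C, UMINUS, MUL, ADD };
    int stack[8];
    int top = -1;                       /* index of the top element */

    for (unsigned i = 0; i < sizeof prog / sizeof prog[0]; i++) {
        switch (prog[i]) {
        case PUSH_B: stack[++top] = b; break;
        case PUSH_C: stack[++top] = c; break;
        case UMINUS: stack[top] = -stack[top]; break;
        case MUL: { int r = stack[top--]; stack[top] = stack[top] * r; break; }
        case ADD: { int r = stack[top--]; stack[top] = stack[top] + r; break; }
        }
    }
    assert(top == 0);                   /* exactly one value remains */
    return stack[0];
}
```

Every operator finds its operands on top of the stack, so the evaluator never needs names for intermediate results.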


2.3. Three-Address Code

a := b * -c + b * -c

Translated from the AST:

    t1 := - c
    t2 := b * t1
    t3 := - c
    t4 := b * t3
    t5 := t2 + t4
    a := t5

Translated from the DAG:

    t1 := - c
    t2 := b * t1
    t5 := t2 + t2
    a := t5
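Because each three-address instruction has one operator and a named result, the two translations map line for line onto C assignments. The sketch below (fromAst(), fromDag() and check() are hypothetical names) can be used to confirm that both sequences compute the same value.

```c
#include <assert.h>

/* Three-address code translated from the AST: six instructions. */
int fromAst(int b, int c)
{
    int t1 = -c;
    int t2 = b * t1;
    int t3 = -c;
    int t4 = b * t3;
    int t5 = t2 + t4;
    return t5;              /* a := t5 */
}

/* Three-address code translated from the DAG: four instructions. */
int fromDag(int b, int c)
{
    int t1 = -c;
    int t2 = b * t1;
    int t5 = t2 + t2;
    return t5;              /* a := t5 */
}

/* Both translations must agree for any inputs. */
void check(int b, int c)
{
    assert(fromAst(b, c) == fromDag(b, c));
}
```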


2.4. SPIM

• SPIM is a simulator that runs MIPS32 assembly language programs, whose instructions are a form of three-address code
  – http://www.cs.wisc.edu/~larus/spim.html

• Loading/Storing
  – lw register,var - loads value into register
  – sw register,var - stores value from register
  – many, many others


• 8 temporary registers are used here: $t0 - $t7

• Binary math ops (reg1 = reg2 op reg3):
  – add reg1,reg2,reg3
  – sub reg1,reg2,reg3
  – mul reg1,reg2,reg3
  – div reg1,reg2,reg3

• Unary minus (reg1 = - reg2)
  – neg reg1,reg2


"a := b * -c + b * -c" in SPIM

Code generated from the AST:

    lw  $t0,c
    neg $t1,$t0
    lw  $t0,b
    mul $t2,$t1,$t0
    lw  $t0,c
    neg $t1,$t0
    lw  $t0,b
    mul $t1,$t1,$t0
    add $t1,$t2,$t1
    sw  $t1,a

AST, with the register that holds each node's value:

    assign
      a
      +          t1
        *        t2
          b      t0
          -      t1
            c    t0
        *        t1
          b      t0
          -      t1
            c    t0


a := b * -c + b * -c

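Code generated from the DAG:

    lw  $t0,c
    neg $t1,$t0
    lw  $t0,b
    mul $t1,$t1,$t0
    add $t2,$t1,$t1
    sw  $t2,a

DAG, with the register that holds each node's value:

    assign
      a
      +          t2
        *        t1    (used for both operands of +)
          b      t0
          -      t1
            c    t0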


3. Expression Translation in SPIM

Grammar:

    S => id := E
    E => E + E
    E => id

As we parse, use attributes to pass information about the temporary variables up the tree.

parse tree --> code using bottom-up evaluation

Parse tree for a := b + c + d + e. Each number corresponds to a temporary variable:

    S
      a
      :=
      E  (7)
        E  (5)
          E  (3)
            E  (1)   b
            E  (2)   c
          E  (4)   d
        E  (6)   e

Code generated one step at a time, as each node is reached:

    Step   Node    Generate
    1      E (1)   lw  $t1,b
    2      E (2)   lw  $t2,c
    3      E (3)   add $t3,$t1,$t2
    4      E (4)   lw  $t4,d
    5      E (5)   add $t5,$t3,$t4
    6      E (6)   lw  $t6,e
    7      E (7)   add $t7,$t5,$t6
    8      S       sw  $t7,a

Pro: easy to rearrange code for global optimization

Cons: lots of temporaries
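The bottom-up scheme can be sketched as a recursive function that returns, as its attribute, the number of the temporary holding the value of each E. This is a minimal illustration, not the lecture's code: the Expr type, genExpr() and genExample() are hypothetical, and the code is appended to a string buffer so that it is easy to inspect.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parse-tree node for the grammar E => E + E | id. */
typedef struct Expr {
    const char *id;              /* non-NULL for a leaf (an identifier) */
    const struct Expr *left;     /* operands of '+', NULL for a leaf */
    const struct Expr *right;
} Expr;

/* Append the code for e to out and return the number of the temporary
   that holds its value. *next is the next unused temporary number. */
int genExpr(const Expr *e, char *out, size_t cap, int *next)
{
    assert(e != NULL);
    if (e->id != NULL) {                         /* E => id */
        int t = (*next)++;
        size_t len = strlen(out);
        snprintf(out + len, cap - len, "lw $t%d,%s\n", t, e->id);
        return t;
    }
    int l = genExpr(e->left, out, cap, next);    /* E => E + E */
    int r = genExpr(e->right, out, cap, next);
    int t = (*next)++;
    size_t len = strlen(out);
    snprintf(out + len, cap - len, "add $t%d,$t%d,$t%d\n", t, l, r);
    return t;
}

/* Generate the code for  a := b + c + d + e  (left-associative). */
void genExample(char *out, size_t cap)
{
    Expr b = { "b", NULL, NULL }, c = { "c", NULL, NULL };
    Expr d = { "d", NULL, NULL }, e = { "e", NULL, NULL };
    Expr bc = { NULL, &b, &c };
    Expr bcd = { NULL, &bc, &d };
    Expr bcde = { NULL, &bcd, &e };
    int next = 1;                                /* temporaries start at $t1 */

    out[0] = '\0';
    int t = genExpr(&bcde, out, cap, &next);
    size_t len = strlen(out);
    snprintf(out + len, cap - len, "sw $t%d,a\n", t);   /* S => id := E */
}
```

Calling genExample() with a buffer of a few hundred bytes yields the same eight instructions as the walk-through, from lw $t1,b to sw $t7,a. Since every node takes a fresh temporary, the sketch also shows where the "lots of temporaries" come from.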


Issues when Processing Expressions

• Type checking/conversion.

• Address calculation for more complex types (arrays, records, etc.).

• Expressions in control structures, such as loops and if tests.


4. The Expressions Language

• exprParse3.c builds a parse tree for the input file (reuses code from exprParse2.c).

• An intermediate code is generated from the parse tree, and saved to an output file.

• The input file is not executed by exprParse3.c
  – that is done by a separate emulator.


Usage

test1.txt:

    let x = 2
    let y = 3 + x

exprParse3 reads test1.txt and stores the intermediate code in codeGen.txt:

    > gcc -Wall -o exprParse3 exprParse3.c
    > ./exprParse3 < test1.txt
    > cat codeGen.txt
    PUSH 2
    STORE x
    WRITE
    PUSH 3
    LOAD x
    ADD
    STORE y
    WRITE
    STOP


Emulator Usage

The emulator reads codeGen.txt and runs the intermediate code:

    > ./emulator codeGen.txt
    Reading code from codeGen.txt
    == 2
    == 5
    Stop


4.1. The Instruction Set

• The instructions in codeGen.txt are executed by an emulator.
  – it emulates (simulates) real hardware

• The instructions refer to two data structures used in the emulator.


The Emulator's Data Structures

• The emulator's data structures:
  – a symbol table of IDs and their integer values
  – a stack of integers for evaluating the expressions

[Figure: an example stack and a symbol table with an entry for x]


The Instructions

• WRITE              // pop top element off stack and print

• STOP               // exit code emulation

• LOAD ID            // get ID value from symbol table, and push onto stack

• STORE ID           // copy stack top into symbol table for ID

• PUSH integer       // push integer onto stack

• STORE0 ID          // push 0 onto stack, and save to table as value for ID (same as push 0; store ID)

• MULT               // pop two stack values, multiply them, push result back

• ADD, MINUS, DIV    // same for those ops


Intermediate Code Type

• Since the intermediate code stores values on a stack rather than in registers, it is a stack-based (postfix) representation.


4.2. exprParse3.c Coding

• All the parsing code in exprParse3.c is the same as in exprParse2.c.

• The difference is that the parse tree is passed to a generateCode() function, which converts it to intermediate code
  – see main()


main()

    #define CODE_FNM "codeGen.txt"    // where to store generated code

    int main(void)
    /* parse, print the tree, then generate code
       which is stored in CODE_FNM */
    {
      Tree *t;
      nextToken();
      t = statements();
      match(SCANEOF);

      printTree(t, 0);
      generateCode(CODE_FNM, t);
      return 0;
    }


Generating the Code

    void generateCode(char *fnm, Tree *t)
    /* Open the intermediate code file, fnm, and write to it. */
    {
      FILE *fp;
      if ((fp = fopen(fnm, "w")) == NULL) {
        printf("Could not write to %s\n", fnm);
        exit(1);
      }
      else {
        printf("Writing code to %s\n", fnm);
        cgTree(fp, t);
        fprintf(fp, "STOP\n");    // last instruction in file
        fclose(fp);
      }
    }  // end of generateCode()


    void cgTree(FILE *fp, Tree *t)
    /* Recurse over the parse tree looking for non-NEWLINE
       subtrees to convert into code.
       Each block of code generated for a non-NEWLINE subtree
       ends with a WRITE instruction, to print out the value
       of the line. */
    {
      if (t == NULL)
        return;

      Token tok = TreeOper(t);
      if (tok == NEWLINE) {
        cgTree(fp, TreeLeft(t));
        cgTree(fp, TreeRight(t));
      }
      else {
        codeGen(fp, t);
        fprintf(fp, "WRITE\n");    // print value at EOL
      }
    }  // end of cgTree()


    void codeGen(FILE *fp, Tree *t)
    /* Convert the tree nodes for ID, INT, ASSIGNOP,
       PLUSOP, MINUSOP, MULTOP, DIVOP into instructions.

       The load/store instructions:
           LOAD ID, STORE ID, STORE0 ID, PUSH integer
       The math instructions:
           MULT, ADD, MINUS, DIV
    */
    {
      if (t == NULL)
        return;

      Token tok = TreeOper(t);

      if (tok == ID)
        codeGenID(fp, TreeID(t));
      else if (tok == INT)
        fprintf(fp, "PUSH %d\n", TreeValue(t));
      else if (tok == ASSIGNOP) {    // id = expr
        char *id = TreeID(TreeLeft(t));
        getIDEntry(id);    // don't use Symbol info
        codeGen(fp, TreeRight(t));
        fprintf(fp, "STORE %s\n", id);
      }
      else if (tok == PLUSOP) {
        codeGen(fp, TreeLeft(t));
        codeGen(fp, TreeRight(t));
        fprintf(fp, "ADD\n");
      }
      else if (tok == MINUSOP) {
        codeGen(fp, TreeLeft(t));
        codeGen(fp, TreeRight(t));
        fprintf(fp, "MINUS\n");
      }
      else if (tok == MULTOP) {
        codeGen(fp, TreeLeft(t));
        codeGen(fp, TreeRight(t));
        fprintf(fp, "MULT\n");
      }
      else if (tok == DIVOP) {
        codeGen(fp, TreeLeft(t));
        codeGen(fp, TreeRight(t));
        fprintf(fp, "DIV\n");
      }
    }  // end of codeGen()


    void codeGenID(FILE *fp, char *id)
    /* An ID may already be in the symbol table, or be new,
       which is converted into a LOAD or a STORE0 code
       operation. */
    {
      SymbolInfo *si = NULL;

      if ((si = lookupID(id)) != NULL)    // already declared
        fprintf(fp, "LOAD %s\n", id);
      else {    // new, so add to table
        addID(id, 0);    // 0 is default value
        fprintf(fp, "STORE0 %s\n", id);
      }
    }  // end of codeGenID()


From Tree to Code

Input:

    let x = 2
    let y = 3 + x

Parse tree (children are indented under their parent):

    \n
      \n
        NULL
        =
          x
          2
      =
        y
        +
          3
          x

Symbol table in exprParse3.c:

    x   0
    y   0

Generated code:

    PUSH 2
    STORE x
    WRITE
    PUSH 3
    LOAD x
    ADD
    STORE y
    WRITE
    STOP
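The mapping from tree to code is a post-order walk: operands are generated first and the operator last, which produces postfix code. The following self-contained sketch shows this for the second input line. It is a simplification under assumptions: Node, emit() and genStack() are hypothetical stand-ins for the lecture's Tree type and functions, and there is no symbol table, so an ID always becomes LOAD (the STORE0 case is left out).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical node type; the lecture's Tree type is not shown. */
typedef enum { N_INT, N_ID, N_ASSIGN, N_PLUS } Kind;

typedef struct Node {
    Kind kind;
    int value;                        /* used by N_INT */
    const char *id;                   /* used by N_ID */
    const struct Node *left, *right;  /* used by N_ASSIGN and N_PLUS */
} Node;

/* Append one instruction line to out; arg may be NULL. */
static void emit(char *out, size_t cap, const char *op, const char *arg)
{
    size_t len = strlen(out);
    if (arg != NULL)
        snprintf(out + len, cap - len, "%s %s\n", op, arg);
    else
        snprintf(out + len, cap - len, "%s\n", op);
}

/* Post-order walk: operands first, then the operator. */
void genStack(const Node *n, char *out, size_t cap)
{
    char num[16];
    assert(n != NULL);
    switch (n->kind) {
    case N_INT:
        snprintf(num, sizeof num, "%d", n->value);
        emit(out, cap, "PUSH", num);
        break;
    case N_ID:
        emit(out, cap, "LOAD", n->id);
        break;
    case N_ASSIGN:                    /* id = expr */
        genStack(n->right, out, cap);
        emit(out, cap, "STORE", n->left->id);
        break;
    case N_PLUS:
        genStack(n->left, out, cap);
        genStack(n->right, out, cap);
        emit(out, cap, "ADD", NULL);
        break;
    }
}
```

For the tree of let y = 3 + x this produces PUSH 3, LOAD x, ADD, STORE y, which matches the generated code except for the WRITE that cgTree() adds after each line.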


4.3. The Emulator

    > gcc -Wall -o emulator emulator.c

    > ./emulator codeGen.txt
    Reading code from codeGen.txt
    == 2
    == 5
    Stop


Emulator Data Structures

    #define MAX_SYMS 15      // max no of vars
    #define STACK_SIZE 10

    // stack data structure
    int stack[STACK_SIZE];
    int stackTop = -1;

    // symbol table data structures
    typedef struct SymInfo {
      char *id;
      int value;
    } SymbolInfo;

    int symNum = 0;          // number of symbols stored
    SymbolInfo syms[MAX_SYMS];
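The emulator code calls push(), pop() and topOf(), but their definitions are not shown in the slides. A plausible sketch on top of the stack array and stackTop index is given below; the bodies are an assumption, and assert() stands in for whatever error handling the real emulator uses.

```c
#include <assert.h>

#define STACK_SIZE 10

int stack[STACK_SIZE];
int stackTop = -1;              /* -1 means the stack is empty */

/* Assumed helper: push v onto the stack. */
void push(int v)
{
    assert(stackTop < STACK_SIZE - 1);   /* guard against overflow */
    stack[++stackTop] = v;
}

/* Assumed helper: remove and return the top value. */
int pop(void)
{
    assert(stackTop >= 0);               /* guard against underflow */
    return stack[stackTop--];
}

/* Assumed helper: return the top value without removing it. */
int topOf(void)
{
    assert(stackTop >= 0);
    return stack[stackTop];
}
```

The difference between pop() and topOf() matters for the instruction set: STORE copies the stack top and leaves it in place, so that a following WRITE can still pop and print it.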



Evaluating Input Lines

    void eval(FILE *fp)
    /* Read in the code file a line at a time and
       process the lines.
       An instruction on a line may be a single
       command (e.g. WRITE) or an instruction name and
       an argument (e.g. LOAD x). */
    {
      char buf[BUFSIZ];
      char cmd[MAX_LEN], arg[MAX_LEN];
      int no;

      while (fgets(buf, sizeof(buf), fp) != NULL) {
        no = sscanf(buf, "%s %s\n", cmd, arg);
        if ((no < 1) || (no > 2))
          printf("Unknown format: %s\n", buf);
        else
          processCmd(cmd, arg);    // process commands as they are read in
      }
    }  // end of eval()
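The line-splitting step relies on the return value of sscanf(), which is the number of fields it managed to read. The sketch below isolates that step. parseLine() is a hypothetical helper, the value 32 for MAX_LEN is an assumption (the slides do not show its definition), and the %31s field widths are an added precaution against overlong tokens.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_LEN 32    /* assumed value; not defined on the slides */

/* Split one instruction line into a command and an optional argument.
   Returns the number of fields read (1 or 2), or 0 for a blank line. */
int parseLine(const char *line, char cmd[MAX_LEN], char arg[MAX_LEN])
{
    assert(line != NULL);
    cmd[0] = '\0';
    arg[0] = '\0';
    /* The field widths leave room for the terminating '\0'. */
    int no = sscanf(line, "%31s %31s", cmd, arg);
    return (no == 1 || no == 2) ? no : 0;
}
```

A line such as LOAD x yields 2 fields, WRITE yields 1, and a blank line makes sscanf() return EOF, which the helper reports as 0.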


Processing an Instruction

    void processCmd(char *cmd, char *arg)
    {
      SymbolInfo *si;
      if (strcmp(cmd, "LOAD") == 0) {
        if ((si = lookupID(arg)) == NULL) {
          printf("Error: load cannot find %s\n", arg);
          exit(1);
        }
        push(si->value);
      }
      else if (strcmp(cmd, "STORE") == 0)
        addID(arg, topOf());
      else if (strcmp(cmd, "STORE0") == 0) {
        push(0);
        addID(arg, 0);
      }
      else if (strcmp(cmd, "PUSH") == 0)
        push( atoi(arg) );
      else if (strcmp(cmd, "MULT") == 0) {
        int v2 = pop();
        int v1 = pop();
        push( v1*v2 );
      }
      else if (strcmp(cmd, "ADD") == 0) {
        int v2 = pop();
        int v1 = pop();
        push( v1+v2 );
      }
      else if (strcmp(cmd, "MINUS") == 0) {
        int v2 = pop();
        int v1 = pop();
        push( v1-v2 );
      }
      else if (strcmp(cmd, "DIV") == 0) {
        int v2 = pop();
        if (v2 == 0) {
          printf("Error: div by 0; using 1\n");
          v2 = 1;
        }
        int v1 = pop();
        push( v1/v2 );
      }
      else if (strcmp(cmd, "WRITE") == 0)
        printf("== %d\n", pop());
      else if (strcmp(cmd, "STOP") == 0) {
        printf("Stop\n");
        exit(0);    // normal termination
      }
      else
        printf("Unknown instruction: %s\n", cmd);
    }  // end of processCmd()


Evaluating the Code for test1.txt

test1.txt:

    let x = 2
    let y = 3 + x

codeGen.txt:

    PUSH 2
    STORE x
    WRITE
    PUSH 3
    LOAD x
    ADD
    STORE y
    WRITE
    STOP


Each row shows the stack (top on the right) and the symbol table after the instruction has run:

    Instruction   Stack        Symbol table    Output
    PUSH 2        2
    STORE x       2            x = 2
    WRITE                      x = 2           == 2
    PUSH 3        3            x = 2
    LOAD x        3 2          x = 2
    ADD           5  (2+3)     x = 2
    STORE y       5            x = 2, y = 5
    WRITE                      x = 2, y = 5    == 5
    STOP                       x = 2, y = 5    Stop
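The whole trace can be reproduced with a compact stand-alone interpreter. The sketch below is not emulator.c: run(), Sym, lookup() and store() are hypothetical names, only the instructions used by test1.txt are handled, and WRITE collects values in an array instead of printing them so that the result can be checked.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define STACK_SIZE 10
#define MAX_SYMS 15

typedef struct { char id[16]; int value; } Sym;

static int stack[STACK_SIZE], top = -1;
static Sym syms[MAX_SYMS];
static int symNum = 0;

static void push(int v) { assert(top < STACK_SIZE - 1); stack[++top] = v; }
static int pop(void) { assert(top >= 0); return stack[top--]; }

/* Return the entry for id, or NULL if it is not in the table. */
static Sym *lookup(const char *id)
{
    for (int i = 0; i < symNum; i++)
        if (strcmp(syms[i].id, id) == 0)
            return &syms[i];
    return NULL;
}

/* Set the value of id, adding it to the table if it is new. */
static void store(const char *id, int v)
{
    Sym *s = lookup(id);
    if (s == NULL) {
        assert(symNum < MAX_SYMS);
        s = &syms[symNum++];
        snprintf(s->id, sizeof s->id, "%s", id);
    }
    s->value = v;
}

/* Run a NULL-terminated list of instructions. Each WRITE pops a value into
   written[]; the number of values written is returned. */
int run(const char *prog[], int written[], int maxWritten)
{
    int n = 0;
    top = -1;
    symNum = 0;
    for (int i = 0; prog[i] != NULL; i++) {
        char cmd[16] = "", arg[16] = "";
        sscanf(prog[i], "%15s %15s", cmd, arg);
        if (strcmp(cmd, "PUSH") == 0)
            push(atoi(arg));
        else if (strcmp(cmd, "LOAD") == 0) {
            Sym *s = lookup(arg);
            assert(s != NULL);
            push(s->value);
        }
        else if (strcmp(cmd, "STORE") == 0) {
            assert(top >= 0);
            store(arg, stack[top]);      /* copy the top, do not pop it */
        }
        else if (strcmp(cmd, "ADD") == 0) {
            int v2 = pop();
            int v1 = pop();
            push(v1 + v2);
        }
        else if (strcmp(cmd, "WRITE") == 0) {
            assert(n < maxWritten);
            written[n++] = pop();
        }
        else if (strcmp(cmd, "STOP") == 0)
            break;
    }
    return n;
}
```

Running it on the nine instructions of codeGen.txt writes the values 2 and 5, in that order, which agrees with the emulator output.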