lecture compiler construction - graz university of · pdf filelecture compiler construction...

309
Lecture Compiler Construction Franz Wotawa [email protected] Institute for Software Technology Technische Universit¨ at Graz Inffeldgasse 16b/2, A-8010 Graz, Austria Summer term 2017 F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 1 / 309

Upload: vukiet

Post on 06-Mar-2018

220 views

Category:

Documents


2 download

TRANSCRIPT

Lecture Compiler Construction

Franz [email protected]

Institute for Software TechnologyTechnische Universitat Graz

Inffeldgasse 16b/2, A-8010 Graz, Austria

Summer term 2017

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 1 / 309

Compilers are everywhere

Programming languages like Java, C#, C, C++, Pascal, Modula,SML, Lisp, VHDL, Basic,..Graphical languages also need compilers (HTML, LATEX,...)Means for communication between computers like XMLNatural languages

Compilers translate a sentence written in one language to anotherlanguage.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 2 / 309

Example – HTML

<!DOCTYPE HTML PUBLIC "...- > <html> <head> <metacontent=text/html; ..."http-equiv=Content-Type- ><title>Hello World</title> </head> <body> <h1>HelloWorld!</h1> </body> </html>

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 3 / 309

Another example

(Source: en.wikipedia.org/wiki/Java bytecode)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 4 / 309

Compiler vs. Interpreter

Compiler: Translates from one into another language whereprograms are directly executed (C,C++,Pascal, ...).

Interpreter: Executes a program directly without converting it (BASIC,batch languages,..).

Exact boundary difficult to define nowadays (Byte codeinterpreter vs. CPU, which executes machine codestatements.Front end (analysis phase) of compilers is always needed!

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 5 / 309

Organizational issues

Lecture (Vorlesung) 2hPractical exercises (Ubung) 1h

Examples from the Compiler Construction theoryHands on part: Development of a compilerMore information soon (via email and/or the webpage)

Office hours:Franz Wotawa: Tuesday, 13:00–14:00Roxane Koitz: Monday, 13:00–14:00

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 6 / 309

Lecture dates

Monday, 6.3.2017, 16:00 - 18:00 (preliminary discussion)

Tuesday, 7.3.2017, 16:00 - 17:30 (lexical analysis)

Monday, 20.3.2017, 16:00 - 19:00 (parsing)

Tuesday, 21.3.2017, 16:00 - 17:30 (attributed grammars)

Monday, 27.3.2017, 16:00 - 19:00 (type checking)

Tuesday, 28.3.2017, 16:00 - 17:00 (runtime environment)

Monday, 3.4.2017, 16:00 - 19:00 (code generation)

Tuesday, 4.4.2017, 16:00-17:30 (code optimization, Q&A)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 7 / 309

Exam

Written form only; no papers, books etc. allowed!Content: DFA, NFA, LL(1), SLR(1), Attributed Grammars, TypeChecking, Code Generation, Code Optimization, etc.)0. date: Monday, 15.5.2017, 18:00-19:00, i13, (1 groups, 1 hour)1. date: Monday, 19.6.2017, 18:00-20:00, i13, (2 groups, each 1hour)2. date: in October 2017, will be announced soon

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 8 / 309

Literature, etc.

Aho, Seti, Ullman, Compilers Principles, Techniques, andTools, Addison-Wesley, 1985, ISBN 0-201-10088-6.Tremblay, Sorenson, The Theory and Practice of Compiler Writing,McGraw Hill, 1985, ISBN 0-07-065161-2.Copy of slides but no lecture notesCompiler construction webpage:http://www.ist.tugraz.at/cb16.html

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 9 / 309

PART 1 - INTRODUCTION

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 10 / 309

What is a compiler?

ProgramSource language⇒ Target languageError reports if source code contains errors

Source program −→ Compiler −→ Target program↓

Error messages

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 11 / 309

Implications of definition

There are (very) many compilersThousands of source languages (e.g. Pascal, Modula, C, C++,Java, . . . )Thousands of target languages (other high-level programminglanguages, machine code)Classification: single-pass, multi-pass, debugging, or optimizingBut: basics of creating a compiler are mostly consistent

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 12 / 309

Concept of compilation

Analysis phaseSplit source program into tokensGenerate intermediate code (intermediate representation of sourceprogram)

Synthesis phaseGenerate target program based on intermediate code

Enables reuse of program parts for different source and targetlanguages

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 13 / 309

Concept of compilation – Analysis

Operations used within program are stored in hierarchicalstructureSyntax TreeOther tools which perform an analysis:

Structure editorsPretty printersStatic checkersInterpretersWeb browser

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 14 / 309

Language-processing systems

skeletal source program↓

preprocessor↓

source program↓

compiler↓

target assembly language↓

assembler↓

relocatable machine code

↓relocatable machine code

loader/link-editor ←−library,relocatableobject files

↓absolute machine code

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 15 / 309

Components of a compiler

lexical analyser

semantic analyser

syntax analyser

intermediate code

generator

code optimizer

code generator

error handler

manager

symbol-table

source code

target program

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 16 / 309

Lexical analysis, scanning

Example:pos := init + rate * 60

Tokens:1 Identifier pos2 Assignment symbol :=3 Identifier init4 Plusoperator5 Identifier rate6 Multiplication operator7 Constant (number) 60

Syntax tree

:=

pos+

init *

rate 60

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 17 / 309

Syntax analysis, parsing

Grammatical analysisToken↔ rules of grammarRules of grammar (example)

1 Each identifier is an expression2 Each number is an expression3 Assuming that ex1 and ex2 are expressions, then ex1 + ex2 andex1 ∗ ex2 are expressions as well

4 A term: identifier := Expression is a statement

Parse tree

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 18 / 309

Example - Parse tree

assignment

statement

:=identifier expression

pos

expression +

identifier

init

expression

expression

identifier

rate

*

60

expression

number

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 19 / 309

Semantic analysis

Check for semantic errorsType checkingIdentification of operators and operandsNecessary for code generationExample: conversion from int to real

statementpos := init + rate * 60

becomespos := init + rate * int2real(60)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 20 / 309

Intermediate code generation

E.g.: three-address codeExample:1. tmp1 := int2real(60)2. tmp2 := id3 * tmp13. tmp3 := id2 + tmp24. id1 := tmp3

using the symbol table

pos (id1) real . . .init (id2) real . . .rate (id3) real . . .

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 21 / 309

Code optimizer

Goal: faster machine codeExample:

1. tmp1 := int2real(60)2. tmp2 := id3 * tmp13. tmp3 := id2 + tmp24. id1 := tmp3

⇓1. tmp1 := id3 * 60.02. id1 := id2 + tmp1

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 22 / 309

Code generator

Generation of actual target code (Machine code, Assembler)Example:

1. MOVF id3, R22. MULF #60.0, R23. MOVF id2, R14. ADDF R2, R15. MOVF R1, id1

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 23 / 309

PART 2 - LEXICAL ANALYSIS

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 24 / 309

Goals of this section

Specification and implementation of lexical analysersEasiest method:

1 Create diagram which describes structure of tokens2 Manually translate diagram into a program

Regular expressions in finite state machines

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 25 / 309

Tasks

get next token

token

source program analyser

lexicalparser

symbol

table

Filter comments, white spaces, ..Establish relations between error messages of compiler andsource code (e.g. via line number)Preprocessor functions (e.g. in C)Create copy of source program containing error messages

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 26 / 309

Why separate lexical analysis and parsing?

1 Simpler design: e.g. removal of comments and white spacesenables use of simpler parser design

2 Improve efficiency: large portion of time is invested into reading ofprograms and conversion into tokens

3 Enhance portability of compilers4 ( There are tools for both phases )

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 27 / 309

Tokens, patterns, lexemes

Pattern: Rule describing a set of strings (a token)Token: Set of strings described by a pattern

Lexem: Sequence of characters which are matched by a patternof a token

Token Lexem Patterns (verbal)if if ifrelation >, <, . . . < or > or . . .id pi, test cases letter followed by letters, digits, or ’ ’num 2.7, 0 , 12e-4 any numeric constantliteral “segmentation fault “ any characters between “ and “

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 28 / 309

Tokens

Terminals in the grammar (checked by the parser)In programming languages:keywords, operators, identifiers, constants, strings, brackets

Language conventions:1 Tokens have to occur in specific places within the code2 Tokens are separated by white spaces

Example (Fortran): DO 5 I = 1.25 is interpreted as DO5I =1.25

3 Keywords are reserved and must not be used as identifiers1. is (currently) not used2. is accepted3. is reasonable (and simplyfies compiler construction)

IF THEN THEN THEN = ELSE; ELSE ELSE = THEN;

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 29 / 309

Token attributes

Store lexeme for further processingStore pointers to entries in symbol tableLine number or position of token within source code

U = 2 * R * PI

< id, pointer to U >< assign op >< num, integer value 2 >< mult op >< id, pointer to R >< mult op >< id, pointer to PI >

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 30 / 309

Lexical errors

fi ( a == 1) ...⇒ fi is interpreted as undefined identifierString sequences which cannot be matched to a tokene.g.: 2.3e*4 instead of 2.3e+4.Error-correcting:

1 Panic Mode: Removal of faulty characters until a token can bematched

2 Remove a faulty character3 Include a new character4 Replace a character5 Change order of neighboring characters

Some errors (like the first example) may be only recognizable bythe parser

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 31 / 309

Specification of tokens

Strings over an alphabet:Alphabet . . . finite set of symbols example: {0, 1} Binary alphabetString (word, sentence) . . . finite sequence of symbols of the alphabetε . . . empty string

Language: Set of all strings over an alphabetOperators:

Concatenation xy is a string whereby y is attached to the end of xxi is defined as: x0 = ε, i > 0→ xi = xi−1xPrefix: computer→ compSuffix: computer→ uterSubstring: computer→ putSubsequence: computer→ opt

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 32 / 309

Operations on languages

Union L ∪M = {x|x ∈ L∨x ∈M}Concatenation LM = {xy|x ∈ L∧ y ∈M}Kleen Closure L∗ =

⋃∞i=0 L

i

Positive Closure L+ =⋃∞i=1 L

i

Example: L = {A, . . . , Z},M = {0, 1, . . . , 9}L ∪M Set of letters and numbersLM Strings containing a letter followed by a numberL4 Strings which are made up of four lettersL∗ Strings made up of letters including εM+ Number strings containing at least one number

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 33 / 309

Regular expressions

Example: Pascal identifier letter ( letter | digit )∗

Regular expression r defines a language over an alphabet Σusing the following rules:

1 The regular expression ε describes {ε}2 a ∈ Σ: The regular Expression a describes {a}3 r, s are regular expressions (languages L(r), L(s)):

1 (r)|(s) describes L(r) ∪ L(s)2 (r)(s) describes L(r)L(s)3 (r)∗ describes (L(r))∗

4 (r)+ describes (L(r))+

Regular sets

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 34 / 309

Examples

Assuming Σ = {a, b}1 a|b describes the language {a, b}2 (a|b)(a|b) describes {aa, ab, ba, bb}3 a∗ describes {ε, a, aa, aaa, aaaa, . . .}4 (a|b)∗ describes all strings which contain no or multiple a’s and b’s

Not all languages can be described by regular expressionse.g. {wcw|w is a string made up of a’s and b’s}Regular expressions may only describe words containing either afixed amount of repetitions or an infinite amount of repetitions

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 35 / 309

Regular definitions

A regular definition is a sequence of the form:

d1 → r1. . .dn → rn

whereby di is a distinct name and ri is a regular expression overΣ ∪ {d1, . . . , dn}Example:

letter → A| . . . |Z|a| . . . |zdigit → 0|1| . . . |9

id → letter(letter|digit)∗

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 36 / 309

Token detection

Goal: Identify lexeme of token in input buffer

Output: Token-attribute-pair

Regular Expression Token Attribute Valuewhite space - -

if if -then then -

id id pointer to table entrynum num pointer to table entry< relop LT> relop GT. . . . . . . . .

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 37 / 309

Transition diagrams

Actions performed by lexical analyser (after get next token ofthe parser)States connected by directed edgesEdges are labeled:

Labels contain symbols expected in input buffer in order to toprogress to next stateLabel other denotes all symbols not used by any other edgeoriginating from the same node

One start stateEnd states after matching a tokenDeterministic (not necessarily)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 38 / 309

Example

Regular expression letter (letter | digit)∗ can be described byfollowing transition diagram:

start letter

letter or digit

otherreturn(gettoken(),install_id())1 2 3

Assume the input buffer contains:Pos 1 2 3 4 . . .

P I = . . .and the current-symbol pointer is pointing to position 1

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 39 / 309

Tokenmatching - Example

1 Current symbol = P, position = 1⇒ Transition from state 1 to state 2 and read new symbol

2 Current symbol = I, position = 2⇒ Remain in state 2 and read new symbol

3 Current symbol = SPACE, position = 3⇒ Transition from state 2 to state 3

4 State 3 is an end state

start letter

letter or digit

otherreturn(gettoken(),install_id())1 2 3

P

I

’ ’

1

2

3

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 40 / 309

Example – Implementing a Lexer directly

Pseudo code1 lexem = ““;2 if (next char ∈ letter) then3 lexem = lexem + next char;4 get next char();5 while (next char ∈ letter ∪ digit) do6 lexem = lexem + next char;7 get next char();8 return IDENTIFIER(lexem);

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 41 / 309

Generalization – Finite automata

Nondeterministic finite automaton (NFA) (S,Σ,move, s0, F )1 Set of states S2 Set of input symbols Σ3 Transition relation move : S × (Σ ∪ {ε}) 7→ 2S , relating

state-symbol-pairs to states4 Initial state s0 ∈ S5 Set of end states F ⊆ S

NFA can be visualized in form of directed graph (transition graph)Distinctions from transition diagram:

Nondeterministicε-moves allowed

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 42 / 309

NFA – Example

NFA for aa∗|bb∗ (language, accepting NFA):({0, 1, 2, 3, 4}, {a, b},move, 0, {2, 4}) withmove:

state symbol new state0 ε {1,3}1 a {2}2 a {2}3 b {4}4 b {4}

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 43 / 309

Deterministic finite automaton

Deterministic finite automaton (DFA) is a special case of a NFAwhereby:

1 No state has an ε-move2 For each state s there is at most one edge which is

marked with an input symbol a

DFAs contain exactly one transition for each inputEach entry in transition table contains exactly one state

State Input Symbolsa b

0 1 2

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 44 / 309

DFA simulation

Input: Input string x, DFA DOutput: ’Yes’ if D accepts x, ’No’ otherwise

s := s0c := nextcharwhile c 6= eof do

s := move(s, c)if s ==⊥ then

return ’No’c := nextchar

endif s ∈ F then

return ’Yes’else

return ’No’end

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 45 / 309

DFA – Example

DFA for aa∗|bb∗:({0, 1, 2}, {a, b},move, 0, {1, 2}) whereby

move:

state symbol new state0 a 10 b 21 a 12 b 2

0 b

a

2

1

a

b

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 46 / 309

Conversion NFA→ DFA

Subset construction algorithmIdea:

1 Each DFA state corresponds to a set of NFA states2 After reading a1a2 . . . an the DFA is in a state which ist equivalent to

a subset of the NFA. This subset corresponds to the path of theNFA when reading a1a2 . . . an.

Number of DFA states is exponential in the nunber of NFA states(worst case)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 47 / 309

Subset construction

Input: NFA N = (S,Σ,move, s0, F )Output: DFA D, accepting the same language as N

Initially, ε-closure(s0) is the only state in Dstates and it is unmarkedwhile there is an unmarked state T in Dstates do

mark Tfor each input symbol a do

U := ε-closure(move(T ,a))if U 6∈ Dstates then

Add U as unmarked state to DstatesendDtran[T, a] := U

endend

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 48 / 309

ε-closure, move

ε-closure(s): Set of NFA states which are accessible from a state svia ε-movesε-closure(T ): Set of NFA states which are accessible from a states ∈ T via ε-movesmove(T , a): Set of NFA states which are accessible from a states ∈ T via a move relation of the input symbol a

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 49 / 309

Calculation of ε-closure

Input: NFA N , set of states TOutput: ε-closure

Push all states in T onto stackInitialize ε-closure(T ) to Twhile stack 6= ∅ do

Pop t, the top element of stackfor each state u with an edge from t to u labeled ε do

if u 6∈ ε-closure(T ) thenAdd u to ε-closure(T )Push u onto stack

endend

end

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 50 / 309

Example NFA→ DFA

0 1

2 3

4 5

6 7 8 9 10

ε

ε

ε

ε

ε

ε

ε

ε

a

b

a b b

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 51 / 309

Thompson’s ConstructionInput:

Regular expression r over an alphabet ΣOutput: NFA N , which accepts L(r)

ε ε

i f st fi

N(s)

N(t)

a ∈ Σ i fa s∗ i

N(s)

f

ε

ε

ε

ε

s|t i f

N(s)

N(t)

ε

ε

ε

ε

[s]

whereby [s] = (s | ε)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 52 / 309

NFA simulation

Input: NFA N , input string xOutput: ’Yes’ if N accepts x, ’No’ otherwise

S := ε-closure({s0})a := nextCharwhile a 6= eof do

S := ε-closure(move(S,a))a := nextChar

endif S ∩ F 6= ∅ then

return ’Yes’else

return ’No’end

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 53 / 309

Summary

Check whether an input string x is contained in a languagedefined by the regular expression r:

1 Construct NFA (Thompson’s construction) and apply simulationalgorithm for NFAs

2 Construct NFA (Thompson’s construction), convert NFA to DFA andapply simulation algorithm for DFAs

Time-space tradeoffs

Automaton Space TimeNFA O(|r|) O(|r| · |x|)DFA O(2|r|) O(|x|)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 54 / 309

Regular expression→ DFA

Direct conversionNotation:

NFA state s is important↔ s has at least one non-ε-move to asuccessorExtended regular expression (Augmented regular expression) (r)#Illustration of extended expressions via syntax trees (cat-node,or-node, star-node)Each leaf node is labeled either with a symbol (of the alphabet) orwith εEach leaf node (6= ε) has a designated number (position)

Remark: NFA-DFA-conversion only via important states =positions

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 55 / 309

Syntax tree – Example

*

|

a b

b

#

b

a

1 2

3

4

5

6

Syntax tree for (a|b)∗abb#

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 56 / 309

Algorithm - Idea

Approach:1 Create syntax tree for extended regular expression2 Calculate four functions: nullable, firstpos, lastpos, followpos3 Construct DFA using followpos

DFA states correspond to sets of positions (important NFA states)Position i, followpos(i) are positions j for which there exists aninput string . . . cd . . . where i corresponds to c and j correspondsto d

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 57 / 309

followpos – Example

Syntax tree for (a|b)∗abb#followpos(1) = {1, 2, 3}Explanation: Suppose we see an a. This symbol could belongeither to (a|b)∗ or to the following a. If we see a b then this symbolhas to belong to (a|b)∗. Thus positions 1,2,3 are contained infollowpos(1).Informal definition of functions (node n, string s)

firstpos . . . positions, which match the first symbol of slastpos . . . positions, which match the last symbol of snullable . . . True, if node n can create a language containing theempty string, false otherwise

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 58 / 309

Rules for nullable, firstpos, lastpos

Node n nullable firstpos [lastpos]Leaf n labeledε

true ∅ [∅]Leaf n labeledposition i false {i} [{i}]

c1|c2 nullable(c1)∨nullable(c2) firstpos(c1) ∪ firstpos(c2)[lastpos(c1) ∪ lastpos(c2)]

c1 • c2 nullable(c1)∧nullable(c2)

if nullable(c1) thenfirstpos(c1) ∪firstpos(c2)else firstpos(c1)if nullable(c2) thenlastpos(c1) ∪ lastpos(c2)else lastpos(c2)

c∗1 true firstpos(c1) [lastpos(c1)]

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 59 / 309

firstpos, followpos – Example

*

|

a b

b

#

b

a

1 2

3

4

5

6

{1} {2}

{3}

{4}

{5}

{6}

{1,2,3}

{1,2}

{1} {2}

{3}

{4}

{5}

{6}

{6}

{5}

{4}

{3}

{1,2,3}

{1,2,3}

{1,2,3}

{1,2}

{1,2} {1,2}

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 60 / 309

Calculation followpos

1 If n is a cat-node (•), c1 is the left, c2 is the right child and i is aposition in lastpos(c1), then all positions of firstpos(c2) arecontained in followpos(i)

2 If n is a star-node (∗) and i is contained in lastpos(n), then allpositions of firstpos(c) are contained in followpos(i)

Node followpos1 {1,2,3}2 {1,2,3}3 {4}4 {5}5 {6}6 -

1

2

3 4 5 6

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 61 / 309

followpos-graph→ NFA

A NFA without ε-moves can be created based on afollowpos-graph:

1 Mark all positions in firstpos of the root node of the syntax tree asinitial states

2 Label each directed edge (i, j) with the symbol of position i3 Mark position which belongs to # as end state

⇒ followpos-graph can be converted to DFA (using subsetconstruction)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 62 / 309

DFA-algorithmInput:

Regular expression rOutput: DFA D, accepting the language L(r)

Initially, the only unmarked state in Dstates is firstpos(root)while there is an unmarked state T ∈ Dstates do

Mark Tfor each input symbol a do

Let U be the set of positions that are in followpos(p)for some position p ∈ T where the symbol of p is a.

if U 6= ∅ and U 6∈ Dstates thenAdd U as unmarked state to Dstates

endDTran(T, a) = U

endend

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 63 / 309

PART 3 - SYNTAX ANALYSIS

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 64 / 309

Goals

Definition of the syntax of a programming language using contextfree grammarsMethods for parsing of programs⇒ determine whether a program is syntactically correctAdvantages (of grammars):

Precise, easily comprehensible language definitionAutomatic construction of parsersDeclaration of the structure of a programming language (importantfor translation and error detection)Easy language extensions and modifications

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 65 / 309

Tasks

get next token

token

source program analyser

lexicalparser

symbol

table

parse tree rest of the

front end

intermediate

representation

Parser types:Universal parsers(inefficient)Top-down-parserBottom-up-parser

Only subclasses ofgrammars (LL, LR)Collect tokeninformationsType checkingImmediate codegeneration

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 66 / 309

Syntax error handling

Error types:Lexical errors (spelling of a keyword)Syntactic errors (closing bracket is missing)Semantic errors (operand is incompatible to operator)Logic Errors (infinite loop)

Tasks:Exact error descriptionError recovery→ consecutive errors should be detectableError correction should not slow down the processing of correctprograms

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 67 / 309

Problems during error handling

Spurious Errors: Consecutive errors created by error recoveryExample: Compiler issues error-recovery resulting in the removalof the declaration of pi→ Error during semantic analysis: pi undefinedError is detected late in the process→ error message does notpoint to the correct position within the codeToo many error messages are issued

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 68 / 309

Error-recovery

Panic mode:Skip symbols until input can by synchronized to a tokenPhrase-level recovery:Local error corrections, e.g. replacement of ’,’ by a ’;’Error productions:Extension of grammar to handle common errorsGlobal correction:Minimal correction of program in order to find a matchingderivation (cost intensive)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 69 / 309

GrammarsGrammar

A grammar is a 4-tupel G = (VN , VT , S,Φ) whereby:VN Set of nonterminal symbolsVT Set of terminal symbolsS ∈ VN Start symbolΦ : (VN ∪ VT )∗VN (VN ∪ VT )∗ 7→ (VN ∪ VT )∗

Set of production rules (rewriting rules)(α, β) is represented as α→ β

Example:({S,A,Z}, {a, b, 1, 2}, S, {S → AZ,A→ a,A→ b, Z → ε, Z → 1, Z →2, Z → ZZ})

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 70 / 309

Derivations in grammars

Direct derivation σ, ψ ∈ (VT ∪ VN )∗.σ can be directly derived from ψ (in one step; ψ ⇒ σ), if there are twostrings φ1, φ2, so that σ = φ1βφ2 and ψ = φ1αφ2 and α→ β ∈ Φ.

Example:ψ σ Rule used φ1 φ2S A Z S → AZ ε εaZ a1 Z → 1 a εAZZ A2Z Z → 2 A Z

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 71 / 309

DerivationProduction:

A string ψ produces σ (ψ +⇒ σ), if there are strings φ0, . . . , φn (n > 0),so that ψ = φ0 ⇒ φ1, φ1 ⇒ φ2, . . . , φn−1 ⇒ φn = σ.

Example: S ⇒ AZ ⇒ AZZ ⇒ A2Z ⇒ a2Z ⇒ a21

Reflexive, transitive closure: ψ ∗⇒ σ ⇔ ψ+⇒ σ or ψ = σ

Accepted language: A grammar G accepts the following language

L(G) = {σ|S ∗⇒ σ, σ ∈ (VT )∗}

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 72 / 309

Parse treesExample:

E → E + E|E ∗ E|id⇒ 2 derivations (and parse trees) for id+id*id

E

+E E

id E E

id id

* +E E

E E*

E

id

id id

Grammar is ambiguous

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 73 / 309

Classification of grammars

Chomsky (restriction of production rules α→ β)

Unrestricted Grammar: no restrictionsContext-Sensitive Grammar: |α| ≤ |β|Context-Free Grammar: |α| ≤ |β| and α ∈ VNRegular Grammar: |α| ≤ |β|, α ∈ VN and β is in the form of: aB ora whereby a ∈ VT and B ∈ VN

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 74 / 309

Grammar examples

Regular grammar: (a|b)∗abb

A0 → aA0|bA0|aA1

A1 → bA2

A2 → bA3

A3 → ε

Context-sensitive grammars:

L1 = {wcw|w ∈ (a|b)∗}ButL′1 = {wcwR|w ∈ (a|b)∗} iscontext-freeL2 = {anbmcndm|n ≥1,m ≥ 1}L3 = {anbncn|n ≥ 1}

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 75 / 309

Conversions

Remove ambiguitiesstmnt→ if expr then stmnt|if expr then stmnt else stmnt| other2 parse trees for if E1 then if E2 then S1 else S2.

smtn

if expr then smtn

if expr then smtn else smtn

E1

E2 S1 S2 S1

if expr then smtn else smtn

smtn

S2

if expr then smtn

E1

E2

Prefer left treeAssociate each else with the closest preceding then

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 76 / 309

Removing left recursions

A grammar is left-recursive if there is a nonterminal A and aproduction A +⇒ Aα

Top-Down-Parsing can’t handle left recursionsExample:convert A→ Aα|β to:A→ βA1

A1 → αA1|ε

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 77 / 309

Algorithm to eliminate left recursionsInput:

Grammar G without cycles and ε-productionsOutput: Grammar without left recursions

Arrange the nonterminals in some order A1, A2, . . . , Anfor i := 1 to n do

for j := 1 to i− 1 doReplace each production Ai → Ajγ

by the productions Ai → δ1γ| . . . |δkγ,where Aj → δ1| . . . |δk are all the current Aj-productions

endEliminate the immediate left recursion among the Ai-productions

end

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 78 / 309

Left factoring

Important for predictive parsingElimination of alternative productions

Example:stmnt→ if expr then stmnt else stmnt

| if expr then stmnt

Solution: For each nonterminal A find the longest prefix α for twoor more alternative productionsIf α 6= ε then replace all A-productions A→ αβ1|αβ2| . . . |αβn|γ (γdoes not start with α) with:A→ αA1|γA1 → β1|β2| . . . |βn

Apply transformation until no prefixes α 6= ε can be found

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 79 / 309

Top-down-parsing

Idea: Construct parse tree for a given input, starting at root nodeRecursive-descent parsing (with backtracking)

Example:S → cAdA→ ab|a

Matching of cad

S

c dA

S

c dA

S

c dA

a ba

(2)(1)(3)

Predictive parsing (without backtracking, special case ofrecursive-descent parsing)Left-recursive grammars can lead to infinite loops!

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 80 / 309

Predictive parsers

Recursive-descent parser without backtrackingPossible if production which needs to be used is obvious for eachinput symbolTransition diagrams

1 Remove left recursions2 Left factoring3 For each nonterminal A:

1 Create a initial state and an end state2 For each production A→ X1X2 . . . Xn create a path leading from the

initial state to the end state while labeling the edges X1, . . . , Xn

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 81 / 309

Predictive parsers (II)

Processing:1 Start at the initial state of the current start symbol2 Suppose we are currently in the state s which has an edge whose

label contains a terminal a and leads to the state t. If the next inputsymbol is a then go to state t and read a new input symbol.

3 Suppose the edge (from s) is marked by a nonterminal A. In thatcase go to the initial state of A (without reading a new inputsymbol). If we reach the end state of A then go to state t which issucceeding s.

4 If the edge is marked by ε then go directly to t without reading theinput.

Easily implemented by recursive procedures

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 82 / 309

Example - Predictive parser

E → idE1|(E)E1 → opE|ε

E()if nextToken=id then

getNextTokenE1()

if nextToken=( thengetNextTokenE()if nextToken=) then accept

E1()if nextToken=op then

getNextTokenE()

else return

id

E )

0 1

2

(

E1

3 4

0 1 2op E

ε

E1:

E:

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 83 / 309

Non-recursive predictive parser

Predictive Parsing

Program

Parsing Table M

INPUT

STACK

OUTPUTX

Y

Z

$

a + b $

Input buffer: String to be parsed (terminated by a $)

Stack: Initialized with the start symbol and contains nonterminals whichare not derivated yet (terminated by a $)

Parsing table M(A, a), A is a nonterminal, a a terminal or $

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 84 / 309

Top-down parsing with stack

Mode of operation:X is top element of stack, a the current input symbol

1 X is a terminal: If X = a = $, then the input was matched. IfX = a 6= $, pop X off the stack and read next input symbol.Otherwise an error occured.

2 X ist a nonterminal: Fetch entry of M(X, a). If this entry is an errorskip to error recovery. Otherwise the entry is a production of theform X → UVW . Replace X on the stack with WV U (afterward Uis the top most element on the stack).

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 85 / 309

Example

Grammar

E → id E1|(E)E1 → op E|ε

Parsing table M(X, a)NONTERMINAL id op ( ) $

E E → id E1 E → (E)E1 E1 → op E E1 → ε E1 → ε

Derivation of id op id.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 86 / 309

Example (II)

STACK INPUT OUTPUT$ E id op id $$ E1 id id op id $ E → id E1

$ E1 op id $$ E op op id $ E1 → op E$ E id $$ E1 id id $ E → id E1

$ E1 $$ $ E1 → ε

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 87 / 309

FIRST and FOLLOW

Used when calculating parse tableFIRST (α) Set of terminals, which can be derived from α (α stringof grammar symbols)FOLLOW (A) Set of terminals which occur directly on the rightside next to the nonterminal A in a derivationIf A is the right most element of a derivation, then $ is contained inFOLLOW (A)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 88 / 309

Calculation of FIRST

FIRST (X) for a grammar symbol X1 X is a terminal: FIRST (X) = {X}2 X → ε is a production: Add ε to FIRST (X)3 X is a nonterminal and X → Y1Y2 . . . Yk is a productiona is in FIRST (X) if:

1 An i exists; a is in FIRST (Yi) and ε is in every setFIRST (Y1) . . . F IRST (Yi−1)

2 a = ε and ε is in every set FIRST (Y1) . . . F IRST (Yk)

FIRST (X1X2 . . . Xn):Each non-ε symbol of FIRST (X1) is in the resultIf ε ∈ FIRST (X1), then each non-ε symbol of FIRST (X2) is inthe result and so onIs ε in every FIRST -set, then it it also is contained in the result

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 89 / 309

Calculation of FOLLOW

In order to calculate FOLLOW (A) of a nonterminal A use followingrules:

1 Add $ to FOLLOW (S), whereby S is the initial symbol2 For each production A→ αBβ, add all elements of FIRST (β)

except ε to FOLLOW (B)

3 For each production A→ αB and A→ αBβ with ε ∈ FIRST (β),add each element of FOLLOW (A) to FOLLOW (B)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 90 / 309

Example

Grammar:

E → id E1|(E)E1 → op E|ε

FIRST sets:

FIRST (E) = {id, (}FIRST (E1) = {op, ε}

FOLLOW sets:

FOLLOW (E) = FOLLOW (E1) = {$, )}

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 91 / 309

Construction of parsing tablesInput:

Grammar GOutput: Parsing table M

1 For each production A→ α do Steps 2 and 3.2 For each terminal a in FIRST (α), add A→ α to M(A, a).3 If ε is in FIRST (α), add A→ α to M(A, b) for each terminal b inFOLLOW (A). If ε is in FIRST (α) and $ is in FOLLOW (A), addA→ α to M(A, $)

4 Make each undefined entry of M be error.

Example: See table of last example grammar

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 92 / 309

LL(1) Grammars

Parsing table construction can be used with arbitrary grammarsMultiple elements per entry may occurLL(1) Grammar: Grammar whose parsing table contains nomultiple entriesL . . . Scanning the Input from LEFT to rightL . . . Producing the LEFTMOST derivation1 . . . Using 1 input symbol lookahead

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 93 / 309

Properties of LL(1)

No ambiguous or left-recursive grammar is LL(1)G ist LL(1)⇔ For each two different productions A→ α|β it isneccessary that:

1 No strings may be derived from both α and β which start with thesame terminal a

2 At most one of the productions α or β may be derivable to ε3 If β ∗⇒ ε, then α may not derive any string which starts with an

element in FOLLOW (A)

Multiple entries in the parsing table can occasionally be removedby hand (without changing the language recognized by theautomaton)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 94 / 309

Error-recovery in predictive parsing

Heuristics in panic-mode error recovery:1 Initially, all symbols in FOLLOW (A) can be used for

synchronization: Skip all tokens until an element in FOLLOW (A) isread and remove A from the stack.

2 If FOLLOW sets don’t suffice: Use hierarchical structure ofprogram constructs. E.g. use keywords occurring at the beginningof a statement as addition to the synchronization set.

3 FIRST (A) can be used as well: If an element in FIRST (A) isread, continue parsing at A.

4 If a terminal which can’t be matched is at the top of the stack,remove it.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 95 / 309

Bottom-up parsing

Shift-reduce parsingReduction of an input towards the start symbol of the grammarReduction step:Replace a substring, which matches the right side of a productionwith the left side of that same productionExample:

S → aABeA→ Abc|bB → d

abbcdeaAbcdeaAdeaABeS

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 96 / 309

Handles

Substring, which matches the right side of a production and leadsto a valid derivation (rightmost derivation)Example (ambiguous grammar):

E → E + EE → E ∗ EE → (E)E → id

Rightmost derivation of id + id * id:Right-Sentential Form Handle Reducing Production

id + id * id id E → idid + id * E id E → idid + E * E E ∗ E E → E ∗ E

id + E id E → idE + E E + E E → E + E

E

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 97 / 309

Stack implementation

Initially:Stack Input$ w$

Shift n ≥ 0 symbols from input onto stack until a handle can befoundReduce handle (replace handle with left side of production)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 98 / 309

Example shift-reduce parsing

Stack Input Action(1) $ id + id * id $ shift(2) $ id + id * id $ reduce by E → id(3) $ E + id * id $ shift(4) $ E+ id * id $ shift(5) $ E+ id * id $ reduce by E → id(6) $ E + E * id $ shift(7) $ E + E∗ id $ shift(8) $ E + E∗ id $ reduce by E → id(9) $ E + E ∗ E $ reduce by E → E ∗ E

(10) $ E + E $ reduce by E → E + E(11) $ E $ accept

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 99 / 309

Viable prefixes, conflicts

Viable prefix: Right sentential forms which can occur within thestack of a shift-reduce parserConflicts: (Ambiguous grammars)

stmt→ if expr then stmt| if expr then stmt else stmt| other

Configuration:Stack Input. . . if expr then stmt else. . .

No unambiguous handle (shift-reduce conflict)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 100 / 309

LR parser

LR(k) parsingL . . . Left-to-right scanningR . . . Rightmost derivation in reverseAdvantages:

Can be used for (nearly) every programming language constructMost generic backtrack-free shift-reduce parsing methodClass of LR-grammars is greater than those of LL-grammarsLR-parsers identify errors as early as possible

Disadvantage: LR-parser is hard to build manually

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 101 / 309

LR-parsing algorithm

a a $a

s

X

s

X

s0

m-1

m-1

m

m

1 i n..... .....

.....

Parsing Program

LR

action goto

INPUT

OUTPUT

STACK

Stack stores s0X1s1X2s2 . . . Xmsm (Xi grammar, si state)

Parsing table = action- and goto-table

sm current state, ai current input symbolaction[sm, ai] ∈ {shift, reduce, accept, error}goto[sm, ai] transition function of a DFA

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 102 / 309

LR-parsing mode of operation

Configuration (s0X1s1 . . . Xmsm, aiai+1 . . . an)

Next step (move) is determined by reading of aiDependent on action[sm, ai]:

1 action[sm, ai] = shift sNew configuration: (s0X1s1 . . . Xmsmais, ai+1 . . . an)

2 action[sm, ai] = reduce A→ βNew configuration: (s0X1s1 . . . Xm−rsm−rAs, aiai+1 . . . an)whereby s = goto[sm−r, A], r length of β

3 action[sm, ai] = accept4 action[sm, ai] = error

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 103 / 309

Example

(1) E → E + T(2) E → T(3) T → T ∗ F(4) T → F(5) F → (E)(6) F → id

State action gotoid + * ( ) $ E T F

0 s5 s4 1 2 31 s6 acc2 r2 s7 r2 r23 r4 r4 r4 r44 s5 s4 8 2 35 r6 r6 r6 r66 s5 s4 9 37 s5 s4 108 s6 s119 r1 s7 r1 r1

10 r3 r3 r3 r311 r5 r5 r5 r5

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 104 / 309

Construction of SLR parsing tables

LR(0)-items: Production with dot at one position of the right sideExample: Production A→ XY Z has 4 items: A→ .XY Z,A→ X.Y Z, A→ XY.Z and A→ XY Z.Exception: Produktion A→ ε only has the item: A→ .

Augmented grammar: Grammar with new start symbol S′ andproduction S′ → S.Functions: closure and gotoclosure(I) (I . . . set of items)

1 All I are within closure2 If A→ α.Bβ is part of closure and B → γ is a production, then addB → .γ to closure

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 105 / 309

Construction, goto

goto(I,X) with I as set of items and X a symbol of the grammar

goto = closure of set of all items A→ αX.β for all A→ α.Xβ in I

Example: I = {E′ → E.,E → E.+ T}goto(I,+) = {E → E + .T, T → .T ∗ F, T → .F, F → .(E), F → .id}Sets-of-items construction (Construction of all LR(0)-items)items(G′)

I0 = closure({S′ → .S})C = {I0}repeat

for each set of items I ∈ C and each grammar symbol Xsuch that goto(I,X) is not empty and not in C do

Add goto(I,X) to Cuntil no more sets of items can be added to C

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 106 / 309

SLR parsing table

Input: Augmented grammar G′

Output: SLR parsing table

1 Calculate C = {I0, I1, . . . , In}, the set of LR(0)-items of G′

2 State i is created by Ii as follows:

1 If A→ α.aβ is in Ii and goto(Ii, a) = Ij , then action(i, a) = shift j (ais a terminal symbol)

2 If A→ α. is in Ii, then action[i, a] = reduce A→ α for alla ∈ FOLLOW (A) ∧A 6= S′

3 If S′ → S. is in Ii, then action[i, $] = accept

3 For all nonterminal symbols A: goto[i, A] = j if goto(Ii, A) = Ij

4 Every other table entry is set to error

5 Initial state is determined by the item set with S′ → .S

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 107 / 309

SLR(1), conflicts, error handling

If we recieve a table without multiple entries using theSLR-parsing-table-algorithm then the grammar is SLR(1)Otherwise the algorithm fails and an algorithm for extendedlanguages (like LR) needs to be utilized⇒ generally results in increased processing requirementsShift/reduce-conflicts can be partially resolvedThe process usually involves the determination of operator bindingstrength and associativityError handling can be directly incorporated into the parsing table

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 108 / 309

PART 4 - SYNTAX DIRECTED TRANSLATION

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 109 / 309

Setting

Translation of context-free languagesInformation↔ attributes of grammar symbolsValues of attributes are defined by “semantic rules”2 possibilities:

Syntax directed definitions (high-level spec)Translation schemes (implementation details)

Evaluation: (1) Parse input, (2) Generate parse tree, (3) Evaluateparse tree

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 110 / 309

Syntax directed definitions

Generalization of context-fee grammars

Each grammar symbol has a set of attributes

Synthesized vs. inherited attributes

Attribute: string, number, type, memory location, . . .

Value of attribute is defined by semantic rules

Synthesized: Value of child node in parse treeInherited: Value of parent node in parse tree

Semantic rules define dependencies between attributes

Dependency graph defines calculation order of semantic rules

Semantic rules can have side effects

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 111 / 309

Form of a syntax directed definition

Grammar production: A→ α

Associated semantic rule: b := f(c1, . . . , ck)

f is a functionSynthesized: b is a synthesized attribute of A and c1, . . . , ck aregrammar symbols of the productionInherited: b is an inherited attribute of a grammar symbol on theright side of the production and c1, . . . , ck are grammar symbols ofthe productionb depends on c1, . . . , ck

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 112 / 309

Example

“Calculator”-program: val is a synthesized attribute fornonterminals E, T and F

Production Semantic RuleL→ En print(E.val)E → E1+T E.val := E1.val + T.valE → T E.val := T.valT → T1*F T.val := T1.val ∗ F.valT → F T.val := F.valF → (E) F.val := E.valF → digit F.val := digit.lexval

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 113 / 309

S-attributed grammar

Attributed grammar exclusively using synthesized attributesExample-evaluation: 3*5+4n (annotated parse tree)

L

E.val=19 n

T.val=4

digit

F.val=4

.lexval=4

E.val=15+

T.val=15

*T.val=3

digit.lexval=3

F.val=3

F.val=5

digit.lexval=5

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 114 / 309

Inherited attributes

Definition of dependencies of program language constructs andtheir contextExample: (type checking)

Production Semantic RuleD → TL L.in := T.typeT → int T.type := integerT → real T.type := realL→ L1, id L1.in := L.in

addtype(id.entry, L.in)L→ id addtype(id.entry, L.in)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 115 / 309

Inherited attributes – Annotated parse tree

real id1, id2, id3

D

T.type=real

real

L.in=real

L.in=real

L.in=real

id

id

1

2

id3

,

,

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 116 / 309

Dependency graphs

Show dependencies between attributesEach rule is represented in the form b := f(c1, . . . , ck)

Nodes correspond to attributes; edges to dependenciesDefinition:for each node n in the parse tree do

for each attribute a of the grammar symbol at n doconstruct a node in the dependency graph for a

for each node n in the parse tree dofor each semantic rule b := f(c1, . . . , ck) associated

with the production used at n dofor i := 1 to k do

construct an edge from the node for ci to the node for b

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 117 / 309

Dependency graph – Example

D

real

id

id

1

2

id3

,

,entry

entry

entry

T L

L

Lin

in

intype

1

2

3

4

56

7

9

10

8

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 118 / 309

Topological sort

Arrangement of m1, . . . ,mk nodes in a directed, acyclic graph whereedges point from smaller nodes to bigger nodesIf mi → mj is an edge, then the node mi is smaller than the node mj

Important for order in which the attributes are calculated

Example (cont.):1 a4 := real2 a5 := a43 addtype(id3.entry, a5)4 a7 := a55 addtype(id2.entry, a7)6 a9 := a77 addtype(id1.entry, a9)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 119 / 309

Example - syntax trees

Abstract syntax tree = simplified form of a parse treeOperators and keywords are supplied to intermediate nodes byleaf nodesProductions with only one element can collapseExamples:

if-then-else

B S S1 2

3 5

4

+

*

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 120 / 309

Syntax trees – Expressions

Functions (return value: pointer to new node):

mknode(op, left, right): node label op, 2 child nodes left, rightmkleaf(id, entry): leaf id, entry in symbol table entrymkleaf(num, val): leaf num, value val

Syntax directed definition:Production Semantic RuleE → E1 + T E.nptr := mknode(′+′, E1.nptr, T.nptr)E → E1 − T E.nptr := mknode(′−′, E1.nptr, T.nptr)E → T E.nptr := T.nptrT → (E) T.nptr := E.nptrT → id T.nptr := mkleaf(id, id.entry)T → num T.nptr := mkleaf(num,num.val)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 121 / 309

Syntax trees – Expressions (ex.)

Syntax tree for a-4+cE nptr

T nptrE nptr

T nptr-

+

E

T nptr

id

num

id

id num

’-’

’+’

id

4

to entry for a

to entry for c

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 122 / 309

Evaluation of S-attributed definitions

Attributed definition exclusively using synthesized attributesEvaluation using bottom-up parser (LR-parser)Idea: store attribute information on stack

State Val. . . . . .X X.x

Y Y.y

top→ Z Z.z

. . . . . .

Semantic rule:A.a := f(X.x, Y.y, Z.z)Production: A→ XY ZBefore XY Z is reduced to A, valueof Z.z stored in val[top], Y.y storedin val[top− 1], X.x in val[top− 2]

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 123 / 309

Example - S-attributed evaluation

“Calculator”-example:

Production Code FragmentL→ En print(val[top− 1])E → E1 + T val[ntop] := val[top− 2] + val[top]E → TT → T1 ∗ F val[ntop] := val[top− 2] ∗ val[top]T → FF → (E) val[ntop] := val[top− 1]F → digit

Code executed before reduction

ntop = top− r + 1, after reduction: top := ntop

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 124 / 309

Result for 3*5+4n

Input state val Production used3*5+4n

*5+4n 3 3*5+4n F 3 F → digit*5+4n T 3 T → F5+4n T * 3

+4n T * 5 3 5+4n T * F 3 5 F → digit+4n T 15 T → T ∗ F+4n E 15 E → T

4n E + 15n E + 4 15 4n E + F 15 4 F → digitn E + T 15 4 T → Fn E 19 E → E + T

E n 19L 19 L→ En

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 125 / 309

L-attributed definitions

Definition: A syntax directed definition is L-attributed if each inheritedattribute of Xj , 1 ≤ j ≤ n, on the right side of A→ X1, . . . , Xn is onlydependent on:

1 the attributes X1, . . . , Xj−1 to the left of Xj and2 the inherited attributes of A

Each S-attributed grammar is a L-attributed grammar

Evaluation using depth-first orderprocedure dfvisit(n : node)

for each child m of n, from left to right doevaluate inherited attributes of mdfvisit(m)

endevaluate synthesized attributes of n

end

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 126 / 309

Translation schemes

Translation scheme = context-free language with attributes forgrammar symbols and semantic actions which are placed on theright side of a production between grammar symbols and areconfined within {}Example:

T → T1 ∗ F{T.val := T1.val ∗ F.val}If only synthesized attributes are used, the action is always placedat the end of the right side of a productionNote: Actions may not access attributes which are not calculatedyet (limits positions of semantic actions)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 127 / 309

Translation schemes (cont.)

If both inherited and synthesized attributes are used the followingneeds to be taken into consideration:

1 An inherited attribute of a symbol on the right side of a productionhas to be calculated in an action which is positioned to the left ofthe symbol

2 An action may not reference a synthesized attribute belonging to asymbol which is positioned to the right of the action

3 A synthesized attribute of a nonterminal on the left side can only becalculated if all referenced attributes have already been calculated⇒ actions like these are usually placed at the end of the right side

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 128 / 309

Example translation scheme

S → A1A2 {A1.in := 1;A2.in := 2}A→ a {print(A.in)}

Above grammar does not fulfill the three conditions for translationschemesThe inherited attribute A.in is not yet defined at the point in timewhen it should be printedBut: For each L-attributed grammar a translation scheme can befound which fulfills the three conditions, e.g.:

S → {A1.in := 1} A1 {A2.in := 2} A2

A→ a {print(A.in)}

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 129 / 309

Top-down translation

Removal of left recursions in translation scheme is necessary

Example:

E → E1 + T {E.val := E1.val + T.val}E → E1 − T {E.val := E1.val − T.val}

E → T {E.val := T.val}T → (E) {T.val := E.val}T → num {T.val := num.val}

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 130 / 309

Example top-down translation

E → T {R.i := T.val}R {E.val := R.s}

R→ +T {R1.i := R.i+ T.val}R1 {R.s := R1.s}

R→ −T {R1.i := R.i− T.val}R1 {R.s := R1.s}

R→ ε {R.s := R.i}T → (

E) {T.val := E.val}

T → num {T.val := num.val}

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 131 / 309

Evaluation of 9-5+2

E

-

T.val = 9 R.i = 9

T.val = 5 R.i = 4.val = 9num

.val = 5num + T.val = 2 R.i = 6

.val = 2num ε

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 132 / 309

Summary transformation

Given translation scheme:A→ A1Y {A.a := g(A1.a, Y.y)}A→ X {A.a := f(X.x)}

After removal of left recursions:A→ XRR→ Y R|ε

Transformed scheme:A→ X {R.i := f(X.x)}

R {A.a := R.s}R→ Y {R1.i := g(R.i, Y.y)}

R1 {R.s := R1.s}R→ ε {R.s := R.i}

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 133 / 309

Predictive parsing with schemesInput: syntax-directed translation scheme; Outp.: Syntax-directed translator

1 For each nonterminal A, construct a function that has a formal parameterfor each inherited attribute of A and that returns the values of thesynthesized attributes of A. This function has a local variable for eachattribute of each grammar symbol that appears in a production for A.

2 As previously described (see predictive parsing), the code fornonterminal A decides what production to use based on the currentinput symbol.

3 The code for each production does the following (evaluation from left toright):

1 Token X with synthesized attribute x: Save the value of x in avariable X.x. Generate a call to match token X.

2 Nonterminal B: Generate c := B(b1, . . . , bk); b1, . . . , bk variables forinherited attributes of B; c variable for synthesized attribute of B.

3 For an action, copy the code into the parser, replacing eachreference to an attribute by the variable for that attribute.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 134 / 309

Example - predictive parsing

Grammar:E → T {R.i := T.val}

R {E.val := R.s}R→ op

T {R1.i := mknode(op.lexeme,R.i, T.nptr)}R1 {R.s := R1.s}

R→ ε {R.s := R.i}T → (

E) {T.val := E.val}

T → num {T.val := num.val}

Functions:function E : nodefunction R(i : node) : nodefunction T : node

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 135 / 309

Parsing procedure R

Procedure without translation scheme

procedure R()begin

if lookahead = op then beginmatch(op);T ();return R()

end else beginreturn;

endend

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 136 / 309

Parsing function R

function R (i: node) : nodevar nptr, i1, s1, s: node; oplexeme : char;

beginif lookahead = op then begin

oplexeme := lexval;match(op);nptr := T ();i1 := mknode(oplexeme,i,nptr);s1 := R(i1);s := s1

end elses := i;

return send

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 137 / 309

Bottom-up with inherited attribute

Implementation of L-attributed grammars in bottom-up parsersFor LL(1)-grammars and many LR(1)-grammarsRemoval of embedding actions from translation schemes:

Actions have to be placed at end of right side of a productionEnsured by new marker nonterminals

Example:E → TRR→ +T{print(′+′)}R| − T{print(′−′)}R|εT → num{print(num.val)}

E → TRR→ +TMR| − TNR|εT → num{print(num.val)}M → ε{print(′+′)}N → ε{print(′−′)}

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 138 / 309

Inherited attributes on the stack

Idea: Production A→ XY , synthesized attribute X.x and inheritedattribute Y.y

Before a reduction (of X Y ), X.x is on the stackIn the case of Y.y = X.x (copy action), the value of X.x can beused whenever the value of Y.y is required

Example: Parser for variable declarations real p,q,r

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 139 / 309

Variable declaration - example

D → T {L.in := T.type}L

T → int {T.type := integer}T → real {T.type := real}L→ L1 {L1.in := L.in}

,id {addtype(id.entry, L.in)}

L→ id {addtype(id.entry, L.in)}

D

T

in

Lreal

Ltype

r,

L

in

in

p

q,

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 140 / 309

Calculation using the stack

Input state Production Usedreal p,q,r

p,q,r realp,q,r T T → real,q,r T p,q,r TL L→ idq,r TL ,

,r TL , q,r TL L→ L, idr TL ,

TL , rTL L→ L, idD D → TL

Implementation:Production Code FragmentD → TLT → int val[top] := integerT → real val[top] := realL→ L, id addtype(val[top], val[top− 3])L→ id addtype(val[top], val[top− 1])

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 141 / 309

Problems

Positions of attributes on the stack need to be known

Production Semantic RuleS → aAC C.i := A.sS → aABC C.i := A.sC → c C.s := g(C.i)

When the reduction C → cis conducted, it is unknownwhether the value of C.i islocated in val[top − 1] or inval[top − 2]! It depends onwhether a B is located on thestack.

Solution: Introduction of a marker M :S → aAC C.i := A.sS → aABMC M.i := A.s;C.i := M.sC → c C.s := g(C.i)M → ε M.s := M.i

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 142 / 309

Problems (cont.)

Simulation of semantic rules which are no copy actionsUsage of marker!

S → aAC C.i := f(A.s)S → aANC N.i := A.s;C.i := N.sN → ε N.s := f(N.i)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 143 / 309

Bottom-up parsing . . .

. . . with calculation of inherited attributesInput: L-attributed definition (and LL(1)-grammar)Output: Parser, which calculates attribute values on stack

1 Assumptions: Each nonterminal A has an inherited attribute A.i, eachgrammar symbol X has a synthesized attribute X.s. If X is a terminal,then X.s is the lexical value of X (supplied by the lexical analyser). Thevalues are stored on the stack in form of an array val.

2 For each production A→ X1 . . . Xn create n new markers (nonterminals)M1, . . . ,Mn and replace the production with A→M1X1 . . .MnXn.Note: synthesized values for Xi are stored in the val array entry, whichbelongs to Xi. Inherited values Xi.i are stored in entries which areassociated to Mi.

3 Invariant: The new inherited attribute A.i (if existing) is always directlybeneath the position of M1 within the val array.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 144 / 309

Simplifications

Reduction of markers:1 If Xj has no inherited attribute, then no marker Mj is required⇒ positions of attributes on the stack are shifting!

2 If X1.i exists and is calculated by X1.i = A.i, then M1 is notrequired

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 145 / 309

Removal of inherited attributes

Replacement of inherited attributes by synthesized onesNot always possibleRequires modification of grammar!Example: Declarations in PascalD → L : TT → integer|charL→ L, id|id

convert to:D → idLL→, idL| : TT → integer|char

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 146 / 309

Difficult syntax directed definition

The following definition cannot be processed by bottom-up parsersusing current approachesS → L L.count := 0L→ L11 L1.count := L.count+ 1L→ ε print(L.count)

Reason: L→ ε receives the number of 1s by means of inheritanceHowever, as L→ ε is used in the reduction first, no value is specifiedyet!

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 147 / 309

Recursive evaluators

Evaluation of attributesBased on parse treeNot possible in conjunction with parsingOrder of nodes which are visited during evaluation is arbitraryFor each nonterminal a translation function existsExtensions may visit nodes more than onceOrder of node visits needs to regard the following:

1 Each inherited attribute of a node has to be calculated before thenode is visited

2 Synthesized attributes are calculated before the node is left (for thelast time)

Order is determined by dependencies

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 148 / 309

Example – Recursive evaluators

Production Semantic RulesA→ LM L.i := l(A.i)

M.i := m(L.s)A.s := f(M.s)

A→ QR R.i := r(A.i)Q.i := q(R.s)A.s := f(Q.s)

A

L M

A

Q R

i s

i i i

i

ss s s

s

i

function A(n, ai)if production(n) = ′A→ LM ′ then

li := l(ai)ls := L(child(n, 1), li)mi := m(ls)ms := M(child(n, 2),mi)return f(ms)

if production(n) = ′A→ QR′ thenri := r(ai)rs := R(child(n, 2), ri)qi := q(rs)qs := Q(child(n, 1), qi)return f(qs)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 149 / 309

PART 5 - TYPE CHECKING

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 150 / 309

Static Program Checking

Type Checks Check of the used type. Error if operands areincompatible with the used operator. Example: 1.2 + 2(real + int).Flow-of-Control Checks Check if the transfer of the programexecution is possible. Example: break needs an enclosing loop.goto label needs a defined label.Uniqueness Checks Check if an object has been defined exactlyonce. Example: In Pascal each identifier must be unique.Name-related Checks In some languages, names (e.g. forprocedures) are used which need to occur at a different location(e.g. at the end of a procedure).

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 151 / 309

Tasks

Check if the type system of the language is satisfied.

Separate type checker is not always necessary.

parser typechecker

intermediatecode

generator

token stream parse tree syntax tree

representationintermediate

Typesystems (Examples):

“If both operands of the arithmetic operators of addition, subtractionand multiplication are of type integer, then the result is of typeinteger”“The result of the unary & operator is a pointer to the object referredto by the operand. If the type of the operand is ’...’, the type of theresult is ’pointer of ...’.”

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 152 / 309

Type Expressions

A type expression is:

1 a Basic Type integer, boolean, char, and real as well as a special BasicType type error or void.

2 the Type Name

3 a composite type in the form of:

1 Arrays. array(I, T ); set of indexes I, type T2 Products. T1 × T23 Records. record((N1 × T1)× . . .× (Nk × Tk)); name Ni, types Ti4 Pointers. pointer(T )5 Functions. T1 → T2

4 and type variables.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 153 / 309

Types – Examples

type row = recordaddress: integer;lexeme: array[1..15] of char

end;var table: array[1..101] of row;row can be represented asrecord((address× integer), (lexeme× array(1..15, char))).function f(a,b: char): ↑ integer;is represented as: char × char → pointer(integer).

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 154 / 309

Graphical Representation of Types

as DAG (Directed Acyclic Graph)→

× pointer

char integer

or as a tree→

× pointer

char char integer

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 155 / 309

Typesystems

Set of Rules

specified using attributed grammars (or verbally)

Static vs. Dynamic Checking of Types

Sound Typesystem = static type checking is sufficient

Language is strongly typed = the compiler guarantees that an acceptedprogram runs without type errors.

But some checks can only occur dynamically

table: array[0..255] of char;i: integer;

The correctness of the call table[i] in the program can not bechecked by the compiler.

Error Recovery is important (even for type errors)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 156 / 309

Type Checker Spec

Language Definition:

P → D ; ED → D ; D| id : TT → char | integer | array [ num ] of T | ↑ TE → literal | num | id |E mod E|E[E]|E ↑

Example:key: integer ;key mod 1999

array [256] of chararray(1 . . . 256, char)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 157 / 309

1. Secure Type Info

Production Semantic RuleP → D ; ED → D ; DD → id : T {addtype( id.entry, T.type)}T → char {T.type := char}T → integer {T.type := integer}T → array [ num ] of T1 {T.type := array(1 . . .num.val, T1.type)}T →↑ T1 {T.type := pointer(T1.type)}

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 158 / 309

2. Type Checking – Expressions

Production Semantic RuleE → literal E.type := charE → num E.type := integerE → id E.type := lookup( id.entry)

E → E1 mod E2 E.type :=

if E1.type = integer andE2.type = integer then integer

else type error

E → E1[E2] E.type :=

if E1.type = array(s, t) andE2.type = integer then t

else type error

E → E1 ↑ E.type :=

{if E1.type = pointer(t) then telse type error

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 159 / 309

3. Type Checking – Statements

Production Semantic Rule

S → id := E S.type :=

{if id.type = E.type then voidelse type error

S → if E then S1 S.type :=

{if E.type = boolean then S1.typeelse type error

S → while E do S1 S.type :=

{if E.type = boolean then S1.typeelse type error

S → S1 ; S2 S.type :=

if S1.type = void and

S2.type = void then voidelse type error

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 160 / 309

4. Type Checking – Functions

Syntax Extension: T → T ’→ ’ T DefinitionE → E ( E ) FunctionCall

Type Extraction + Type Checking:

Production Semantic RuleT → T1 ’→ ’ T2 T.type := T1.type→ T2.type

E → E1 ( E2 ) E.type :=

if E1.type = s→ t and

E2.type = s then telse type error

Example:root : ((real → real) × real) → real

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 161 / 309

Type Equivalence

When are types equivalent???

structural equivalencename equivalence

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 162 / 309

Structural Equivalence

(1) function sequiv(s, t) : boolean; begin(2) if s and t are the same basic type then(3) return true(4) else if s = array(s1, s2) and t = array(t1, t2) then(5) return s1 = t1 and sequiv(s2, t2)(6) else if s = s1 × s2 and t = t1 × t2 then(7) return sequiv(s1, t1) and sequiv(s2, t2)(8) else if s = pointer(s1) and t = pointer(t1) then(9) return sequiv(s1, t1)(10) else if s = s1 → s2 and t = t1 → t2 then(11) return sequiv(s1, t1) and sequiv(s2, t2)(12) else return false end

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 163 / 309

Encoding of Type Expressions

Expression as Bit vector (efficient storage and comparison)Example:

Type Constructor Encodingpointer 01array 10

freturns 11

Basic Type Encodingboolean 0000char 0001integer 0010real 0011

Type expression Encodingchar 00 00 00 0001

freturns(char) 00 00 11 0001pointer(freturns(char)) 00 01 11 0001

array(pointer(freturns(char))) 10 01 11 0001

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 164 / 309

Name vs. structural Equivalence

Example (Pascal Programm)type link = ↑ cell;var next: link;

last : link;p : ↑ cell;q,r : ↑ cell;

Do all variables have the same type?Depends on the typesystem (and the compiler in pascal!)Implementation of the above example creates implicit types (e.g.type np : ↑ cell for variable p).

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 165 / 309

Cyclic Typedefinition (Example)

type link = ↑ cell;cell = record

info: integer;next: link;

end;

cell = record

integerinfo next pointer

cell

cell = record

integerinfo next pointer

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 166 / 309

Type conversion / Coercions

Statement of the problem: x+i with x as a real- and i as an integervariable.

There exist only operators for (real + real) or (int + int)

Type conversion necessary! x = int2real(i)

Implicit (by the compiler) or explicit (by the programmer) possible

Implicit = Coercion

Loss of information should be prevented (int→ real but not real→ int).

Performance!!!for I := 1 to N do X[I] := int2real(1) (PASCAL; X is anarray of reals) needs 48,4 µsfor I := 1 to N do X[I] := 1.0 needs only 5,4 µs.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 167 / 309

Type conversion - Semantic Rules (1)

Production Semantic RuleE → id E.type := lookup(id.entry)

E.txt := id.entryE → E1 op E2 E.type := if E1.type = integer and E2.type = integer

then integerelse if E1.type = integer and E2.type = real

then realelse if E1.type = real and E2.type = integer

then realelse if E1.type = real and E2.type = real

then realelse type error

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 168 / 309

Type conversion - Semantic Rules (2)

Production Semantic RuleE → E1 op E2 E.txt := if E1.type = integer and E2.type = integer

then E1.txt ◦ E2.txtelse if E1.type = integer and E2.type = real

then int2real(E1.txt) ◦ E2.txtelse if E1.type = real and E2.type = integer

then E1.txt ◦ int2real(E2.txt)else if E1.type = real and E2.type = real

then E1.txt ◦ E2.txtelse type error

E → num E.type := integerE.txt := val

E → num.num E.type := realE.txt := val

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 169 / 309

Overloading

Symbols with different meaning (dependent on applicationcontext)

mathematics: + operator (integer, reals, complex numbers)ADA: ()-Expression for array access AND function calls

Overloading is resolved, when the meaning is clear (operatoridentification)Overloading can often be resolved by the types of operands.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 170 / 309

Overloading - Possible Types

Example (ADA):function "*"(i,j: integer) return complex;function "*"(i,j: complex) return complex;

Possible types for * are:

integer × integer → integerinteger × integer → complexcomplex× complex→ complex

Assumption: 2,3,5 are integer

3*5 is either integer or complex.So 2 * (3 * 5) must be of type integer.(3*5)*z is of type complex, if z is of type complex.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 171 / 309

Handling Overloading

Instead of a type the set of all possible types must be stored in anattribute.Attribute types!E′ → E E′.types = E.typesE → id E.types = {lookup( id.entry)}E → E1 ( E2 ) E.types = {t|∃s ∈ E2.types∧ s→ t ∈ E1.types}

Example: 3*5

i i i

i i c

c c c

E: {i,c}

E: {i} *: E: {i}

3: {i} 5: {i}

{

}

,

,

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 172 / 309

Uniqueness of Types

Expressions may only have one type (otherwise type error)Production Semantic RuleE′ → E E′.types := E.types

E.unique := if E′.types = {t} then t else type errorE → id E.types := {lookup( id.entry)}

E → E1 ( E2 )

E.types := {s′|∃s ∈ E2.types∧ (s→ s′) ∈ E1.types}t := E.uniqueS := {s|s ∈ E2.types∧ s→ t ∈ E1.types}

E2.unique := if S = {s} then s else type errorE1.unique := if S = {s} then s→ t else type error

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 173 / 309

Polymorphic Functions

Polymorphic Function = Function, whose argument may have a arbitrarytype

Polymorphic refers to functions and operators

Examples: Built-in operators for array-access, pointer manipulation

Reason for polymorphism:

Code can be used for various data structuresExample: finding the length of lists (e.g. ML)fun length(lptr) =

if null(lptr) then 0else length(tl(lptr))

length([sun,mon,tue]), length([1,2,3,4])not possible in PASCAL!

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 174 / 309

Type Variables

Variables, that allow us to talk about unknown types

Note: Type Variables as greek letters α, β, . . ..

Type Inference = problem of deciding the type of an expression takinginto account the application (of the expression).

Example

type link ↑ cell;procedure mlist ( lptr : link; procedure p)begin

while lptr <> nil do beginp(lptr);lptr := lptr↑.next

endend;

mlist: link × procedure →voidp: link → void

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 175 / 309

Example - Type Inference

Program:

function deref(p);begin

return p↑end;

Derivation:1 Type of p is β (Assumption)2 From p ↑ follows that p must be a pointer. Therefore it holds:β = pointer(α).

3 Furthermore, we know that the type of p↑ must be α.4 Therefore, it follows: ∀α : pointer(α)→ α is the type of the function

deref.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 176 / 309

Language for Polymorphism

Type expression of the form ∀α.E(α) denotes a ’polymorph type’.Language definition:P → D ; ED → D ; D| id : QQ→ ∀ type variable . Q|TT → T ′ →′ T |T × T |(T )

| unary constructor ( T )| basic type | type variable

E → E ( E ) |E , E| id

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 177 / 309

Example

deref : ∀α.pointer(α)→ α ;q : pointer(pointer(integer)) ;deref(deref(q))

deref0 : pointer(α0)→ α0

derefi : pointer(αi)→ αi q : pointer(pointer(integer))

apply : αi

apply : α0

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 178 / 309

Differences in Type Handling

In distinction from former type handling (without polymorphism):1 Arguments of polymorph functions in an expression may have

different types.2 The concept of type equivalence is different.pointer(α) = pointer(pointer(integer)) ???

3 Calculated Types must be used in further consequence. The effectof the unification of two expressions must be preserved.α is assigned the type t. If α is referenced elsewhere, t must beused!

Terms: Substitution, Instances, Unification

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 179 / 309

Substitution, Instances

Substitution = function that maps type variables to typeexpressions. S : type variables 7→ type expressions

Example: α 7→ pointer(integer)

Application of a substitution:function subst(t : type expression) : type expressionbegin

if t is a basic type then return telse if t is a variable then return S(t)else if t is t1 → t2 then return subst(t1)→ subst(t2)

endS(t) . . . Instance. We write s < t⇔ s is instance of t.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 180 / 309

Examples

Instances:pointer(integer) < pointer(α)

pointer(real) < pointer(α)integer → integer < α→ α

pointer(α) < βα < β

No Instances:integer real substitution on Basic Types not possibleinteger → real α→ α inconsistent replacement of αinteger → α α→ α all occurrences must be replaced

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 181 / 309

Unification

2 types t1, t2 are unifiable if there exists a substitution S, so thatS(t1) = S(t2) holds.In praxis, we are interested in the Most General Unifier (MGU).

1 S(t1) = S(t2)2 Every substitution S′ with S′(t1) = S′(t2) must be an instance of S.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 182 / 309

Checking Polymorphic Functions

2 Functions:1 fresh(t) replaces all Variables in the type expression t with new

variables. A pointer to the node representing the new expression isreturned.

2 unify(m,n) unifies the two expressions m and n. As a side effectthe substitution is performed.

Translation Schema:Production Semantic Rule

E → E1 ( E2 )p := mkleaf(newtypevar);unify(E1.type,mknode(

′→′, E2.type, p));E.type := p

E → E1 , E2 E.type := mknode(′×′, E1.type, E2.type)E → id E.type := fresh( id.type)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 183 / 309

Example Type Checking

deref0 : pointer(α0)→ α0

derefi : pointer(αi)→ αi q : pointer(pointer(integer))→ β

apply : αi

apply : α0

Summary (Bottom-up type detection):Expression : Type Substitution

q : pointer(pointer(integer))derefi : pointer(αi)→ αi

derefi(q) : pointer(integer) αi = pointer(integer)deref0 : pointer(α0)→ α0

deref0(derefi(q)) : integer α0 = integer

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 184 / 309

Unification Algorithm

Input. A graph and a pair of nodes m and n, which should be unified.Output. True, if the nodes can be unified, False otherwise.Method. A node is represented by the record [constructor, left, right, set],where set is the Set of equivalent nodes. A node of set is chosen asrepresentative of this set. In the beginning, each set contains only the nodeitself.

find(n) returns the representative node

union(m,n) merges the equivalence sets. The new representative node is anode which does not correspond with a variable. If there existsno such node, a former representative node is chosen as thenew one.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 185 / 309

Algorithm - Pseudocode

function unify(m,n : node) : boolean begins := find(m);t := find(n);if s = t then return trueelse if s and t are nodes that represent the same basic type then return trueelse if s is an op-node with children s1, s2 and

t is an op-node with children t1, t2 then beginunion(s, t);return unify(s1, t1) and unify(s2, t2) end

else if s or t represents a variable then beginunion(s, t)return true end

else return false end

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 186 / 309

Example – Unification

Type expression:((α1 → α2)× list(α3))→ list(α2)((α3 → α4)× list(α3))→ α5

..→ : 1.

× : 2

.

list : 8

.

→ : 3

.

α1 : 4

.

α2 : 5

.

list : 6

.

α3 : 7

.

→ : 11

.

α4 : 12

.

× : 10

. → : 9.

α5 : 14

.

list : 13

Question: unify(1, 9) =?

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 187 / 309

PART 6 - RUN-TIME ENVIRONMENT

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 188 / 309

Objectives/Tasks

Relate static source code to actions at program runtime.Names in the source code relate to (not necessarily the same)data objects on the target machine.Allocation and deallocation of data objects need to be managed(run-time support packages).procedure activationStore data objects accordingly to their data type.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 189 / 309

Definitions

Procedure definition = name + bodyProcedures with return value = functionsProgram = procedure (e.g. main)Procedure name in one location in the code = procedure callVariables (identifier) in procedure definition = formal parametersArguments of a procedure call = actual parameters (substituteformal parameters after call)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 190 / 309

Example Program

program sort(input,output)var a: array[0..10] of integer;procedure readarray;

var i: integer;begin .... end;

procedure partition(y,z: integer) : integer;var i,j,x,v : integer;begin .... end;

procedure quicksort(m,n : integer);var i: integer;begin

if (n>m) then begini := partition(m,n); quicksort(m,i-1); quicksort(i+1,n);

endend;

begina[0]:=-9999; a[10]:=9999;readarray; quicksort(1,9);

end.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 191 / 309

Activation Trees

Assumptions:1 Sequential control flow2 Procedure activation starts at the beginning of the body. After

finishing the procedure, the statement located after the procedurecall is executed.

Activation = execution of the bodyLife time of a procedure = step sequence (time) from the first tothe last step during the procedure execution.Recursive procedure calls are possible (do not have to occurdirectly) P → Q→ . . .→ P

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 192 / 309

Activation Trees / Definition

1 each node represents a procedure activation2 the root node represents the activation of the main program3 node a is the parent node of b↔ control flow: In a, b is called

(activated)4 node a is left of node b↔. The lifetime of a is ahead of the lifetime

of b

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 193 / 309

Example Activation Tree

q(9,9)q(7,7)p(7,9)

q(7,9)q(5,5)p(5,9)

q(3,3)q(2,1)p(2,3)

q(2,3)q(1,0)p(1,3)

q(5,9)q(1,3)p(1,9)

q(1,9)r

s

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 194 / 309

Control Stack

Control flow = Depth processing (left to right) of the activation treeControl stack = stack to save all procedures at their lifetimeBeispiel:

q(2,3)q(1,0)p(1,3)

q(1,3)p(1,9)

q(1,9)r

s

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 195 / 309

Declarations / Scopes

Explicit or implicitvar i : integer;

scope of the variable given through Scope Rulesvariables can be used within the scopeglobal vs. local variablesvariables with the same name may denote different objects (due totheir scope)sequence of variable access (first local, then global variable if thename is identical,...)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 196 / 309

Name Binding

data object: storage location, which can store a value

A variable (variable name) can reference different data objects duringruntime.

Program language semantics:

Environment: function that maps names to storage locationsState: function that maps the storage locations to values

Environment and State are different!!!!Example: pi is associated with the address 100, which stores the value0. After pi := 3.14 the storage location 100 has the value 3.14; pi,however, still points to 100.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 197 / 309

Binding

A variable x is bound to storage location s, if the storage locationis associated with x.Location does not always have to be a (real) storage location (inthe main memory of the computer); e.g. complex datatypesCorrelation between STATIC and DYNAMIC notations:

STATIC NOTATION DYNAMIC COUNTERPARTdef. of a procedure activation of a procedure

declaration of a name bindings of the namescope of a declaration lifetime of a binding

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 198 / 309

Important Questions

. . . regarding organization of memory management and name binding.

1 Are there recursive procedures?

2 What happens with the values of local variables after finishing theprocedure execution.

3 Can a procedure reference non-local variables?

4 How are parameters passed to a procedure?

5 Can procedures be passed as parameters?

6 Can procedures be returned as a return value?

7 Is there dynamic memory allocation?

8 Does memory have to be deallocated explicitly? (or does this happenimplicitly (garbage collection)?)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 199 / 309

Memory management

Run time memory for:1 generated target code2 data objects3 control stack for procedure activation

typical layout:

CodeStatic Data

Stack↓↑

Heap

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 200 / 309

Activation Record

. . . for information storage at a procedure call:1 temporary values (evaluation of expressions,..)2 local data (local variables,..)3 machine data to save (program counter, registers,..)4 access links (link to non-local data)5 control link (link to activation record of the called procedure)6 current parameters (usually stored in register)7 return value of the called procedure

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 201 / 309

Compile-Time Layout

Memory = blocks of Bytes, Byte = smallest addressable unit

usually: 1 Byte = 8 Bit; n Bytes = Word

Memory for a variable (or parameter) is dependant on the type.Example: Basic Types (int, real, boolean,..) = n Bytes

Storage Layout dependant on addressing:Example:

Aligned Integers may only reside at certain addresses (addresses that aredivisible by 4)

Padding 10 characters are necessary to save a string, but there must be 12Bytes allocated.

arrays, records are written to a memory range that is of sufficient size.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 202 / 309

Memory Allocation Strategies

1 static allocation (at compile time)2 stack allocation3 heap allocation

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 203 / 309

Static Allocation

Memory mapping is determined at compile time.Local values remain stored even after procedure termination.No run time support package necessary.Limitation:

1 Size of the data structures must be known at compile time.2 Recursive procedures are only possible with restrictions (all

recursive calls share the same memory!).3 Data structures can not be created dynamically.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 204 / 309

Stack Allocation

Idea:Control Stackprocedure is activated→ activation record is pushed to the stackprocedure activation terminates→ activation record is popped fromthe stack

local values are deleted after termination of the activation

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 205 / 309

Example

Activation Tree activation record Remarks

s sa : array

Frame for s

r

s sa : array

ri : integer

r is activated

q(1,9)r

s

sa : arrayq(1,9)

i : integer

Frame for r has be-en popped and q(1,9)pushed

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 206 / 309

Example...

Activation Tree activation record Remarks

q(1,0)p(1,3)

q(1,3)p(1,9)

q(1,9)r

s

sa : arrayq(1,9)

i : integerq(1,3)

i : integer

Control has just retur-ned to q(1,3)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 207 / 309

Calling/Return Sequences

Calling Sequence: allocate the activation records and fill in thefieldsReturn Sequence: recover machine stateCalling sequences do not necessarily equal activation record

1 caller evaluates the current parameters2 return address, stack top are stored in the activation record of the

calling procedure (callee).3 The callee stores the register values and other status information4 The callee initializes the local data and starts the execution.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 208 / 309

Sequences...

possible return sequence:1 the callee stores the return value2 the stack, register information,.. are restored3 the caller copies the return value into his activation record

task division:

responsibility

Callee’s

responsibility

Caller’s

record

activation

Callee’s

record

activation

Caller’s

control link

control link

temporaries and local data

links and saved status

parameters and return value

temporaries and local data

links and saved status

parameters and return value

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 209 / 309

Data with variable length

storage not directly in the activation recorda pointer to the data is stored

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 210 / 309

Dangling References

Reference to memory is used but memory has already been deallocated.

logical programming error

cause of mysterious bugs

Example:

main() {int *p;p = dangle(); }

int *dangle() {int i = 23;return &i; }

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 211 / 309

Heap Allocation

Necessary if:1 value of local variable needs to be retained (after activation)2 the called procedure survives the calling procedure

Memory is allocated and deallocated at requestMemory management is necessary:

1 linked list for storing free blocks2 free sections should be filled in an optimal way

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 212 / 309

Access to nonlocal names

lexical or static scope rules (declaration of names decided atcompile time)static scope with most closely nested scopesdynamic scope rules (declaration of names decided at run time;activations are considered)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 213 / 309

Blocks

most closely nested:1 the scope of a declaration in block B contains B.2 If the name x in B is not declared, an occurrence of x in B is in the

scope of a declaration of x in the surrounding block B′:1 x is declared in B′.2 B′ is the closest immediate surrounding block of B which declares x.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 214 / 309

Example – Scopes

main() {int a = 0; B0

int b = 0;{

int b = 1; B1

{int a = 2; B2

}{

int b = 3; B3

}}

}

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 215 / 309

Memory Handling

1 Memory is provided via a stack. If a block is executed, memory forthe local names is allocated. This memory will be deallocatedafter termination of the block.

2 Alternatively, it is possible to provide the memory for all blocks of aprocedure at the same time. Memory can be decided at compiletime (except if there is variable memory)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 216 / 309

Global Data

memory space can be allocated statically

procedures can be passed as parameter (C: pointer)

Example:

program pass(input, output);var m : integer;function f(n: integer) : integer;

begin f := m + n end { f };function g(n: integer) : integer;

begin g := m * n end { g };procedure b(function h(n: integer) : integer);

begin write(h(2)) end { b };begin

m := 0;b(f); b(g); writeln

end.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 217 / 309

Nested Procedures

procedure definitions in proceduresExample:program sort(input,output)

. . .procedure exchange(i,j:integer);. . .procedure quicksort(m,n: integer);

var k,v: integer;begin

. . .exchange(i,j);. . .

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 218 / 309

Nesting Depth, Access Lists

Nesting Depth: depth of the nesting ofprocedures/blocks. . . (programms: 1, procedure in program: 2, . . . )Access List: implementation of the access to nested procedures

access link: field in the activation record of a procedureIf P is declared in Q, the access link of P point to the access link ofQ (in the last activation of Q)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 219 / 309

Example – Access Lists

e(1,3)

i,j

p(1,3)

k,v

q(1,3)

access link

k,v

q(1,9)

a,x

s

access link

access link

access link

access link

access link

s

a,x

q(1,9)

k,v

access link

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 220 / 309

Search for non-local names

Procedure P is in nesting depth np and accesses a with na ≤ np.1 P is being executed. The activation record of P is located at the top

of the stack. Walk along np − na access links.2 Subsequently, we reach the activation record, which contains a. An

offset value returns the actual position of a.

(nP − na, offset) defines the address of a. Computation can bedone at compile time.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 221 / 309

Procedure Access (nested)

P calls procedure X.1 nP < nX : X lies lower than P . Therefore, X must be defined in P

(or X can not be accessed from P ). The access link of X points toP .

2 nP ≥ nX : Walk along nP − nX + 1 access links. This activationrecord contains P as well as X.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 222 / 309

Procedure Parameter

passing a procedure as parameternot allowed in all languageshandling of links (similar to already described method):

assuming c calls b and passes f as parametera link from f to c is computedthis link is used as access link when f is actually called.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 223 / 309

Dynamic Scope

New activation uses existing binding of non-local names in theirmemory. a in the called activation references the same memorylike the calling activation. New bindings are provided for localnames of the called procedure.Semantics of static scope and dynamic scope are different!

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 224 / 309

Example – Dynamic Scope (1)

program dynamic (input, output);var r: real;procedure show;

begin write(r : 5:3) end;procedure small;

var r : real;begin r := 0.125; show end;

beginr := 0.25;show; small; writeln;show; small; writeln;

end

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 225 / 309

Example – Dynamic Scope (2)

show

small

dynamic

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 226 / 309

Example – Dynamic Scope (3)

Example:Static Scope: Output = 0.250 0.250 nl 0.250 0.250 nlDynamic Scope: Output = 0.250 0.125 nl 0.250 0.125 nl

Deep access: Control links are used as access links. Search in thestack (from top to bottom) for the first entry of a non-local name.Shallow access: Current value of a name is deposited in a(statically) allocated location. At an activation of P , local name nuses the location. The old value of the location can be cached inthe activation record and is therefore restorable.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 227 / 309

Parameter Passing

How are parameters passed at a call?

procedure exchange(i,j: integer);var x: integer;begin

x := a[i]; a[i] := a[j]; a[j] := xend

Call-by-ValueCall-by-ReferenceCopy-RestoreCall-by-Name

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 228 / 309

Call-by-Value

Formal parameters are considered as local names.The caller evaluates the actual parameter and passes them theassociated formal parameters.Pointers can also be passed as values.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 229 / 309

Call-by-Reference

Instead of the value (as in Call-by-Value) a pointer to the memorylocation of the actual parameter is passed.var parameter in PASCAL are references.Arrays are usually passed as reference.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 230 / 309

Copy-Restore

Hybrid method between Call-by-Value and Call-by-Reference.Method:

1 Before executing the procedure, the actual parameters areevaluated. The R-Values (values) are passed to the respectiveformal parameters (as Call-By-Value). Additionally, the L-Values(locations) are computed.

2 After procedure termination, the actual R-Values are copied back tothe L-Values of the actual parameters (if available).

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 231 / 309

Call-by-Name

Procedures are treated as macros, i.e. instead of the procedurecall, the body of the procedure is substituted; all formalparameters in the body are replaced by the actual parameters.(Macro-Expansion)The local names of the called procedures must be different to thename of the calling procedure. (Variable renaming is partiallynecessary)The actual parameters are put in braces to avoid problems.Problems: Call swap(i,a[i]) is expanded to:temp := i; i := a[i]; a[i] := tempInstead of a[i]=i we write a[a[i]]=i. temp := x; x := y;y := temp

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 232 / 309

Symbol Table

entries correspond to the declaration of namesstore binding and scope informationstorage allocation information (that is needed at run time)storage of the symbol table in a liststorage of the symbol table in a hash table

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 233 / 309

PART 7 - INTERMEDIATE CODE GENERATION

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 234 / 309

Objectives/Tasks

provide a target machine independent formatadvantages:

easy adaption on different target machinesmachine independent code optimization can be realized

attributed grammars can be used

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 235 / 309

Languages

graphical representation (syntax tree)Three-Address Codex := y op z

x,y,z are arbitrary numbers, constants, names (variables), ortemporary variablesop is an operator

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 236 / 309

Three-Address Code Statements

1 assignments x := y op t

2 assignments with unary operator x := op y

3 copy statements x := y

4 unconditional jumps goto L

5 conditional jumps if x relop y goto L

6 param x,call p, n calls procedure p with n parameters.return y where y is optional.

7 assignments with indices: x := y[i] und x[i]:=y.8 addresses and pointer assignments: x := & y, x := *y and *x:= y

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 237 / 309

Generating the Three-Address Code (simplified)

S-attributed grammarS.code represents the three address codeE.place the name which contains the value of the non-terminal E.E.code the sequence of three address code statements, thatevaluate E.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 238 / 309

Assignments

S → id := E S.code := E.code||gen( id.place ’:=’ E.place)E → E1 + E2 E.place := newtemp;

E.code := E1.code||E2.code||gen(E.place ’:=’ E1.place ’+’ E2.place)E → E1 * E2 E.place := newtemp;

E.code := E1.code||E2.code||gen(E.place ’:=’ E1.place ’*’ E2.place)E → - E1 E.place := newtemp;

E.code := E1.code||gen(E.place ’:=’ ’uminus’ E1.place)E → ( E1 ) E.place := E1.place;E.code := E1.codeE → id E.place := id .place;E.code := ”

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 239 / 309

If-then-else

Statement: S → if E then S1 else S2Generated Code:S.else := newlabel;S.after := newlabel;

S.code :=

E.code||gen( ’id’ E.place ’=’ ’0’ ’goto’ S.else)||S1.code||gen( ’goto’ S.after)||gen(S.else ’:’ )||S2.code||gen(S.after ’:’ )

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 240 / 309

While-loops

Statement: S → while E do S1

Generated Code:S.begin := newlabel;S.after := newlabel;

S.code :=

gen(S.begin ’:’)||E.code||gen( ’if’ E.place ’=’ ’0’ ’goto’ S.after)||S1.code||gen( ’goto’ S.begin)||gen(S.after ’:’)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 241 / 309

Implementation of the Three-Address Code

Quadruple = record with 4 fields:

op Operatorarg1 1. Argumentarg2 2. Argumentresult Temp. Variable

arguments and temporary variables are usually pointer to symboltable entriesTriple = “Quadruple” without result field. instead, the position ofthe triple that calculates a value is stored in the argument.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 242 / 309

Examples

a := b * -c + b * -cQuadruple

op arg1 arg2 result

(0) uminus c t1(1) * b t1 t2(2) uminus c t3(3) * b t3 t4(4) + t2 t4 t5(5) := t5 a

Tripleop arg1 arg2

(0) uminus c(1) * b (0)(2) uminus c(3) * b (2)(4) + (1) (3)(5) := a (4)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 243 / 309

Declarations

provision of the memory space for local names of a procedure(relative addresses of the activation record or memory of the staticdata area)procedure declarations:

1 offset . . . next free relative address2 initialization offset = 03 offset is used for the current data object4 then, the offset is increased by the size of the current data object

enter(name, type, offset) creates an entry in the symbol table forname, assigns it the type type and offset as relative address.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 244 / 309

Attributed Grammar for Declarations

P → {offset := 0}DD → D ; DD → id : T{enter( id.name, T.type, offset)

offset := offset+ T.width}T → integer{T.type := integer;T.width := 4}T → real{T.type := real;T.width := 8}T → array [ num ] of T1{T.type := array(num.val, T1.type);

T.width := num.val× T1.width}T →↑ T1{T.type := pointer(T1.type);T.width := 4}

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 245 / 309

Scope Information

each procedure has its own symbol tablefor every procedure declaration a symbol table is created+ a link to the symbol table of the enclosing procedureoffset is now local!example grammar:P → DD → D ; D| id : T | proc id ; D ; S

grammar definition. . . (Exercise)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 246 / 309

Records, Field Names

create symbol table for fields (marker L).names are stored in the new symbol tablegrammar:T → record LD end {T.type := record(top(tblptr));

T.width := top(offset)pop(tblptr); pop(offset)}

L→ ε {t := maketable(nil);push(t, tblptr); push(0, offset)}

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 247 / 309

Assignments

assumption so far: names are represented by oneselfcorrect if name for pointer is in their symbol tablegeneralization by attribute: name for identifierlookup(id.name) results in the entryadvantage: usable even if the entry is declared in an enclosingprocedureBy defining lookup, the scope of a language regarding anidentifiers is defined.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 248 / 309

Grammar

S → id := E p := lookup(id.name);if p 6= nil then

emit(p′ :=′ E.place)else error

E → E1 + E2 E.place := newtemp;emit(E.place′ :=′ E1.place′ +′ E2.place)

E → E1 * E2 E.place := newtemp;emit(E.place′ :=′ E1.place′ ∗′ E2.place)

E → - E1 E.place := newtemp;emit(E.place′ :=′ ′uminus′E1.place)

E → ( E1 ) E.place := E1.placeE.place := newtemp

E → id p := lookup( id.name)if p 6= nil then

E.place := pelse error

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 249 / 309

Addressing Array Elements

array access fast, if the elements are stored in one block.

access to the element at position i (w . . . element size):

base+ (i− low)× w

can be rewritten to:i× w + (base− low × w)

advantage: base− low × w = c can be calculated at compile time!

two dimensional arrays (A(i1, i2)):

row major (row-by-row) base+ ((i1 − low1)× n2 + i2 − low2)× wwhere n2 = high2 − low2 + 1.column major (column-by-column)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 250 / 309

Translation Schema for Array Access (1)

1 S → L := E

if L.offset = null thenemit(L.place′ :=′ E.place);

elseemit(L.place ’[’ L.offset ’]’ ’:=’ E.place)

2 E → E1 + E2

E.place := newtempemit(E.place ’:=’ E1.place ’+’ E2.place)

3 E → ( E1 )E.place := E1.place

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 251 / 309

Translation Schema (2)

1 E → L

if L.offset = null thenE.place := L.place

elseE.place := newtempemit(E.place ’:=’ L.place ’[’ L.offset ’]’)

2 L→ Elist ]L.place := newtempL.offset := newtempemit(L.place ’:=’ c(Elist.array))emit(L.offset ’:=’ Elist.place ’*’ width(Elist.array))

3 L→ idL.place := id.place;L.offset := null

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 252 / 309

Translation Schema (3)

1 Elist→ Elist1, E

t := newtempm := Elist1.ndim+ 1emit(t ’:=’ Elist1.place ’*’ limit(Elist1.array,m))emit(t ’:=’ t ’+’ E.place)Elist.array := Elist1.arrayElist.ndim := m

2 Elist→ id [ EElist.array := id.placeElist.place := E.placeElist.ndim := 1

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 253 / 309

Boolean Expressions

2 main tasks:1 calculating logical values2 changing program procedure

grammar:

E → E or E|E and E| not E| ( E ) | id relop id |true|false

2 methods to represent boolean values:1 true and false are coded as numbers (e.g. true = 1, false = 0).2 Flow-of-Control: values are represented as positions in the code

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 254 / 309

Numeric Representation

Example 1: a or (b and (not c))

t1 := not ct2 := b and t1t3 := a or t2

Example 2: a < b

100: if a < b then goto 103101: t := 0102: goto 104103: t := 1104:

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 255 / 309

Translation Schema (Bool. Expr.) I

E → E1 or E2 E.place := newtemp; emit(E.place ’:=’ E1.place ’or’ E2.placeE → E1 and E2 E.place := newtemp; emit(E.place ’:=’ E1.place ’and’ E2.placeE → not E1 E.place := newtemp; emit(E.place ’:=’ ’not’ E1.placeE → id1 relop id2 E.place := newtemp

emit( ’if’ id1.place relop.opid2.place ’goto’ nextstat+ 3)emit(E.place ’:=’ ’0’emit(’goto’ nextstat+ 2)emit(E.place ’:=’ ’1’

E → true E.place = newtemp; emit(E.place ’:=’ ’1’)E → false E.place = newtemp; emit(E.place ’:=’ ’0’)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 256 / 309

Short-Circuit Code

Representation of the boolean expressions without generating code foroperators and, or, not

values are represented by positions in the code

Jumping Code

Example: a < b or c < d and e < f

100: if a < b goto 103101*: t1 := 0102: goto 104103*: t1 := 1104: if c < d goto 107105*: t2 := 0106: goto 108

107*: t2 := 1108: if e < f goto 111109*: t3 := 0110: goto 112111*: t3 := 1112*: t4 := t2 and t3113*: t5 := t1 or t4

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 257 / 309

Flow-of-Control Statements

Statements:

S →if E then S1if E then S1 else S2while E do S1

use labels to represent true and falsedependent on the evaluation of E, branch out.attributed grammar (see above)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 258 / 309

Control-Flow Translation

Use E.true (E.false) if E evaluates to true.Example E1 or E2 is true, if E1 is true.not all expressions are evaluated (like e.g. in C)Example: a < b or (c < d and e < f)

if a < b goto Ltruegoto L1

L1: if c < d goto L2goto Lfalse

L2: if e < f goto Ltruegoto Lfalse

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 259 / 309

Translation Schema (Bool. Expr.) II

E → E1 or E2 E1.true := E.true;E1.false := newlabel;E2.true := E.trueE2.false := E.false;E.code := E1.code||gen(E1.false’:’ ||E2.code

E → E1 and E2 E1.true := newlabel;E1.false := E.false;E2.true := E.trueE2.false := E.false;E.code := E1.code||gen(E1.true’:’ ||E2.code

E → not E1 E1.true := E.false;E1.false := E.true;E.code := E1.code

E → id1 relop id2 E.code :=

gen

(’if’ id1.placerelop.opid2.place ’goto’ E.true

)||

gen(’goto’ E.false)E → true E.code = ’goto’ E.trueE → false E.code = ’goto’ E.false

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 260 / 309

Mixed Mode

consideration (so far) simplifiedin practice, mixed expressions are possibleExample 1: (a + b) < c

Example 2: (a < b) + (b < a)

introduce synthetic attribute E.type

E.type =

{arith Arithmetic expressionbool Boolean expression

Code Generation for E + E, E ∗ E, . . . needs to be changed.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 261 / 309

Back patching

easiest implementation of attributed grammars:1 generate a syntax tree2 generate the translation depth-first

problem with Single Pass:labels for control flow are unknown

trouble-shooting:1 jump statements are generated with empty labels2 these statements are saved in a list3 the target labels are registered once they are known

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 262 / 309

Procedure Calls

usage of run time routines for handling the parameters, the callitself and the return of values.grammar:

S → call id (Elist)Elist→ Elist, E|E

Calling Sequence must be reproduced.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 263 / 309

Attributed Grammar (simplified)

Call-by-ReferenceMemory is statically allocatedgrammar:

1 S → call id (Elist)

for each item p on queue doemit(’param’ p)

emit(’call’ id.place)2 Elist→ Elist, E

Append E.place to the end of queue3 Elist→ E

Initialize queue to contain only E.place

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 264 / 309

PART 8 - CODE GENERATION

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 265 / 309

Objectives/Tasks

Generate machine code (from the intermediate code)Inputs:

intermediate codesymbol table

possible output:absolute machine coderelocatable machine codeassembler code

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 266 / 309

Problem – Instruction Selection

easiest method: each three-address code statement is assigned instructions (codeskeleton) x := y + zis represented by:

MOV y,R0ADD z,R0MOV R0,x

Issue: not efficient

a := b + cd := a + e

MOV b,R0ADD c,R0MOV R0,aMOV a,R0 not necessaryADD e,R0MOV R0,d

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 267 / 309

Instruction Selection (2)

quality of the generated code depends on (1) speed and (2) sizedepends on the target machine

There are often several possible instructions to execute a commandExample: a := a + 1

via increment command:MOV a,R0INC R0MOV R0,a

via addition:MOV a,R0ADD # 1,R0MOV R0,a

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 268 / 309

Register Allocation

register operations are usually fastlimited number of registers availableduring register allocation, variables are selected to be stored inthe registerduring register assignment, variables are assigned to registersoptimal assignment of registers to variables is NP-complete! (evenunder realistic assumptions)register assignment partially depends on the target machine

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 269 / 309

Further Note for Code Generation

The order of the target code evaluation has an effect on efficiency.It is possible that fewer registers are needed. Finding the optimalorder is also NP-complete.Kinds of Code Generation

direct code generation + optimizationuse flow-of-control to create better codetree-directed code-selection techniques (Tree-rewriting)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 270 / 309

The Target machine

Assumptions: byte-addressable machine with 4 Bytes for a Wordand n registers R0,. . .Rn− 1

instructions are of the form: opsource, destinationExamples: MOV, ADD, SUB, . . .Address Modes:

Mode Form Address Added Costabsolute M M 1register R R 0indexed c(R) c+ contents(R) 1

indirectregister *R contents(R) 0indirectindexed *c(R) contents(c+ contents(R)) 1

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 271 / 309

Examples

MOV R0,M stores the content of R0 in location M.MOV 4(R0),M stores the content of 4 + contents(R0) in location M.MOV *4(R0),M stores the value of contents(4 + contents(R0)) inM.Further Address-Mode: MOV # 1,R0 stores the constant 1 inregister R0.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 272 / 309

Instruction Costs

costs of an instruction = 1 + costs associated with source- anddestination addresses.costs correspond to the length of the instructionsminimizing the costs minimizes the size of the instructions and theexecution time (register operations are usually faster)Example a := b + ccosts = 6

MOV b,R0ADD c,R0MOV R0,a

costs = 6

MOV b,aADD c,a

costs = 2

MOV *R1, *R0ADD *R2, *R0

costs = 3

ADD R2, R1MOV R1, a

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 273 / 309

Run-time Storage Management

handling of the memory managementwhich code needs to be created for the management of theactivation records2 strategies:

Static AllocationStack Allocation

Three-Address-Statements: call, return, halt, action

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 274 / 309

Static Allocation

Activation Record is created at compile timeCaller-Code:MOV # here + 20, callee.static areaGOTO callee.code area

Callee-Code (after finishing)GOTO *callee.static area

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 275 / 309

Stack Allocation

code for the first procedure

MOV # stackstart, SPcode for the 1. procedureHALT

Procedure call

ADD # caller.recordsize, SPMOV # here + 16, *SPGOTO callee.code area

Return Sequence

at the Callee:GOTO *0(SP)

at the Caller:SUB # caller.recordsize, SP

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 276 / 309

Flow Graphs

Basic Blocksequence of statements that don’t contain a halt or branchingstatement (except at the end).

x := y + zx is definedy,z are used (referenced)Algorithm:

1 Determine the set of leaders, i.e., the first statement of a basicblock.

1 The first statement is the leader2 Any statement that is the target of a conditional or unconditional goto

is a leader.3 Any statement that immediately follows a goto or conditional goto

statement is a leader.2 For each leader, its basic block consists of the leader an all

statements up to but not including the next leader or the end of theprogram.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 277 / 309

Example – Basic Blocks

beginprod := 0;i := 1;do begin

prod := prod + a[i]*b[i];i := i+1

end while i<=20end

(1) prod := 0(2) i := 1(3) t1 := 4*i(4) t2 := a[t1](5) t3 := 4*i(6) t4 := b[t3](7) t5 := t2*t4(8) t6 := prod + t5(9) prod := t6(10) t7 := i + 1(11) i := t7(12) if i <= 20 goto (3)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 278 / 309

Transformations of Basic Blocks

Basic Blocks calculate a set of expressionstwo blocks are equal if they calculate the same set of expressionstransformations may not change this set of expressions2 kinds of transformations:

Structure-Preserving TransformationsAlgebraic Tranformations

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 279 / 309

Structure-Preserving Transformations

Common subexpression elimination

a := b + cb := a - dc := b + cd := a - d

a := b + cb := a - dc := b + cd := b

dead-code eliminationAssuming x is defined by x := y+z but not used afterwards, thestatement can be removed.renaming temporary variablesinterchange of statements

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 280 / 309

Algebraic Transformations

there are different transformationsgoal is usually optimization of the codeExample 1: The following statements can be eliminatedx := x + 0x := x * 1

Example 2x := y ** 2→ x := y * y

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 281 / 309

Flow Graphs

directed graph

nodes are Basic Blocks

two nodes n1 and n2 are connected if the block of n2 follows directlyafter n1 in an execution. This means

1 There is a conditional and unconditional jump from the laststatement of the block of n1 to the first statement of the block of n2.

2 The block of n2 follows the block of n1 directly and the block of n1has no unconditional jump in the last statement.

Finding all loops is not always easy. A loop is a set of nodes in the flowgraph with:

1 all nodes in the set are strongly connected2 the set of nodes has a distinct start node

A loop that does not contain any other loops is called Inner Loop.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 282 / 309

Next-Use Information

If a name in a register is no longer needed, the register can beused for another name again.used at register allocation and assignment of memory location totemporary names.Definition Use: Assuming x is assigned a value in statement i. Thestatement j uses x as an operand and there is a control flow of ito j, where no other statement redefines x in this path. We thensay that j USES the value of x, which was defined at location i.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 283 / 309

Calculation of Next-Use

in the Basic Block (procedure calls also start Basic Blocks)Algorithm:

1 From the end of a basic block to its first statement for eachstatement i: x := y op z:

1 Attach to i the information currently found in the symbol tableregarding the next use and liveness of x,y, and z.

2 In the symbol table, set x to “not live” and “no next use”.3 In the symbol table, set y and z to “live” and their next uses to i. The

order of steps may not be interchanged!

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 284 / 309

A simple Code Generator

Register-Descriptor: shows which names are currently stored inwhich registers. in the initial phase, all registers are empty.Address-Descriptor: points to the location of the current value of aname at runtime. This location is either a register, a stack location,a memory location or a combination.Assumptions: for each operator there is a target machine operator,information about registers and their assignment are used.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 285 / 309

Algorithm Code-Generator

For each statement x := y op z perform the following actions:1 Invoke a function getreg to determine the location L where the result of y op z should be

stored.2 Consult the address descriptor for y to determine y’ (one of) the current location(s) of y.

Registers are preferred. If the value is not already in L, then generate a statement MOVy’, L.

3 Generate the instruction OP z’,L, where z’ is the current location of z. Update theaddress descriptor of x to indicate that x is currently in location L. If L is a register, updateits descriptor and remove x from all register descriptors.

4 If the current values of y and/or z have no next uses, are not live on exit from the block,and are in registers, alter the register descriptor to indicate that, after execution of x := yop z, those registers no longer will contain y and/or z, respectively.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 286 / 309

Special Cases

Statements of the form x := op y are converted the same way.

Statements of the form x := y are handled as follows:

If y is in a register, simply change the register and addressdescriptors to record that the value of x is now found only in theregister holding the value of y.If y has no next use and is not live on exit from the block, theregister no longer holds the value of y.If y is in memory, we use getreg to determine a new register inwhich to load y and make that register the location of x.alternatively, the statement MOV y,x can be used.

After the handling of all statements, all names that are live on exist andnot yet in their memory location are saved via MOV.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 287 / 309

Function getreg

Statement x := y op z.

1 If y is in a register that holds the value of no other name, and y is not liveand has no next use after executing the statement, then return theregister of y to be L. Update the address descriptor of y.

2 Failing (1), return an empty register for L if there is one.

3 Failing (2), if x has a next use in the block or op is an operator thatrequires a register, find an occupied register R. Store the value of R intoa memory location, update the address descriptors, and return R. If Rholds the value of several variables, a move operation for all variableshas to be generated.

4 If x is not used in the block, or no suitable occupied register can befound, select the memory location of x as L.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 288 / 309

Handling Conditionals

2 Kinds (depend on the target machine)1 Branches depend of the value of a special register (negative, zero,

positive, nonnegative, nonzero, and nonpositive). For suchmachines, if x<y goto z can be implemented by subtracting yof x and jumping if the value is negative.

2 Condition Codes: this code shows if a value that has beencalculated or loaded into a register is negative, null or positive.Usually there is a comparison operation CMP x,y, which sets thecondition code to positive if x>y. Additional command CJr z withr ∈ {>,<,=,≤,≥, 6=} jumps to the label z if the condition code>,<,. . . is null. Example:CMP x, yCJ< z

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 289 / 309

Peephole Optimization

uses only a small sequence of statementsmoving window for target code (=peephole)this should be replaced by a smaller and faster sequence.several executions possible (and preferable).Redundant Loads and Stores(1) MOV R0, a(2) MOV a, R0

Statement (2) can be removed if there is no label to (2)!

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 290 / 309

Peephole Optimization (II)

Unreachable Code# define debug 0. . .if ( debug ) {

. . .}

Flow-of-Control Optimizationsgoto L1

. . .L1: goto L2

→goto L2

. . .L1: goto L2

Algebraic Simplifications x := x + 0

Reduction in Strength x2 is implemented faster than x*x. Shift-Operation can also beused for division or multiplication with 2n.

Use of Machine Idioms Instead of i := i+1 the increment command can be used.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 291 / 309

PART 9 - CODE OPTIMIZATION

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 292 / 309

Objectives/Tasks

“Optimization” of the generated (intermediate) code (faster,smaller)Tradeoff between effort and usage of the optimization

Most payoff for the least effortFocus: target machine independent code optimizationOptimization independent of the input values!Objective: better average resultsTechniques: data and control flow analysis.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 293 / 309

Criteria for Code Transformation

The Behavior of the program must be preserved.The transformed program must be faster (smaller) on average.The analysis must be worth the effort. Meaning a programanalysis, which leads to a better transformation but needs a lot oftime is possibly not an option.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 294 / 309

Increasing Performance

Dramatical increase of the runtime (from hours to a few seconds) is onlypossible, if the program is corrected on all levels:

better algorithms and data structures (e.g. replace Insert Sort withQuicksort (N logN instead of N2))

improve loops,

do peephole transformations

select instructions

use registers

address calculations

procedure calls

transform loops

change algorithm

profile program

compiler cancompiler canuser can

target codecode generatorcode

intermediatefront end

source code

code transformationsUtilization of registers and faster (machine language) instructions,or peephole transformations

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 295 / 309

Organization Compiler optimization

Organization:

data-flow

analysis

control-flow

analysis

code

generator

code

optimizer

front

end

transformations

Advantages:

Operations for high-level constructs are explicitly available in theintermediate code and can be therefore be optimized.The intermediate code is (relatively) independent of the targetmachine. Optimizations are possible for many machines (withoutany changes).

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 296 / 309

Example–Quicksort

void quicksort(int m,int n) {int i,j;int v,x;if (n<=m) return;/* fragment begins here */i = m-1; j =n; v = a[n];while (1) {

do i = i+1; while (a[i]<v);do j = j-1; while (a[j]>v);if (i>=j) break;x = a[i]; a[i] = a[j]; a[j] = x;

} /* fragments ends here */quicksort(m,j); qicksort(i+1,n);

}

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 297 / 309

Three Address Code – Quicksort

(1) i := m-1(2) j := n(3) t1 := 4*n(4) v := a[t1](5) i := i+1(6) t2 := 4*i(7) t3 := a[t2](8) if t3 < v goto (5)(9) j := j-1(10) t4 := 4*j(11) t5 := a[t4](12) if t5 > v goto (9)(13) if i >= j goto (23)(14) t6 := 4 *i(15) x := a[t6]

(16) t7 := 4*i(17) t8 := 4*j(18) t9 := a[t8](19) a[t7] := t9(20) t10 := 4*j(21) a[10] := x(22) goto (5)(23) t11 := 4 *i(24) x := a[t11](25) t12 := 4 *j(26) t13 := 4*n(27) t14 := a[t13](28) a[t12] := t14(29) t15 := 4*n(30) a[t15] := x

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 298 / 309

Flow Graph – Quicksort

t11 := 4*i

x := a[t11]

t12 := 4*i

t13 := 4 *n

t14 := a[t13]

a[t12] := t14

t15 := 4*n

a[t15] := x

B6

t6 := 4*i

x := a[t6]

t7 := 4*i

t8 := 4*j

t9 := a[t8]

a[t7] := t9

t10 := 4*j

a[t10] := x

goto B2

B5

if i >= j goto B6

B4

j := j-1

t4 := 4*j

t5 := a[t4]

if t5 > v goto B3

B3

i := i+1

t2 := 4*i

t3 := a[t2]

if t3 < v goto B2

B2

i := m-1

j := n

t1 := 4*n

v := a[t1]

B1

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 299 / 309

Possibilities of Optimization

Common Subexpression: An Expression E, which has been calculated before and whichvalue has not been changed in the meantime. Multiple calculation of expressions can beavoided.Example: Block B5.

t8 := 4*j; t9:= a[t8]; a[t8] := x;is changed to:

t9 := a[t4]; a[t4] := x;

Copy Propagation: Introduction of copy statements to simplify equal expressions.

a := d + eb := d + ec := d + e

is changed to:

t := d + ea := tb := tc := t

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 300 / 309

Dead Code Elimination

A variable lives at a certain location in the code, if its value can beused or is used afterwards.Code is considered DEAD, when the calculated values are notused. This code can be ignored.Example:

debug := false. . .if (debug) print ...

Keyword: Constant FoldingCopy statements can often be deleted, since the variables are nolonger used.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 301 / 309

Loop Optimization

Code Motion: Moving the code in an area outside of the loop.while (i <= limit-2) ... is changed tot = limit-2; while (i <= t) ....Induction Variable Elimination: Removing some inductionvariables by using other variables.Reduction in Strength: Replacement of expensive operatorsthrough cheap ones (e.g. multiplication is replaced by addition)

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 302 / 309

Transformed Quicksort Program

goto B2

a[t2] := t5

a[t4] := t3

i := m-1

j := n

t1 := 4*n

v := a[t1]

t2 := 4*i

t4 := 4*j

a[t1] := t3

a[t2] := t14

t14 := a[t1]

B6B5

t4 := t4-4

t5 := a[t4]

if t5 > v goto B3

if i >= j goto B6

B4

B3

t2 := t2+4

t3 := a[t2]

if t3 < v goto B2

B2

B1

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 303 / 309

Optimization of Basic Blocks

Represent Basic Blocks as a Directed Acyclic Graph (DAG)

Utilize the graph for optimization

Common Subexpression Elimination:

a := b + cb := a - dc := b + cd := a - d

DAG:

d0

c0b0

a

b, d

c

+

-

+

Dead Code Elimination: Remove each root node from the DAGwhich has variables that are no longer used (DEAD variables).

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 304 / 309

Basic Block Optimization (2)

Algebraic Equivalences:x+ 0 = 0 + x = x

x− 0 = xx ∗ 1 = 1 ∗ x = x

x/1 = x

Reduction in Strengthx ∗ ∗2 = x ∗ x2.0 ∗ x = x+ xx/2 = x ∗ 0.5

Constant Folding: Evaluation of expressions that remain constant. The calculated valuesare used instead of the constants.

Further algebraic transformation: Associative and communicative laws are used, relationaloperators (<,>,..) are replaced with subtraction and tests, ..

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 305 / 309

Looking for Loops

Dominator: The node d of a flow graph dominates a node n ifevery path from the initial node to n leads through d.Recognizing loops:

Each node must have a single input node Header). This entrydominates all other nodes of the loop.There is at least one way to iterate through the loop.

Search edges in the flow graph a→ b where a dominates thenode b (Tail). Such an edge is a Back Edge.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 306 / 309

Calculating the Dominator

Input. A flow graph G with set of nodes N , set of edges E and initialnode n0.Output. The relation dom.

1 D(n0) := {n0}2 for n ∈ N − {n0} do D(n) := N ;3 while changes to any D(n) occur do

1 for n ∈ N − {n0} do1 D(n) := {n} ∪ ∩p a predecessor nD(p)

d is in D(n) iff d dom n.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 307 / 309

Natural Loop

Given: a Back Edge n→ d.

The Natural Loop consists of d and all nodes that reach n without going through d. d is theHeader.

Algorithm:

procedure insert(m)if m is not in loop then begin

loop := loop ∪ {m}push m onto stack

endprocedure main(n→ d)stack := emptyloop := {d}insert(n)while stack is not empty do begin

pop m, the first element of the stack, off stackfor each predecessor p of m do insert(p)

endInner Loops: a loop that doesn’t contain any other loops.

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 308 / 309

Reducible Flow Graphs

Definition: a flow graph is reducible then and only then, if its edges can be partitioned in 2non-intersecting parts, and the parts have the following properties:

1 The forward nodes of the graph form an acyclic graph where eachnode can be reached from the initial node.

2 All Back Edges are edges where their Heads dominate their Tails.Example: the following graph is not reducible.

32

1

F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2017 309 / 309