Elaboration or: Semantic Analysis

Compiler
Baojian Hua

[email protected]

Front End

source code -> lexical analyzer -> tokens -> parser -> abstract syntax tree -> semantic analyzer -> IR

Elaboration

Also known as type-checking, semantic analysis, or context-sensitive analysis.

Checking the well-formedness of programs:
  every variable is declared before use
  every expression has a proper type
  function calls conform to definitions
  all other possible context-sensitive information (highly language-dependent)
  …

Translate the AST into intermediate or machine code.

Elaboration Example

void f (int *p)
{
  x += 4;
  p (23);
  "hello" + "world";
}

int main ()
{
  f () + 5;
}

What errors can be detected here?

Terminology

  Scope
  Lifetime
  Storage class
  Name space

Terminologies: Scope

int x;
int f ()
{
  if (4) {
    int x;
    x = 6;
  }
  else {
    int x;
    x = 5;
  }
  x = 8;
}

Terminologies: Lifetime

static int x;

int f ()
{
  int x, *p;
  x = 6;
  p = malloc (sizeof (*p));
  if (3) {
    static int x;
    x = 5;
  }
}

Terminologies: Storage class

extern int x;

int f ()
{
  extern int x;
  x = 6;
  if (3) {
    extern int x;
    x = 5;
  }
}

Terminologies: Name space

struct list {
  int x;
  struct list *list;
} *list;

void walk (struct list *list)
{
list:
  printf ("%d\n", list->x);
  if (list = list->list)
    goto list;
}

Moral

For the purpose of elaboration, we must take care of all of these TOGETHER:
  Scope
  Lifetime
  Storage class
  Name space
  …

All these details are handled by symbol tables!

Symbol Tables

In order to keep track of types and other information, we maintain a finite map from program symbols to information about them.
  symbols: variables, function names, etc.

Such a mapping is called a symbol table, or sometimes an environment.
  Notation: {x1: t1, x2: t2, …, xn: tn}, where each xi: ti (1 ≤ i ≤ n) is called a binding

Scope

How to handle lexical scope?
It’s easy: we just insert and remove bindings during elaboration, as we enter and leave a local scope.

Scope

int x;                 σ = {x:int}
int f ()               σ1 = σ + {f:…} = {x:int, f:…}
{
  if (4) {
    int x;             σ2 = σ1 + {x:int} = {x:…, f:…, x:…}
    x = 6;
  }                    σ1
  else {
    int x;             σ4 = σ1 + {x:int} = {x:…, f:…, x:…}
    x = 5;
  }                    σ1
  x = 8;
}                      σ1

Shadowing: “+” is not commutative!
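The "+" operation above can be sketched in C as a linked list that only ever grows at the front. This is a minimal illustration, not the course's implementation: the names Env, env_extend and env_lookup are hypothetical, and types are plain strings to keep the sketch short.

```c
#include <stdlib.h>
#include <string.h>

/* One binding x: t. A table is an immutable linked list of bindings. */
typedef struct Env {
    const char *name;
    const char *type;          /* assumption: a type is just a string here */
    const struct Env *rest;    /* the older table that this one extends */
} Env;

/* sigma + {name: type}: builds a new table and leaves sigma untouched. */
const Env *env_extend(const Env *sigma, const char *name, const char *type) {
    Env *e = malloc(sizeof *e);
    if (!e) abort();
    e->name = name;
    e->type = type;
    e->rest = sigma;
    return e;
}

/* The newest binding is found first, which is exactly shadowing. */
const char *env_lookup(const Env *sigma, const char *name) {
    for (; sigma; sigma = sigma->rest)
        if (strcmp(sigma->name, name) == 0)
            return sigma->type;
    return NULL;               /* not declared */
}
```

Leaving a scope needs no work in this sketch: the elaborator simply goes back to using the pointer to σ1. Because lookup stops at the first match, σ1 + {x:…} and {x:…} + σ1 give different answers for x, which is why "+" is not commutative.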

Implementation

Must be efficient!
  lots of variables, functions, etc.

Two basic approaches:
  Functional: the symbol table is implemented as a functional data structure (e.g., a red-black tree), with no tables ever destroyed or modified
  Imperative: a single table, modified for every binding added or removed

This choice is largely independent of the implementation language.

Functional Symbol Table

Basic idea:
  when implementing σ2 = σ1 + {x:t}, create a new table σ2 instead of modifying σ1
  when deleting, restore the old table

A good data structure for this is a BST or a red-black tree.

BST Symbol Table

[Figure: a binary search tree symbol table with nodes c: int, a: char, b: double, e: int, and a second node c: int marked with a prime.]

Possible Functional Interface

signature SYMBOL_TABLE =
sig
  type 'a t
  type key
  val empty: 'a t
  val insert: 'a t * key * 'a -> 'a t
  val lookup: 'a t * key -> 'a option
end
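To make the "no table is ever modified" idea concrete, here is a hedged sketch of a persistent BST in C, using path copying. It is unbalanced for brevity (a red-black tree would add rebalancing), and the names Node, st_insert and st_lookup are illustrative, not part of the interface above.

```c
#include <stdlib.h>
#include <string.h>

typedef struct Node {
    const char *key;
    const char *info;                 /* e.g. the type of the symbol */
    const struct Node *left, *right;
} Node;

static const Node *mk(const char *k, const char *v,
                      const Node *l, const Node *r) {
    Node *n = malloc(sizeof *n);
    if (!n) abort();
    n->key = k; n->info = v; n->left = l; n->right = r;
    return n;
}

/* Returns a new root. Only the nodes on the search path are copied;
 * every other subtree is shared with the old table. */
const Node *st_insert(const Node *t, const char *k, const char *v) {
    if (!t) return mk(k, v, NULL, NULL);
    int c = strcmp(k, t->key);
    if (c < 0) return mk(t->key, t->info, st_insert(t->left, k, v), t->right);
    if (c > 0) return mk(t->key, t->info, t->left, st_insert(t->right, k, v));
    return mk(k, v, t->left, t->right);   /* same key: new binding replaces old */
}

const char *st_lookup(const Node *t, const char *k) {
    while (t) {
        int c = strcmp(k, t->key);
        if (c == 0) return t->info;
        t = c < 0 ? t->left : t->right;
    }
    return NULL;
}
```

The old root stays valid after an insert, so restoring the previous environment is just a matter of keeping the old pointer. Nodes are never freed in this sketch; a real implementation would rely on garbage collection or reference counting.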

Imperative Symbol Tables

The imperative approach almost always involves the use of hash tables.

Need to delete entries to revert to the previous environment:
  made simpler because deletes follow a stack discipline
  can maintain a stack of entered symbols, so that they can later be popped and removed from the hash table

Possible Imperative Interface

signature SYMBOL_TABLE =
sig
  type 'a t
  type key
  val insert: 'a t * key * 'a -> unit
  val lookup: 'a t * key -> 'a option
  val delete: 'a t * key -> unit
  val beginScope: unit -> unit
  val endScope: unit -> unit
end
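A rough C counterpart of this interface might look as follows. It is a sketch under several assumptions: one global table, fixed sizes (NBUCKETS, MAXSYMS), string-valued info, and a NULL entry on the stack of entered symbols as the scope marker. The st_ prefix on the function names is only there to avoid clashes and is not from the slides.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64
#define MAXSYMS  256

typedef struct Binding {
    const char *key;
    const char *info;
    struct Binding *next;          /* older bindings in the same bucket */
} Binding;

static Binding *buckets[NBUCKETS];
static const char *entered[MAXSYMS];   /* stack of keys; NULL marks a scope */
static int top = 0;

static unsigned hash(const char *s) {
    unsigned h = 0;
    while (*s) h = h * 31u + (unsigned char)*s++;
    return h % NBUCKETS;
}

void st_insert(const char *key, const char *info) {
    Binding *b = malloc(sizeof *b);
    if (!b || top >= MAXSYMS) abort();
    unsigned h = hash(key);
    b->key = key; b->info = info; b->next = buckets[h];
    buckets[h] = b;                /* at the front, so it shadows older ones */
    entered[top++] = key;
}

const char *st_lookup(const char *key) {
    for (Binding *b = buckets[hash(key)]; b; b = b->next)
        if (strcmp(b->key, key) == 0) return b->info;
    return NULL;
}

void st_beginScope(void) {
    if (top >= MAXSYMS) abort();
    entered[top++] = NULL;         /* push a scope marker */
}

/* Pop entered symbols back to the last marker, removing each from the table. */
void st_endScope(void) {
    while (top > 0) {
        const char *key = entered[--top];
        if (key == NULL) return;   /* reached the scope marker */
        Binding **p = &buckets[hash(key)];
        while (strcmp((*p)->key, key) != 0) p = &(*p)->next;
        Binding *dead = *p;        /* first match is the newest binding */
        *p = dead->next;
        free(dead);
    }
}
```

Since deletes follow a stack discipline, st_endScope never has to search for which bindings belong to the scope: it simply pops until it meets the marker.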

Name Space

It’s trivial to handle name spaces:
  one symbol table for each name space

Take C as an example. Several different name spaces:
  labels
  tags
  variables

So …

Implementation of Symbols

For several reasons, it will be useful at some point to represent symbols as elements of a small, densely packed set of identities:
  fast comparisons (equality)
  for dataflow analysis, we will want sets of variables and fast set operations

It will be critically important to use bit strings to represent the sets.
  For example, in your liveness analysis algorithm

More on this later
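As a small sketch of what "densely packed identities" and "bit strings" could mean in practice, the C fragment below interns names into consecutive integers and keeps a set of symbols in a single 64-bit word. The limit of 64 symbols, the linear search, and all names (symbol, SymSet, set_add, …) are simplifying assumptions for illustration.

```c
#include <stdint.h>
#include <string.h>

#define MAXSYMS 64             /* assumption: all symbols fit in one word */

static const char *names[MAXSYMS];
static int nsyms = 0;

/* Returns the dense id of a name, assigning the next free id on first sight.
 * Returns -1 when the table is full; callers must check before using the id. */
int symbol(const char *name) {
    for (int i = 0; i < nsyms; i++)
        if (strcmp(names[i], name) == 0) return i;
    if (nsyms == MAXSYMS) return -1;
    names[nsyms] = name;
    return nsyms++;
}

typedef uint64_t SymSet;       /* bit i is set exactly when symbol i is in the set */

SymSet set_add(SymSet s, int sym)    { return s | (UINT64_C(1) << sym); }
int    set_has(SymSet s, int sym)    { return (int)((s >> sym) & 1u); }
SymSet set_union(SymSet a, SymSet b) { return a | b; }
SymSet set_minus(SymSet a, SymSet b) { return a & ~b; }
```

After interning, comparing two symbols is an integer comparison, and set union or difference is a single machine instruction per word. A realistic version would use a hash table for interning and an array of words for larger sets.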

Types

The representation of types is highly language-dependent

Some key considerations:
  name vs. structural equivalence
  mutually recursive type definitions
  dealing with errors

Name vs. Structural Equivalence

In a language with structural equivalence, the program below is legal, but not in a language with name equivalence (e.g., C).

For name equivalence, can generate a unique symbol for each defined type.

For structural equivalence, need to recursively compare the types.

struct A { int i; } x;
struct B { int i; } y;
x = y;
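The two notions of equivalence can be contrasted in a short C sketch. The representation is an assumption made for illustration: each struct definition receives a unique tag_id, and the structural check also compares field names, which is a design choice that varies between languages. Recursive types are not handled here.

```c
#include <stddef.h>
#include <string.h>

typedef enum { T_INT, T_CHAR, T_STRUCT } Kind;

typedef struct Type Type;
typedef struct Field { const char *name; const Type *type; } Field;
struct Type {
    Kind kind;
    int tag_id;                /* unique symbol per struct definition */
    size_t nfields;
    const Field *fields;
};

/* Name equivalence: struct types are equal only if they come from
 * the same definition, i.e. carry the same unique symbol. */
int equal_by_name(const Type *a, const Type *b) {
    if (a->kind != b->kind) return 0;
    return a->kind != T_STRUCT || a->tag_id == b->tag_id;
}

/* Structural equivalence: compare the types field by field, recursively. */
int equal_by_structure(const Type *a, const Type *b) {
    if (a->kind != b->kind) return 0;
    if (a->kind != T_STRUCT) return 1;
    if (a->nfields != b->nfields) return 0;
    for (size_t i = 0; i < a->nfields; i++) {
        if (strcmp(a->fields[i].name, b->fields[i].name) != 0) return 0;
        if (!equal_by_structure(a->fields[i].type, b->fields[i].type)) return 0;
    }
    return 1;
}
```

For struct A and struct B from the slide, equal_by_structure answers yes and equal_by_name answers no, so x = y is accepted only under structural equivalence.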

Mutually recursive type definitions

To process recursive and mutually recursive type definitions, need a placeholder:
  in ML, an option ref
  in C, a pointer
  in Java, a bind method (read Appel)

struct A {
  int data;
  struct A *next;
  struct B *b;
};

struct B {…};

Error Diagnostic

To recover from errors, it is useful to have an “any” type:
  it makes it possible to continue type-checking after an error
  in practice, use “int” or guess one

Similarly, a “void” type can be used for expressions that return no value

Source locations are annotated in AST!

Organization of the Elaborator

Module structure:

elabProg: Ast.Program.t -> unit
elabStm: Ast.Stm.t * tenv * venv -> unit
elabDec: Ast.Dec.t * venv * tenv -> tenv * venv
elabTy: Ast.Type.t * tenv -> ty
elabExp: Ast.Exp.t * venv -> ty
elabLVal: Ast.Lval.t * venv -> ty

It will be extended to also do translation.

For now let’s concentrate on type-checking

Elaborate Expressions

Checks that expressions are correctly typed.

Valid expressions are defined in the C specification.

e: t means that e is a valid expression of type t.

venv is a symbol table (environment).

Elaborate Expressions

fun elabExp (e, venv) =
  case e of
    BinaryExp (PLUS, e1, e2) =>
      let val t1 = elabExp (e1, venv)
          val t2 = elabExp (e2, venv)
      in case (t1, t2) of
           (Int, Int) => Int
         | (Int, _) => error ("e2 should be int")
         | (_, Int) => error ("e1 should be int")
         | _ => error ("should both be int")
      end

venv ⊢ e1: int    venv ⊢ e2: int
--------------------------------
       venv ⊢ e1+e2: int
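The same rule can be written in C. The sketch below is not the lab's reference code: the AST (Exp), the array-based Venv and the helper type_error are hypothetical, and it folds in the “any” type from the Error Diagnostic slide so that one mistake does not trigger a cascade of follow-up messages.

```c
#include <stdio.h>
#include <string.h>

typedef enum { TY_INT, TY_STRING, TY_ANY } Ty;   /* TY_ANY: error recovery */
typedef enum { E_NUM, E_STR, E_VAR, E_PLUS } ExpKind;

typedef struct Exp {
    ExpKind kind;
    const char *name;              /* used by E_VAR only */
    const struct Exp *e1, *e2;     /* used by E_PLUS only */
} Exp;

typedef struct { const char *name; Ty ty; } VBinding;
typedef struct { const VBinding *b; int n; } Venv;

int error_count = 0;

static Ty type_error(const char *msg) {
    fprintf(stderr, "type error: %s\n", msg);
    error_count++;
    return TY_ANY;                 /* lets type-checking continue */
}

Ty elabExp(const Exp *e, const Venv *venv) {
    switch (e->kind) {
    case E_NUM: return TY_INT;
    case E_STR: return TY_STRING;
    case E_VAR:
        for (int i = venv->n - 1; i >= 0; i--)   /* newest binding first */
            if (strcmp(venv->b[i].name, e->name) == 0) return venv->b[i].ty;
        return type_error("undeclared variable");
    case E_PLUS: {
        Ty t1 = elabExp(e->e1, venv);
        Ty t2 = elabExp(e->e2, venv);
        /* TY_ANY is accepted silently: its error was already reported. */
        if (t1 != TY_INT && t1 != TY_ANY) return type_error("e1 should be int");
        if (t2 != TY_INT && t2 != TY_ANY) return type_error("e2 should be int");
        return TY_INT;
    }
    }
    return type_error("unknown expression");
}
```

With this sketch, "hello" + "world" reports one error and yields TY_ANY, while x + 4 with an undeclared x reports only the undeclared variable and still yields int.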

Elaborate Types

Elaborating types is straightforward, except for recursive types.

Need to do “knot-tying”:
  extend tenv with bindings for all of the new type names
  bind new names to “dummy” bodies
  process each definition, replacing the dummy bodies with real definitions
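The knot-tying steps can be sketched in C, where the placeholder is a pointer to a type record whose body is filled in later. Everything here is illustrative: tenv is a small fixed array, a field is either an int or a pointer to a struct, and the names tenv_declare and tenv_define are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAXTYPES 16

typedef struct Ty Ty;
typedef struct Field {
    const char *name;
    const Ty *pointee;       /* NULL means int, otherwise pointer to struct */
} Field;
struct Ty {
    const char *tag;
    int defined;             /* 0 while the body is still a dummy */
    size_t nfields;
    const Field *fields;
};

static Ty tenv[MAXTYPES];
static int ntypes = 0;

Ty *tenv_lookup(const char *tag) {
    for (int i = 0; i < ntypes; i++)
        if (strcmp(tenv[i].tag, tag) == 0) return &tenv[i];
    return NULL;
}

/* Step 1: bind the name to a dummy body so that other types can point at it. */
Ty *tenv_declare(const char *tag) {
    Ty *t = tenv_lookup(tag);
    if (t) return t;
    if (ntypes == MAXTYPES) return NULL;
    t = &tenv[ntypes++];
    t->tag = tag; t->defined = 0; t->nfields = 0; t->fields = NULL;
    return t;
}

/* Step 2: replace the dummy body in place; pointers taken earlier stay valid. */
void tenv_define(Ty *t, const Field *fields, size_t n) {
    t->fields = fields;
    t->nfields = n;
    t->defined = 1;
}
```

For the struct A example, both A and B are declared first, then A's fields are built with pointers to the (still dummy) records, and finally each body is defined. Because the record is updated in place, A's field b sees B's real body as soon as B is defined.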

Elaborate Declarations

elabDec will extend the symbol tables with a new binding:

  int a;

will add {a: int} to the environment.

Remember that environments have to take into account the scope of variables!

Elaborate Statements, Lvals, Programs

All follow the same structure as expressions or types.

elabProg calls the other functions in order to type-check each component of the program (declarations, statements, expressions, …).

Labs

For lab #4, your job is to implement an elaborator for C--. You may go in two steps:
  first type-checking
  and then generating target code

At every step, check the output carefully to make sure your compiler works correctly

Summary

Elaboration checks the well-formedness of programs:
  must take care of the semantics of source programs
  and may translate them into lower-level forms

Usually the biggest (most complex) part of a compiler!
