Symbols and Type-Checking

CPSC 388

Ellen Walker

Hiram College

Symbol Table is Central

• For scanning & parsing
  – Distinguish “identifier” vs. “keyword”
  – Tree “decorations” during parsing

• For semantic analysis
  – Insertions / deletions from declarations / end of scope
  – Type-checking and making sure variables are declared

• For code generation
  – Associating addresses and/or labels with symbols

Symbol Table

• Dictionary data structure (hash table)
  – Insert / Lookup / Delete

• Search key is symbol name

• Additional attributes in node class (struct)
  – E.g. const (value), type, function/variable

A Note on Hash Functions

• Hash function should use complete name (all characters)
  – Avoid collisions with “temp1, temp2, temp3…”

• Include character positions (don’t simply add up characters)
  – Avoid collisions “tempx” vs. “xtemp”

• Use mod function often to avoid overflow
  – (a+b)%m = (a%m + b%m)%m
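The three guidelines above can be sketched in a few lines of C++. This is only an illustration: the function name `hashName`, the multiplier 31, and the table size passed by the caller are assumptions, not something the slides prescribe.

```cpp
#include <cstddef>
#include <string>

// Position-sensitive hash over every character of the name.
// The multiplier 31 is an arbitrary illustrative choice; tableSize must be > 0.
std::size_t hashName(const std::string& name, std::size_t tableSize) {
    std::size_t h = 0;
    for (unsigned char c : name) {
        // Multiplying the running value before adding the next character
        // makes the result depend on character order ("tempx" vs. "xtemp").
        // Reducing mod tableSize at every step keeps h below tableSize,
        // so for realistic table sizes h * 31 + c cannot overflow.
        h = (h * 31 + c) % tableSize;
    }
    return h;
}
```

With a table size of 211, for example, “tempx” and “xtemp” land in different buckets, and so do “temp1” and “temp2”, because the last character still changes the final remainder.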

Symbol Types & Attributes

• Constant
  – final int SIZE = 199;
  – (constant, type=int, value=199)

• Variable
  – int a;
  – (variable, type=int)

More Types & Attributes

• Structure
  – struct Entry{char *name; int count;};
  – (structure, size=64 bits, assuming 32-bit pointers and ints)

• Function
  – int myFun(char *foo, int bar)
  – (function, 2 parameters, char* + int)

Declaration Before Use

• Every symbol is declared before its first use
  – Declaration inserts all attributes into symbol table

• Look up new “id” in table
  – If declared, all attributes available
  – Else compilation error
  – Allows for “one-pass” compilation

Implicit Declaration

• Symbols are inserted into table when first seen
  – Default attributes (e.g. C function returns int, Fortran variable type chosen by first letter)
  – Attributes determined by use (e.g. lhs of assignment gets type of rhs of assignment)
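As a small illustration of default attributes, the sketch below applies Fortran's first-letter rule (names starting with I through N are INTEGER, all others REAL) when a name is seen for the first time. The function names and the use of `std::map` as the symbol table are assumptions made for this example.

```cpp
#include <cctype>
#include <map>
#include <string>

// Fortran's default rule: names starting with I through N are INTEGER,
// all other names are REAL. The name must be non-empty.
std::string implicitType(const std::string& name) {
    char first = static_cast<char>(
        std::toupper(static_cast<unsigned char>(name.at(0))));
    return (first >= 'I' && first <= 'N') ? "INTEGER" : "REAL";
}

// Look up a name; if it has not been seen yet, insert it with its
// default type instead of reporting an error.
std::string lookupOrDeclare(std::map<std::string, std::string>& symtab,
                            const std::string& name) {
    auto it = symtab.find(name);
    if (it == symtab.end())
        it = symtab.emplace(name, implicitType(name)).first;
    return it->second;
}
```

A name that was declared explicitly keeps its declared type, because the default is only applied when the lookup fails.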

Scope / Lifetime of a Symbol

• Scope: where is symbol visible?
  – Global
  – Within function
  – Within block ({…})

• Lifetime: when is memory allocated?
  – Static: from declaration on
  – Automatic: only when visible
  – Dynamic: explicit alloc/dealloc (run-time only)

Scope / Lifetime Example (C++)

#include <iostream>
using namespace std;

int x;                          // x is global, static lifetime

int count() {
    static int t = 0;           // t is local to count, static
    t++;
    return t;
}

int main() {
    cin >> x;
    for (int i = 0; i < x; i++) // i is local to for, automatic
        count();
}

Nested Blocks

procedure A {
    int x          // visible in A but not B
    int y          // visible in A and B
    procedure B {
        int x      // visible in B only
        …
    }
}

Nested Blocks in Symbol Table

• When variable becomes visible, insert into symbol table
  – Before any other variable with same name
  – Innermost visible variable “shadows” all others

• When variable is no longer visible, delete
  – Outer value uncovered

Implementations

• Sorted list
  – New key must precede equal keys, stop at first match

• Binary tree
  – Always go left on equal, stop at NULL left child

• Hash table
  – Insert at beginning of collision list, stop at first match
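The hash-table variant can be sketched as follows. This is a minimal illustration, not a production design: the `Symbol` fields, the class name `SymbolTable`, and the default of 101 buckets are all assumptions of the example.

```cpp
#include <cstddef>
#include <forward_list>
#include <functional>
#include <string>
#include <vector>

struct Symbol {
    std::string name;
    std::string type;   // illustrative attribute
    int nestLevel;      // block in which the symbol was declared
};

class SymbolTable {
public:
    explicit SymbolTable(std::size_t buckets = 101) : table_(buckets) {}

    // New entries go to the front of the collision list, so they are
    // found before (and therefore shadow) older entries with the same name.
    void insert(const Symbol& s) { bucket(s.name).push_front(s); }

    // Stop at the first match: the innermost visible declaration.
    const Symbol* lookup(const std::string& name) const {
        for (const Symbol& s : table_[index(name)])
            if (s.name == name) return &s;
        return nullptr;
    }

    // Delete the innermost declaration of name, uncovering any outer one.
    void remove(const std::string& name) {
        auto& list = bucket(name);
        auto prev = list.before_begin();
        for (auto it = list.begin(); it != list.end(); ++it, ++prev) {
            if (it->name == name) { list.erase_after(prev); return; }
        }
    }

private:
    std::size_t index(const std::string& name) const {
        return std::hash<std::string>{}(name) % table_.size();
    }
    std::forward_list<Symbol>& bucket(const std::string& name) {
        return table_[index(name)];
    }
    std::vector<std::forward_list<Symbol>> table_;
};
```

Inserting `x` at level 0 and then at level 1 makes `lookup("x")` return the level-1 entry; one `remove("x")` at the end of the inner block makes the level-0 entry visible again.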

Explicit Scope Operator

• Some languages provide an explicit scope operator, e.g.

String::last(“abc”) // don’t use a local last fn

• To implement, each symbol needs a block id
  – E.g. name of enclosing function or class

Same-Level Duplicates

• Disallowed in most languages
  – Look up symbol before adding
  – If symbol is in current block, error
  – Requires block id (or equivalent) in symbol table

• Later value would shadow earlier value
  – Compiler implementation same as nesting
  – Code is very confusing!
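A minimal sketch of the “look up before adding” check, using the nesting level as the block id. The names `ScopeTable` and `declare` are assumptions made for this example.

```cpp
#include <map>
#include <string>
#include <vector>

// For each name, the stack of nesting levels at which it is currently
// declared (innermost last). The nesting level plays the role of block id.
using ScopeTable = std::map<std::string, std::vector<int>>;

// Returns false (a "multiple declaration" error) if name is already
// declared in the current block; otherwise records the declaration.
bool declare(ScopeTable& table, const std::string& name, int currentLevel) {
    std::vector<int>& levels = table[name];
    if (!levels.empty() && levels.back() == currentLevel)
        return false;               // same-level duplicate
    levels.push_back(currentLevel); // shadows any outer declaration
    return true;
}
```

Declaring `i` at level 0 and again at level 1 succeeds (ordinary shadowing), while a second declaration of `i` at level 1 is rejected.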

Sequential Evaluation?

int i = 5;
{
    int i = 7;
    int j = 1 + i;   // j=8 if sequential,
    …                // j=6 if collateral (parallel)
}

• Collateral implementation might be more efficient (ML, LISP)

Recursive Declaration

int factorial(int x) {   // recursive function
    if (x > 0) return x * factorial(x - 1);
    else return 1;       // base case: 0! = 1
}

class node {             // recursive data structure
    int value;
    node *next;
};

Implementing Recursive Declarations

• Get name into symbol table as soon as possible
  – Before finishing function or structure
  – E.g. decl: name ( args ) {/*update symtab*/} statement-block {/*generate code*/}

• Once symbol is in table, it’s ok to use
  – Using a symbol is not re-declaring it!

• Prototype also gets name into symbol table

Mutual Recursion & Prototypes

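int B(int x);            // prototype for B

int A(int x) {           // calls B
    // B already in symbol table from prototype
    if (x > 0) return B(x - 1);
    return 0;            // illustrative base value, so every path returns
}

int B(int x) {           // calls A
    if (x != 1) return A(x / 2);
    return 1;            // illustrative base value, so every path returns
}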

Declaration Example (p. 311)

• let declarations w/ initialization in exp
  – let x=3,y=5 in z=x+y
  – let x=3 in (let x=5 in y=x+1)

• Attributes (for creating symbol tables)
  – symtab: current symbol table
  – nestl: current nesting level
  – err: Boolean, is it an error?
  – intab/outtab: tables before/after declaration

Declaration Attribute Rules

S -> exp                 // initialization & finalization
    exp.symtab = emptytable
    exp.nestl = 0
    S.err = exp.err

exp -> id                // id must be in symbol table
    exp.err = not isin(exp.symtab, id.name)

Initialization Attribute Rule

decl -> id = exp
    exp.symtab = decl.intab    // current symbols
    exp.nestl = decl.nestl     // current nest level
    decl.outtab =              // output table w/ new id
        if (decl.intab == errtab) || exp.err ||
           (lookup(decl.intab, id.name) == decl.nestl)
        then errtab
        else insert(decl.intab, id.name, decl.nestl)

Let Statement Attribute Rule

exp1 -> let dec-list in exp2
    dec-list.intab = exp1.symtab
    dec-list.nestl = exp1.nestl + 1   // nesting
    exp2.symtab = dec-list.outtab
    exp2.nestl = dec-list.nestl
    exp1.err = (dec-list.outtab == errtab) || exp2.err
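The attribute rules for let expressions can be tried out on a small tree walker. The sketch below is one possible reading, under stated assumptions: the AST layout, the helper names (`mkNum`, `mkId`, `mkPlus`, `mkLet`), and the choice to return as soon as an error is found (instead of passing an explicit errtab) are all inventions of this example. Declarations are processed in sequence, so each decl sees the ones before it.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Exp;
using ExpPtr = std::shared_ptr<Exp>;
using Decl = std::pair<std::string, ExpPtr>;   // id = exp
using SymTab = std::map<std::string, int>;     // name -> nesting level

struct Exp {
    enum Kind { Num, Id, Plus, Let };
    Kind kind = Num;
    std::string name;          // Id
    ExpPtr left, right;        // Plus
    std::vector<Decl> decls;   // Let: dec-list
    ExpPtr body;               // Let: expression after "in"
};

ExpPtr mkNum() { return std::make_shared<Exp>(); }
ExpPtr mkId(const std::string& n) {
    auto e = std::make_shared<Exp>(); e->kind = Exp::Id; e->name = n; return e;
}
ExpPtr mkPlus(ExpPtr l, ExpPtr r) {
    auto e = std::make_shared<Exp>(); e->kind = Exp::Plus;
    e->left = std::move(l); e->right = std::move(r); return e;
}
ExpPtr mkLet(std::vector<Decl> d, ExpPtr b) {
    auto e = std::make_shared<Exp>(); e->kind = Exp::Let;
    e->decls = std::move(d); e->body = std::move(b); return e;
}

// Computes the synthesized attribute err from the inherited
// attributes symtab and nestl.
bool hasError(const Exp& e, const SymTab& symtab, int nestl) {
    switch (e.kind) {
    case Exp::Num:
        return false;
    case Exp::Id:                          // exp -> id
        return symtab.count(e.name) == 0;
    case Exp::Plus:
        return hasError(*e.left, symtab, nestl) ||
               hasError(*e.right, symtab, nestl);
    case Exp::Let: {                       // exp1 -> let dec-list in exp2
        int inner = nestl + 1;             // dec-list.nestl
        SymTab tab = symtab;               // dec-list.intab
        for (const Decl& d : e.decls) {    // decl -> id = exp
            if (hasError(*d.second, tab, inner)) return true;
            auto it = tab.find(d.first);
            if (it != tab.end() && it->second == inner)
                return true;               // redeclared at the same level
            tab[d.first] = inner;          // insert, shadowing any outer id
        }
        return hasError(*e.body, tab, inner);
    }
    }
    return true;  // not reached for well-formed nodes
}
```

Under this sketch, `let x=3 in (let x=5 in x+1)` is accepted because the inner x sits one nesting level deeper, while `let x=3, x=5 in x` is rejected as a same-level duplicate.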

Data Types - Definitions

• Type
  – Class of possible values (w/ operations)

• Type inference
  – Determine result type based on input types

• Type checking
  – Ensure specified types make sense
    • Assignment statements
    • Function calls (parameters)

Simple Data Types

• Built-in (predefined)
  – Directly represented in memory (e.g. int, float, double)

• Programmer-defined
  – Subrange (e.g. 1..10)
  – Enumerated (e.g. {SU, FA, SP})

Type Constructors

• Array
  – Sequence of elements of the same type
  – One type, explicit size

• Record / Struct
  – Collection of elements of varied types
  – Many types, implicit size

• Union
  – Choice of types, implicit size (largest one)

More Type Constructors

• Pointer / reference
  – Address of an object of given type
  – “Dereference” operation follows the pointer
  – Reference is automatically dereferenced

• Function
  – Maps parameters (of given types) to return value (of given type)

And finally...

• Class
  – Struct + member functions (methods)
  – Information hiding (public/private)
  – Inheritance
  – Polymorphism

Type Names

• Define a name to represent a type
  – typedef hand = array[1..5] of card
  – typedef vector<int>::iterator iter

• Programmer convenience

• Another kind of symbol for the symbol table!

Types are structurally equivalent when they are...

• Simple and identical

• Arrays of the same size and equivalent element type

• Structures of equivalent type elements in the same sequence
  – Assume equivalence for recursive tests!

• Pointers to items of equivalent types
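The four cases above translate into a short recursive test. The sketch below is an assumption-laden illustration: the `Type` node layout and the `Assumed` set are invented for this example. The set records pairs that are already being compared, which is one way to “assume equivalence for recursive tests” so that the comparison terminates on types such as a linked-list node.

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

struct Type {
    enum Kind { Int, Real, Array, Record, Pointer };
    Kind kind = Int;
    int size = 0;                    // number of elements (arrays only)
    std::vector<const Type*> parts;  // element, field, or pointee types
};

using Assumed = std::set<std::pair<const Type*, const Type*>>;

bool typeEqual(const Type* a, const Type* b, Assumed& assumed) {
    if (a == b) return true;
    if (a->kind != b->kind) return false;
    // A pair that is already under comparison is assumed equivalent;
    // this is what stops the recursion on recursive types.
    if (!assumed.insert(std::make_pair(a, b)).second) return true;
    if (a->kind == Type::Array && a->size != b->size) return false;
    if (a->parts.size() != b->parts.size()) return false;
    for (std::size_t i = 0; i < a->parts.size(); ++i)
        if (!typeEqual(a->parts[i], b->parts[i], assumed)) return false;
    return true;
}

bool typeEqual(const Type* a, const Type* b) {
    Assumed assumed;
    return typeEqual(a, b, assumed);
}
```

Two separately built copies of a record holding an int and a pointer back to the record compare as equivalent, while arrays of different sizes do not.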

Other kinds of equivalence

• Name equivalence
  – Names must exactly match
  – More restrictive than structural

• Declaration equivalence
  – Types match if names are the same or …
  – Types X and Y match if “X=Y” is explicitly declared in the code

Type Inference

• Declarations cause type of an id to be entered into a symbol table

var-decl -> id : type-exp
    insert(id.name, type-exp.type)   // associate type to id in the symbol table

• Assume an array or struct type has pointers to its parts

Type Checking (p. 330)

stmt -> id := exp
    if not typeEqual(lookup(id.name), exp.type)
        type-error(stmt)

stmt -> if exp then stmt
    if not typeEqual(exp.type, boolean)
        type-error(stmt)
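The two rules above can be written as small checking functions. In this sketch, returning false stands in for type-error(stmt); the names `SimpleType`, `TypeTable`, `checkAssign`, and `checkIf` are assumptions, and only two simple types are modeled.

```cpp
#include <map>
#include <string>

enum class SimpleType { Integer, Boolean };
using TypeTable = std::map<std::string, SimpleType>;  // id name -> declared type

bool typeEqual(SimpleType a, SimpleType b) { return a == b; }

// stmt -> id := exp
// Returns false where the rule would call type-error(stmt).
bool checkAssign(const TypeTable& symtab, const std::string& id,
                 SimpleType expType) {
    auto it = symtab.find(id);
    if (it == symtab.end()) return false;   // undeclared id
    return typeEqual(it->second, expType);
}

// stmt -> if exp then stmt
bool checkIf(SimpleType conditionType) {
    return typeEqual(conditionType, SimpleType::Boolean);
}
```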

Array Type Inference / Checking

type-exp1 -> array [num] of type-exp2
    type-exp1.type = new typenode(“array”, num.size, type-exp2.type)

exp1 -> exp2 [exp3]
    if (isArrayType(exp2.type) && typeEqual(exp3.type, integer))
        exp1.type = exp2.type.child1
    else type-error(exp1)
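A possible C++ rendering of the array rules, assuming a `TypeNode` with a `child1` pointer to the element type as in the rule above. The helper names (`simple`, `arrayOf`, `subscriptType`) are hypothetical, and a null result stands in for type-error(exp1).

```cpp
#include <memory>
#include <utility>

struct TypeNode {
    enum Kind { Integer, Boolean, Array } kind;
    int size;                          // number of elements (arrays only)
    std::shared_ptr<TypeNode> child1;  // element type (arrays only)
};
using TypePtr = std::shared_ptr<TypeNode>;

TypePtr simple(TypeNode::Kind k) {
    return std::make_shared<TypeNode>(TypeNode{k, 0, nullptr});
}

// type-exp1 -> array [num] of type-exp2
TypePtr arrayOf(int num, TypePtr element) {
    return std::make_shared<TypeNode>(
        TypeNode{TypeNode::Array, num, std::move(element)});
}

// exp1 -> exp2 [exp3]
// Returns the type of exp1, or nullptr where the rule reports a type error.
TypePtr subscriptType(const TypePtr& array, const TypePtr& index) {
    if (array && index &&
        array->kind == TypeNode::Array && index->kind == TypeNode::Integer)
        return array->child1;
    return nullptr;
}
```

Subscripting an array of arrays yields the inner array type, so a two-dimensional access is just two applications of `subscriptType`.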

Overloading

• Interpretation of a symbol depends on types of related subexpressions
  – 5.0 + 6.0 vs. “mystring: ” + “abc”
  – int max (int A[]) vs. double max(double A[])

• Type attributes from symbol table needed to understand (gen. code for)
  – a+b
  – c = max(a,b)
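One simple way to picture overload resolution is a lookup keyed on both the name and the argument types. The sketch below only handles exact matches and represents types as strings; the `Signature` struct and `resolve` function are assumptions of this example, not a description of how any particular compiler does it.

```cpp
#include <string>
#include <vector>

struct Signature {
    std::string name;
    std::vector<std::string> paramTypes;
    std::string returnType;
};

// Returns the first declared signature whose name and parameter types
// match the call exactly, or nullptr if no overload is available.
const Signature* resolve(const std::vector<Signature>& table,
                         const std::string& name,
                         const std::vector<std::string>& argTypes) {
    for (const Signature& s : table)
        if (s.name == name && s.paramTypes == argTypes)
            return &s;
    return nullptr;
}
```

With both versions of max in the table, a call whose argument has type double[] selects the overload returning double, and a call with an unmatched argument type yields the “no overload available” error.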

Type Conversion

• Type “upgrades” in mixed expressions
  – float + int -> float

• Add rules to grammar
  – Type of expression is checked after each subexpression
  – If subexpression is “bigger”, upgrade expression type
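The upgrade rule amounts to taking the “bigger” of the two operand types. A minimal sketch, assuming a three-step ordering int < float < double; the enum `NumType` and both function names are invented for the example.

```cpp
// Numeric types ordered from "smallest" to "biggest"; the ordering and
// the name NumType are assumptions of this sketch.
enum class NumType { Int, Float, Double };

// Type of a binary arithmetic expression: the bigger operand type wins.
NumType resultType(NumType left, NumType right) {
    return left < right ? right : left;
}

// True if the operand is smaller than the result type, i.e. the compiler
// would have to emit conversion code for that operand.
bool needsUpgrade(NumType operand, NumType result) {
    return operand < result;
}
```

For float + int the result type is float, and only the int operand needs an upgrade.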

Type Conversion in Assignment

• Can be “upgrade” or “downgrade”
  – double x = 1+2; // upgraded from int
  – int z = 5 / 2.0; // 2 (info loss!)

• Rule sets LHS type from declaration regardless of expression type

• Coercion code must be compiled in

• (Language designer’s decision whether compiler will do this)

OO Type Conversion

• “upgrade” = assignment of superclass to subclass

• “downgrade” = assignment of subclass to superclass (with loss of info)

• Very general algorithms exist, but are implemented in few languages

Result of Semantic Analysis

• Complete symbol table(s) with attributes
  – Incorporating scoping rules

• Additional attributes for grammar non-terminals
  – (mostly for building symbol tables)

• Determination whether semantic errors have occurred (and where)

Semantic Errors

• Undeclared symbol (in this scope)

• Multiple declarations (in this scope)

• Invalid type for statement
  – E.g. if (“not boolean”) …

• Incompatible types in assignment

• Incompatible types in function call / no overload available

Attributes of a type

• Name (the symbol in the table)

• Size (number of bytes taken up)

• Type expression
  – Array element type and size
  – Structure components
  – Union alternative types
