Symbols and Type-Checking
CPSC 388
Ellen Walker
Hiram College
Symbol Table is Central
• For scanning & parsing
  – Distinguish “identifier” vs. “keyword”
  – Tree “decorations” during parsing
• For semantic analysis
  – Insertions / deletions from declarations / end of scope
  – Type-checking and making sure variables are declared
• Code generation
  – Associating addresses and/or labels with symbols
Symbol Table
• Dictionary data structure (hash table)
  – Insert / Lookup / Delete
• Search key is symbol name
• Additional attributes in node class (struct)
  – E.g. const (value), type, function/variable
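The dictionary described above could be sketched in C++ roughly as follows. This is a minimal illustration, assuming std::unordered_map as the hash table; the names SymbolKind, SymbolEntry, and SymbolTable are hypothetical and not taken from the course materials.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical attribute record; the field names are assumptions for illustration.
enum class SymbolKind { Constant, Variable, Function };

struct SymbolEntry {
    SymbolKind kind;
    std::string type;  // e.g. "int"
    int value;         // meaningful only when kind == Constant
};

class SymbolTable {
public:
    // Insert returns false if the name is already present.
    bool insert(const std::string& name, const SymbolEntry& entry) {
        return table_.emplace(name, entry).second;
    }
    // Lookup returns nullptr when the name was never inserted.
    const SymbolEntry* lookup(const std::string& name) const {
        auto it = table_.find(name);
        return it == table_.end() ? nullptr : &it->second;
    }
    // Delete returns true if an entry was actually removed.
    bool remove(const std::string& name) { return table_.erase(name) > 0; }

private:
    // The search key is the symbol name.
    std::unordered_map<std::string, SymbolEntry> table_;
};
```

A constant such as SIZE would be stored with insert("SIZE", {SymbolKind::Constant, "int", 199}) and found again with lookup("SIZE").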
A Note on Hash Functions
• Hash function should use complete name (all characters)
  – Avoid collisions with “temp1, temp2, temp3…”
• Include character positions (don’t simply add up characters)
  – Avoid collisions “tempx” vs. “xtemp”
• Use mod function often to avoid overflow
  – (a+b)%m = (a%m + b%m)%m
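One way to follow all three guidelines is a polynomial hash that reduces mod the table size at every step. The sketch below is an assumption-laden example: the function name hashName and the multiplier 31 are arbitrary choices, not something the slides prescribe.

```cpp
#include <cstddef>
#include <string>

// Position-sensitive hash over every character of the name.
// The multiplier 31 is an arbitrary choice for this sketch.
std::size_t hashName(const std::string& name, std::size_t tableSize) {
    std::size_t h = 0;
    for (unsigned char c : name) {
        // Multiplying the running value makes character position matter;
        // reducing mod tableSize at each step keeps h from overflowing.
        h = (h * 31 + c) % tableSize;
    }
    return h;
}
```

Because the running value is multiplied before each character is added, “tempx” and “xtemp” generally hash to different buckets, and names that differ only in the last character (temp1, temp2) land in different buckets whenever the table has more than a few slots.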
Symbol Types & Attributes
• Constant
  – final int SIZE = 199
  – (constant, type=int, value=199)
• Variable
  – int a;
  – (variable, type=int)
More Types & Attributes
• Structure
  – struct Entry{char *name; int count;}
  – (structure, size=64 bits, assuming a 32-bit pointer and a 32-bit int)
• Function
  – int myFun(char *foo, int bar)
  – (function, 2 parameters, char* + int)
Declaration Before Use
• Every symbol is declared before its first use
  – Declaration inserts all attributes into symbol table
• Look up new “id” in table
  – If declared, all attributes available
  – Else compilation error
  – Allows for “one-pass” compilation
Implicit Declaration
• Symbols are inserted into table when first seen
  – Default attributes (e.g. C function returns int, Fortran variable type chosen by first letter)
  – Attributes determined by use (e.g. lhs of assignment gets type of rhs of assignment)
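As an illustration of default attributes, Fortran's default implicit typing gives INTEGER to names beginning with I through N and REAL to all others. A small sketch of that rule follows; the function name implicitFortranType is hypothetical.

```cpp
#include <cctype>
#include <string>

// Default Fortran implicit typing: first letter I..N means INTEGER, otherwise REAL.
// Assumes a non-empty name; at(0) throws std::out_of_range on an empty string.
std::string implicitFortranType(const std::string& name) {
    char first = static_cast<char>(
        std::toupper(static_cast<unsigned char>(name.at(0))));
    return (first >= 'I' && first <= 'N') ? "INTEGER" : "REAL";
}
```

A compiler using implicit declaration would call something like this when a lookup fails, then insert the symbol with the resulting type instead of reporting an error.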
Scope / Lifetime of a Symbol
• Scope: where is symbol visible?
  – Global
  – Within function
  – Within block ({…})
• Lifetime: when is memory allocated?
  – Static: from declaration on
  – Automatic: only when visible
  – Dynamic: explicit alloc/dealloc (run-time only)
Scope / Lifetime Example (C++)
#include <iostream>
using namespace std;
int x;                          //x is global, static lifetime
int count(){
  static int t = 0;             //t is local to count, static
  t++;
  return t;
}
int main(){
  cin >> x;
  for(int i=0;i<x;i++)          //i is local to for, automatic
    count();
}
Nested Blocks
procedure A {
  int x //visible in A but not B
  int y //visible in A and B
  procedure B {
    int x //visible in B only
    …
  }
}
Nested Blocks in Symbol Table
• When variable becomes visible, insert into symbol table
  – Before any other variable with same name
  – Innermost visible variable “shadows” all others
• When variable is no longer visible, delete
  – Outer value uncovered
Implementations
• Sorted list
  – New key must precede equal keys, stop at first match
• Binary Tree
  – Always go left on equal, stop at NULL left child
• Hash table
  – Insert at beginning of collision list, stop at first match
Explicit Scope Operator
• Some languages provide an explicit scope operator, e.g.
  String::last(“abc”) //don’t use a local last fn
• To implement, each symbol needs a block id
  – E.g. name of enclosing function or class
Same-Level Duplicates
• Disallowed in most languages
  – Look up symbol before adding
  – If symbol is in current block, error
  – Requires block id (or equivalent) in symbol table
• Later value would shadow earlier value
  – Compiler implementation same as nesting
  – Code is very confusing!
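The nesting and duplicate rules can be combined in one table. The sketch below is only one possible design: it assumes the nesting level serves as the block id and keeps a per-name stack whose top is the innermost declaration, which plays the same role as inserting at the front of a collision list. The names ScopedEntry and ScopedSymbolTable are hypothetical.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical entry: a type name plus the nesting level (block id) of the declaration.
struct ScopedEntry {
    std::string type;
    int level;
};

class ScopedSymbolTable {
public:
    void enterBlock() { ++level_; }

    // Leaving a block deletes its symbols, uncovering any outer declarations.
    void exitBlock() {
        for (auto& bucket : table_) {
            auto& stack = bucket.second;
            while (!stack.empty() && stack.back().level == level_) stack.pop_back();
        }
        --level_;
    }

    // Returns false on a same-level duplicate; otherwise the new entry shadows older ones.
    bool declare(const std::string& name, const std::string& type) {
        auto& stack = table_[name];
        if (!stack.empty() && stack.back().level == level_) return false;
        stack.push_back({type, level_});
        return true;
    }

    // The innermost visible declaration is found first; nullptr if undeclared.
    const ScopedEntry* lookup(const std::string& name) const {
        auto it = table_.find(name);
        if (it == table_.end() || it->second.empty()) return nullptr;
        return &it->second.back();
    }

private:
    int level_ = 0;
    std::unordered_map<std::string, std::vector<ScopedEntry>> table_;
};
```

Scanning every bucket in exitBlock is simple but slow; a real compiler would more likely keep a list of the names declared in each block.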
Sequential Evaluation?
int i = 5;
{
int i = 7;
int j = 1+i; // j=8 if sequential,
… // j=6 if collateral (parallel)
}
• Collateral implementation might be more efficient (ML, LISP)
Recursive Declaration
int factorial(int x){ //recursive function
  if (x>0) return x*factorial(x-1);
  else return 1; //base case: 0! = 1
}
class node{ //recursive data structure
  int value;
  node * next;
};
Implementing Recursive Declarations
• Get name into symbol table as soon as possible
  – Before finishing function or structure
  – E.g. decl: name ( args ) {/*update symtab*/} statement-block {/*generate code*/}
• Once symbol is in table, it’s ok to use
  – Using a symbol is not re-declaring it!
• Prototype also gets name into symbol table
Mutual Recursion & Prototypes
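int B(int x); //Prototype for B
int A(int x){ //Calls B
  //B already in symbol table from prototype
  if (x>0) return B(x-1);
  return 0; //base case so every path returns a value
}
int B(int x){ //Calls A
  if (x!=1) return A(x/2);
  return 1; //base case so every path returns a value
}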
Declaration Example (p. 311)
• let declarations w/ initialization in exp
  – let x=3,y=5 in z=x+y
  – let x=3 in (let x=5 in y=x+1)
• Attributes (for creating symbol tables)
  – symtab: current symbol table
  – nestl: current nesting level
  – err: Boolean, is it an error?
  – intab/outtab: tables before/after declaration
Declaration Attribute Rules
S -> exp //initialization & finalization
  exp.symtab = emptytable
  exp.nestl = 0
  S.err = exp.err
exp -> id //id must be in symbol table
  exp.err = not isin(exp.symtab, id.name)
Initialization Attribute Rule
decl -> id = exp
  exp.symtab = decl.intab //current symbols
  exp.nestl = decl.nestl //current nest level
  decl.outtab = //output table w/ new id
    if (decl.intab == errtab) || exp.err ||
       (lookup(decl.intab, id.name) == decl.nestl) then errtab
    else insert(decl.intab, id.name, decl.nestl)
Let Statement Attribute Rule
exp1 -> let dec-list in exp2
  dec-list.intab = exp1.symtab
  dec-list.nestl = exp1.nestl + 1 //nesting
  exp2.symtab = dec-list.outtab
  exp2.nestl = dec-list.nestl
  exp1.err = (dec-list.outtab == errtab) || exp2.err
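The attribute rules on the last three slides can be read as a recursive check over the expression tree. The sketch below makes several assumptions: it uses a simplified AST with only numbers, identifiers, and let expressions, it represents the symbol table as a map from name to nesting level, and it reports errtab by returning true early. All type and function names are hypothetical.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Exp;
using ExpPtr = std::shared_ptr<Exp>;
using SymTab = std::map<std::string, int>;  // name -> nesting level of its declaration
using DeclList = std::vector<std::pair<std::string, ExpPtr>>;  // id = exp

// Simplified AST (an assumption): a number, an identifier, or "let decls in body".
struct Exp {
    enum Kind { Num, Id, Let } kind;
    std::string name;  // used by Id
    DeclList decls;    // used by Let
    ExpPtr body;       // used by Let
};

ExpPtr num() { return std::make_shared<Exp>(Exp{Exp::Num, "", {}, nullptr}); }
ExpPtr id(const std::string& n) { return std::make_shared<Exp>(Exp{Exp::Id, n, {}, nullptr}); }
ExpPtr let(DeclList d, ExpPtr body) {
    return std::make_shared<Exp>(Exp{Exp::Let, "", std::move(d), std::move(body)});
}

// Computes exp.err from the inherited attributes exp.symtab and exp.nestl.
bool hasError(const Exp& e, const SymTab& symtab, int nestl) {
    switch (e.kind) {
    case Exp::Num:
        return false;
    case Exp::Id:
        return symtab.count(e.name) == 0;  // id must be in the symbol table
    case Exp::Let: {
        SymTab tab = symtab;    // dec-list.intab = exp1.symtab
        int level = nestl + 1;  // dec-list.nestl = exp1.nestl + 1
        for (const auto& d : e.decls) {
            if (hasError(*d.second, tab, level)) return true;  // exp.err
            auto it = tab.find(d.first);
            if (it != tab.end() && it->second == level) return true;  // same-level duplicate
            tab[d.first] = level;  // insert(decl.intab, id.name, decl.nestl)
        }
        return hasError(*e.body, tab, level);  // exp2.symtab = dec-list.outtab
    }
    }
    return false;
}
```

Checking a whole program corresponds to the S -> exp rule: call hasError with an empty table and nesting level 0.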
Data Types - Definitions
• Type
  – Class of possible values (w/operations)
• Type inference
  – Determine result type based on input types
• Type checking
  – Ensure specified types make sense
    • Assignment statements
    • Function calls (parameters)
Simple Data Types
• Built-in (predefined)
  – Directly represented in memory (e.g. int, float, double)
• Programmer-defined
  – Subrange (e.g. 1..10)
  – Enumerated (e.g. {SU, FA, SP})
Type Constructors
• Array
  – Sequence of elements of the same type
  – One type, explicit size
• Record / Struct
  – Collection of elements of varied types
  – Many types, implicit size
• Union
  – Choice of types, implicit size (largest one)
More Type Constructors
• Pointer / reference
  – Address of an object of given type
  – “Dereference” operation follows the pointer
  – Reference is automatically dereferenced
• Function
  – Maps parameters (of given types) to return value (of given type)
And finally...
• Class
  – Struct + member functions (methods)
  – Information hiding (public/private)
  – Inheritance
  – Polymorphism
Type Names
• Define a name to represent a type
  – typedef hand = array[1..5] of card
  – typedef vector<int>::iterator iter
• Programmer convenience
• Another kind of symbol for the symbol table!
Types are structurally equivalent when they are...
• Simple and identical
• Arrays of the same size and equivalent element type
• Structures of equivalent type elements in the same sequence
  – Assume equivalence for recursive tests!
• Pointers to items of equivalent types
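A structural equivalence test along these lines might look like the following sketch. It assumes a hypothetical TypeNode whose children are non-owning pointers to component types. The set of pairs already under comparison implements the “assume equivalence for recursive tests” rule, so self-referential types do not recurse forever.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type node: simple types carry a name, arrays a size, and
// arrays/structs/pointers refer to their component types through children.
struct TypeNode {
    enum Kind { Simple, Array, Struct, Pointer } kind;
    std::string name;                       // used by Simple
    std::size_t size;                       // used by Array
    std::vector<const TypeNode*> children;  // element type, fields, or pointee
};

using NodePair = std::pair<const TypeNode*, const TypeNode*>;

bool typeEqual(const TypeNode& a, const TypeNode& b, std::set<NodePair>& assumed) {
    if (&a == &b) return true;
    if (a.kind != b.kind) return false;
    if (a.kind == TypeNode::Simple) return a.name == b.name;
    if (a.kind == TypeNode::Array && a.size != b.size) return false;
    // A pair that is already being compared is assumed equivalent,
    // which ends the recursion on self-referential types.
    if (!assumed.insert(NodePair(&a, &b)).second) return true;
    if (a.children.size() != b.children.size()) return false;
    for (std::size_t i = 0; i < a.children.size(); ++i) {
        if (!typeEqual(*a.children[i], *b.children[i], assumed)) return false;
    }
    return true;
}

bool typeEqual(const TypeNode& a, const TypeNode& b) {
    std::set<NodePair> assumed;
    return typeEqual(a, b, assumed);
}
```

With this test, two separately declared list-node types (an int field plus a pointer to the node itself) compare as equivalent, even though each contains a cycle.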
Other kinds of equivalence
• Name equivalence
  – Names must exactly match
  – More restrictive than structural
• Declaration equivalence
  – Types match if names are the same or …
  – Types X and Y match if “X=Y” is explicitly declared in the code
Type Inference
• Declarations cause type of an id to be entered into a symbol table
  var-decl -> id: type-exp
    insert(id.name, type-exp.type)
    //associate type to id in the symbol table
• Assume the type node for an array or struct has pointers to its parts
Type Checking (p. 330)
stmt -> id := exp
  if not typeEqual(lookup(id.name), exp.type)
    type-error(stmt)
stmt -> if exp then stmt
  if not typeEqual(exp.type, boolean)
    type-error(stmt)
Array Type Inference / Checking
type-exp1 -> array [num] of type-exp2
  type-exp1.type =
    new typenode(“array”, num.size, type-exp2.type)
exp1 -> exp2 [exp3]
  if (isArrayType(exp2.type) && typeEqual(exp3.type, integer))
    exp1.type = exp2.type.child1
  else type-error(exp1)
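The two array rules above could be sketched in C++ as follows. This is a simplified illustration: the Type struct and the helper names are hypothetical, a type error is reported by throwing an exception, and the index check compares tags directly rather than calling a full typeEqual.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

struct Type;
using TypePtr = std::shared_ptr<const Type>;

// Hypothetical type node mirroring typenode("array", size, elementType).
struct Type {
    std::string tag;  // "integer", "boolean", "array", ...
    int size = 0;     // number of elements, used by arrays
    TypePtr child1;   // element type, used by arrays
};

TypePtr simpleType(const std::string& tag) {
    auto t = std::make_shared<Type>();
    t->tag = tag;
    return t;
}

// type-exp1 -> array [num] of type-exp2
TypePtr arrayType(int size, TypePtr element) {
    auto t = std::make_shared<Type>();
    t->tag = "array";
    t->size = size;
    t->child1 = std::move(element);
    return t;
}

// exp1 -> exp2 [exp3]: returns exp1.type, or throws on a type error.
TypePtr subscriptType(const TypePtr& array, const TypePtr& index) {
    if (array->tag == "array" && index->tag == "integer") return array->child1;
    throw std::runtime_error("type error: subscript needs an array and an integer index");
}
```

Subscripting an “array [10] of boolean” with an integer index yields the boolean element type; subscripting a non-array, or using a non-integer index, raises the error.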
Overloading
• Interpretation of a symbol depends on types of related subexpressions
  – 5.0 + 6.0 vs. “mystring: ” + “abc”
  – int max(int A[]) vs. double max(double A[])
• Type attributes from symbol table needed to understand (gen. code for)
  – a+b
  – c = max(a,b)
Type Conversion
• Type “upgrades” in mixed expressions
  – float + int -> float
• Add rules to grammar
  – Type of expression is checked after each subexpression
  – If subexpression is “bigger”, upgrade expression type
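The “bigger type wins” rule can be expressed as a comparison over an ordered set of numeric types. The sketch below assumes the ordering int < float < double; the names NumType and resultType are hypothetical.

```cpp
#include <algorithm>

// Ordered from "smallest" to "biggest"; this ordering is an assumption of the sketch.
enum class NumType { Int = 0, Float = 1, Double = 2 };

// Type of a binary arithmetic expression: the bigger operand type wins.
NumType resultType(NumType left, NumType right) {
    return std::max(left, right);
}
```

For float + int, resultType returns Float, so the compiler would generate code to convert the int operand before the addition.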
Type Conversion in Assignment
• Can be “upgrade” or “downgrade”
  – double x = 1+2; //upgraded from int
  – int z = 5 / 2.0; //2 (info loss!)
• Rule sets LHS type from declaration regardless of expression type
• Coercion code must be compiled in
  – (Language designer’s decision whether compiler will do this)
OO Type Conversion
• “upgrade” = assigning a superclass value to a subclass variable
• “downgrade” = assigning a subclass value to a superclass variable (with loss of info)
• Very general algorithms exist, but are implemented in few languages
Result of Semantic Analysis
• Complete symbol table(s) with attributes
  – Incorporating scoping rules
• Additional attributes for grammar non-terminals
  – (mostly for building symbol tables)
• Determination whether semantic errors have occurred (and where)
Semantic Errors
• Undeclared symbol (in this scope)
• Multiple declarations (in this scope)
• Invalid type for statement
  – E.g. if (“not boolean”) …
• Incompatible types in assignment
• Incompatible types in function call / no overload available
Attributes of a type
• Name (the symbol in the table)
• Size (number of bytes taken up)
• Type expression
  – Array element type and size
  – Structure components
  – Union alternative types