1
Symbol Tables
• The symbol table contains information about
  – variables
  – functions
  – class names
  – type names
  – temporary variables
  – etc.
2
Symbol Tables
• What kind of information is usually stored in a symbol table?
  – type
  – storage class
  – size
  – scope
  – stack frame offset
  – register
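
The fields listed above can be pictured as one record per identifier. The following is only a sketch: the class name `SymbolEntry`, the field names, and the sample values are assumptions made for illustration, not part of the slides.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SymbolEntry:
    """One symbol table record (hypothetical layout)."""
    name: str                            # identifier as written in the source
    type: str                            # e.g. "int" or "float*"
    storage_class: str                   # e.g. "auto", "static", "extern"
    size: int                            # size in bytes
    scope: int                           # nesting depth, 0 = global
    frame_offset: Optional[int] = None   # stack frame offset, if stack-allocated
    register: Optional[str] = None       # register name, if register-allocated

# A local int stored 8 bytes below the frame pointer (made-up values).
entry = SymbolEntry("count", "int", "auto", 4, scope=1, frame_offset=-8)
```

Making `frame_offset` and `register` optional reflects that a symbol normally lives in one place or the other, not both.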
3
Symbol Tables
• How is a symbol table implemented?
  – array
    • simple, but linear lookup time
    • however, we may use a sorted array for reserved words, since they are generally few and known in advance
  – tree
    • O(log n) lookup time if kept balanced
  – hash table
    • most common implementation
    • expected O(1) lookup time
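
As a small illustration of the sorted-array option for reserved words, the sketch below uses binary search from Python's standard `bisect` module. The word list and the function name `is_reserved` are assumptions chosen for the example.

```python
from bisect import bisect_left

# Reserved words are few and known in advance, so a sorted array is enough.
RESERVED = sorted(["else", "for", "if", "int", "return", "while"])

def is_reserved(word: str) -> bool:
    """Binary search: O(log n) comparisons instead of a linear scan."""
    i = bisect_left(RESERVED, word)
    return i < len(RESERVED) and RESERVED[i] == word
```

User-defined identifiers, which arrive in unpredictable order and number, would still go into one of the other structures.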
4
Symbol Tables
• Hash tables
  – use an array of size m to store elements
  – given key k (the identifier name), use a function h to compute index h(k) for that key
  – collisions are possible
    • two keys hash into the same slot
• Hash functions
  – A good hash function
    • is easy to compute
    • avoids collisions (by breaking up patterns in the keys and uniformly distributing the hash values)
5
Symbol Tables
• Hash functions
  – A common hash function is
      h(k) = ⌊m * (k*c − ⌊k*c⌋)⌋, for some constant 0 < c < 1
  – In English
    • Multiply the key k by the constant c
    • Take the fractional part of k*c
    • Multiply that by the table size m
    • Take the floor of the result
  – A good value for c: (√5 − 1)/2 ≈ 0.618
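
The four steps of this multiplication method can be sketched as follows. Two things here are assumptions rather than slide content: the constant is taken to be c = (√5 − 1)/2 ≈ 0.618, a value commonly suggested for this method, and the helper `name_to_int` is a hypothetical way of turning an identifier into an integer key.

```python
import math

# Assumed constant: c = (sqrt(5) - 1) / 2, roughly 0.618.
C = (math.sqrt(5) - 1) / 2

def name_to_int(name: str) -> int:
    """Fold an identifier into a 32-bit integer key (illustrative choice)."""
    k = 0
    for ch in name:
        k = (k * 31 + ord(ch)) & 0xFFFFFFFF
    return k

def mult_hash(k: int, m: int) -> int:
    """h(k) = floor(m * (k*c - floor(k*c)))."""
    frac = (k * C) % 1.0         # fractional part of k*c
    return math.floor(m * frac)  # scale by table size m, then take the floor

slot = mult_hash(name_to_int("counter"), 64)
```

Keeping the integer key within 32 bits leaves enough floating-point precision for the fractional part; a production implementation would more likely use fixed-point integer arithmetic.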
6
Resolving collisions
• Chaining
  – Put all the elements that collide in a chain (list) attached to the slot.
    • Insert/Delete/Lookup in expected O(1) time
    • However, this assumes that the chains are kept small.
  – If the chains start becoming too long, the table must be enlarged and all the keys rehashed.
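
A minimal sketch of chaining with rehashing is given below. The class name `ChainedTable`, the growth threshold `MAX_LOAD`, and the doubling policy are assumptions for illustration; Python's built-in `hash` stands in for the hash function h.

```python
class ChainedTable:
    MAX_LOAD = 2.0  # assumed threshold: average chain length before growing

    def __init__(self, m=8):
        self.slots = [[] for _ in range(m)]  # one chain (list) per slot
        self.count = 0

    def _chain(self, key):
        return self.slots[hash(key) % len(self.slots)]

    def insert(self, key, value):
        chain = self._chain(key)
        for pair in chain:
            if pair[0] == key:        # key already present: overwrite
                pair[1] = value
                return
        chain.append([key, value])
        self.count += 1
        if self.count > self.MAX_LOAD * len(self.slots):
            self._grow()

    def lookup(self, key):
        for k, v in self._chain(key):
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                self.count -= 1
                return True
        return False

    def _grow(self):
        # Enlarge the table and rehash every key into the new slots.
        pairs = [pair for chain in self.slots for pair in chain]
        self.slots = [[] for _ in range(2 * len(self.slots))]
        for k, v in pairs:
            self._chain(k).append([k, v])
```

Doubling the table keeps the average chain length bounded, which is what the expected O(1) claim relies on.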
7
Resolving collisions
• Open addressing
  – Store all elements within the table
    • The space we save from the chain pointers is used up to make the array larger.
  – If there is a collision, probe the table in a systematic way to find an empty slot.
  – If the table fills up, we need to enlarge it and rehash all the keys.
• Open addressing with linear probing
  – Probe the slots in a linear manner
  – Simple but bad: results in clustering (long sequences of used slots build up very fast)
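
To see clustering in action, here is a small linear-probing sketch. It assumes integer keys and the simple hash h(k) = k mod m, neither of which comes from the slides; the function name is also hypothetical.

```python
def linear_probe_insert(table, key):
    """Insert key by probing h(k), h(k)+1, h(k)+2, ... (mod m)."""
    m = len(table)
    for i in range(m):
        slot = (key % m + i) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("table full: enlarge it and rehash all keys")

table = [None] * 8
# Keys 0, 8, 16 all hash to slot 0; keys 1 and 9 hash to slot 1.
slots = [linear_probe_insert(table, k) for k in (0, 8, 16, 1, 9)]
# slots is [0, 1, 2, 3, 4]: one contiguous run of used slots
```

Keys 1 and 9 do not hash to slot 0, yet they are pushed to the end of the run started by the earlier keys, which is how the cluster keeps growing.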
8
Resolving collisions
• Open addressing with double hashing
  – Use a second hash function.
  – The probe sequence is:
      (h(k) + i*h2(k)) mod m, with i = 0, 1, 2, ...
  – Good performance
    • Since we use a second function, keys that originally collide will subsequently have different probe sequences.
  – No clustering
  – A good choice for h2(k) is p − (k mod p), where p is a prime less than m
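
The probe sequence can be sketched as below. The example assumes integer keys, h(k) = k mod m, and a prime table size m = 11 with p = 7; these specific choices are illustrative and not taken from the slides.

```python
def probe_sequence(k, m, p):
    """Slots visited for key k: (h(k) + i*h2(k)) mod m, for i = 0 .. m-1."""
    h1 = k % m            # assumed primary hash
    h2 = p - (k % p)      # second hash, always in the range 1..p
    return [(h1 + i * h2) % m for i in range(m)]

# Keys 0 and 11 collide on their first probe (slot 0) but then diverge.
seq_a = probe_sequence(0, 11, 7)    # starts 0, 7, 3, ...
seq_b = probe_sequence(11, 11, 7)   # starts 0, 3, 6, ...
```

Because m is prime here, every step size h2(k) is relatively prime to m, so each sequence visits all m slots before repeating.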
9
Scope issues
• Block-structured languages allow nested name scopes.
• Usual visibility rules
  – Only names created in the current or enclosing scopes are visible
  – When there is a conflict, the innermost declaration takes precedence.
10
Scope issues
• One idea is to have a global hash table and save the scope information for each entry.
• When an identifier goes out of scope, scan the table and remove the corresponding entries
  – We may even link all same-scope entries together for easier removal.
• Careful: deleting from a hash table that uses open addressing is tricky
  – We must mark a slot as Deleted, rather than Empty, because a lookup stops probing at the first Empty slot; otherwise later lookups of keys stored further along the probe sequence may fail.
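
The Deleted marker can be sketched as follows. For brevity the example assumes integer keys, h(k) = k mod m, and linear probing; the class `OpenTable` and the sentinel names are hypothetical.

```python
EMPTY, DELETED = object(), object()   # two distinct sentinel markers

class OpenTable:
    def __init__(self, m=8):
        self.slots = [EMPTY] * m

    def _probe(self, key):
        m = len(self.slots)
        for i in range(m):
            yield (key % m + i) % m

    def lookup(self, key):
        for s in self._probe(key):
            if self.slots[s] is EMPTY:   # a truly empty slot ends the search
                return False
            if self.slots[s] == key:     # DELETED never equals a key
                return True
        return False

    def insert(self, key):
        if self.lookup(key):
            return
        for s in self._probe(key):
            if self.slots[s] is EMPTY or self.slots[s] is DELETED:
                self.slots[s] = key      # deleted slots can be reused
                return
        raise RuntimeError("table full: enlarge it and rehash all keys")

    def delete(self, key):
        for s in self._probe(key):
            if self.slots[s] is EMPTY:
                return
            if self.slots[s] == key:
                self.slots[s] = DELETED  # not EMPTY: keep probe chains intact
                return

t = OpenTable()
t.insert(0)
t.insert(8)      # collides with 0, lands in slot 1
t.delete(0)      # slot 0 becomes DELETED
found = t.lookup(8)   # still found: the probe continues past the DELETED slot
```

Had `delete` written `EMPTY` into slot 0, the lookup of 8 would have stopped there and wrongly reported the key as missing.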
11
Scope issues
• Another idea is to maintain a separate, local hash table for each scope.
• We may store the tables in a tree or a stack (that mirrors the stack frames).
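
A stack of per-scope tables can be sketched like this. The class and method names are assumptions for illustration, and Python dictionaries stand in for the local hash tables.

```python
class ScopedSymbolTable:
    def __init__(self):
        self.scopes = [{}]             # bottom of the stack: global scope

    def enter_scope(self):
        self.scopes.append({})         # new local table for the new block

    def exit_scope(self):
        self.scopes.pop()              # drops every name declared in the block

    def declare(self, name, info):
        self.scopes[-1][name] = info   # always declare in the innermost scope

    def lookup(self, name):
        # Search from the innermost scope outward; first match wins.
        for table in reversed(self.scopes):
            if name in table:
                return table[name]
        return None

st = ScopedSymbolTable()
st.declare("x", "int")
st.enter_scope()
st.declare("x", "float")       # shadows the outer x
inner = st.lookup("x")         # "float"
st.exit_scope()
outer = st.lookup("x")         # "int"
```

Leaving a scope is a single pop, so no per-entry deletion (and no Deleted markers) is needed with this layout.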
12
Structure tables
• Where should we store struct field names?
  – Separate mini symbol table for each struct
    • Conceptually easy
  – Separate table for all struct field names
    • We need to somehow uniquely map each name to its structure (e.g. by concatenating the field name with the struct name)
  – No special storage
    • struct field names are stored in the regular symbol table.
    • Again we need to be able to map each name to its structure.
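
The first two options can be contrasted in a short sketch. The struct names, field names, types, and the `field_key` helper are all made up for illustration; the "." separator is just one possible way to concatenate the two names.

```python
# Option 1: a separate mini symbol table for each struct.
struct_tables = {
    "point":  {"x": "int", "y": "int"},
    "person": {"name": "char*", "x": "float"},
}

# Option 2: one table for all field names, keyed by struct name + field name.
def field_key(struct_name: str, field_name: str) -> str:
    return f"{struct_name}.{field_name}"

all_fields = {
    field_key(s, f): t
    for s, fields in struct_tables.items()
    for f, t in fields.items()
}

type_a = struct_tables["person"]["x"]            # option 1 lookup
type_b = all_fields[field_key("person", "x")]    # option 2 lookup
```

Both structs have a field named `x`, and in each layout the struct name is what keeps the two fields apart.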