

Symbol Tables

• The symbol table contains information about
  – variables
  – functions
  – class names
  – type names
  – temporary variables
  – etc.


Symbol Tables

• What kind of information is usually stored in a symbol table?
  – type
  – storage class
  – size
  – scope
  – stack frame offset
  – register
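
As a rough illustration of the attributes listed above, one entry could be modelled as a small record. This is only a sketch: the class name `SymbolEntry`, the field names, and the sample values are assumptions made for this example, not part of the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SymbolEntry:
    """One symbol table entry (hypothetical layout for illustration)."""
    name: str
    type_name: str                       # e.g. "int" or "float*"
    storage_class: str                   # e.g. "auto", "static", "extern"
    size: int                            # size in bytes
    scope: int                           # nesting depth, 0 = global
    frame_offset: Optional[int] = None   # set if the symbol lives in a stack frame
    register: Optional[str] = None       # set if the symbol lives in a register


# A local int stored 8 bytes below the frame pointer (made-up values).
entry = SymbolEntry("count", "int", "auto", 4, scope=1, frame_offset=-8)
```

Keeping the offset and the register optional reflects that a symbol normally lives in one place or the other, not both.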


Symbol Tables

• How is a symbol table implemented?
  – array
    • simple, but linear lookup time
    • However, we may use a sorted array for reserved words, since they are generally few and known in advance.
  – tree
    • O(lg n) lookup time if kept balanced
  – hash table
    • most common implementation
    • expected O(1) lookup time
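
The sorted-array option for reserved words can be sketched with a binary search, which gives O(lg n) lookup without any hashing. The keyword list and the name `is_reserved` are hypothetical and chosen only for this example.

```python
from bisect import bisect_left

# Hypothetical reserved-word list, sorted once up front.
RESERVED = sorted(["else", "for", "if", "int", "return", "while"])


def is_reserved(word):
    """Binary search in the sorted array of reserved words."""
    i = bisect_left(RESERVED, word)
    return i < len(RESERVED) and RESERVED[i] == word
```

Because the list is fixed and known in advance, it never needs insertion or deletion, which is what makes a plain sorted array adequate here.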


Symbol Tables

• Hash tables
  – use array of size m to store elements
  – given key k (the identifier name), use a function h to compute index h(k) for that key
  – collisions are possible
    • two keys hash into the same slot.

• Hash functions
  – A good hash function
    • is easy to compute
    • avoids collisions (by breaking up patterns in the keys and uniformly distributing the hash values)


Symbol Tables

• Hash functions
  – A common hash function is h(k) = ⌊m * (k*c − ⌊k*c⌋)⌋, for some constant 0 < c < 1
  – In English
    • Multiply the key k by the constant c
    • Take the fractional part of k*c
    • Multiply that by the table size m
    • Take the floor of the result

  – A good value for c: (√5 − 1)/2 ≈ 0.618
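
The four steps above can be sketched as follows. The constant used is the commonly cited (√5 − 1)/2 ≈ 0.618. The helper `string_to_int`, which folds an identifier into a 32-bit integer, is an assumption added for this example, since the slides treat the key k as a number.

```python
import math

C = (math.sqrt(5) - 1) / 2   # about 0.618


def string_to_int(name):
    """Fold an identifier into a 32-bit integer key (hypothetical helper)."""
    k = 0
    for ch in name:
        k = (k * 31 + ord(ch)) & 0xFFFFFFFF
    return k


def mult_hash(k, m):
    """Multiplication method: floor(m * fractional_part(k * C))."""
    frac = (k * C) % 1.0          # fractional part of k*c
    return math.floor(m * frac)   # scale to the table size, then floor


slot = mult_hash(string_to_int("count"), 64)   # some index in 0..63
```

Masking the key to 32 bits keeps k*c small enough that the floating-point product still has meaningful fractional digits. With unbounded integers the fraction would eventually be lost to rounding.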


Resolving collisions

• Chaining
  – Put all the elements that collide in a chain (list) attached to the slot.

• Insert/Delete/Lookup in expected O(1) time

• However, this assumes that the chains are kept small.
  – If the chains start becoming too long, the table must be enlarged and all the keys rehashed.
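
A minimal sketch of chaining with rehashing might look like this. The class name `ChainedTable`, the load-factor threshold, and the doubling policy are assumptions for illustration. Python's built-in `hash` stands in for the hash function h.

```python
class ChainedTable:
    """Hash table with chaining; grows when the average chain gets long."""

    def __init__(self, m=8, max_load=2.0):
        self.slots = [[] for _ in range(m)]
        self.count = 0
        self.max_load = max_load   # assumed threshold: average chain length

    def _chain(self, key):
        return self.slots[hash(key) % len(self.slots)]

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:               # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))
        self.count += 1
        if self.count / len(self.slots) > self.max_load:
            self._grow()

    def lookup(self, key):
        for k, v in self._chain(key):
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                self.count -= 1
                return True
        return False

    def _grow(self):
        """Double the array and rehash every key into the new slots."""
        old = self.slots
        self.slots = [[] for _ in range(2 * len(old))]
        for chain in old:
            for k, v in chain:
                self.slots[hash(k) % len(self.slots)].append((k, v))
```

Every key has to be rehashed on growth because the slot index depends on the array size.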



• Open addressing
  – Store all elements within the table
    • The space we save from the chain pointers is used up to make the array larger.
  – If there is a collision, probe the table in a systematic way to find an empty slot.
  – If the table fills up, we need to enlarge it and rehash all the keys.

• Open addressing with linear probing
  – Probe consecutive slots: (h(k) + i) mod m, with i = 0, 1, 2, ...
  – Simple but bad: results in clustering (long sequences of used slots build up very fast)
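
To see the clustering effect, here is a small sketch of insertion with linear probing. It assumes integer keys and uses h(k) = k mod m so that the result is easy to follow by hand. The function name and the sample keys are made up for this example.

```python
def linear_probe_insert(table, key):
    """Insert an integer key with linear probing; return the slot used."""
    m = len(table)
    for i in range(m):
        slot = (key % m + i) % m          # home slot, then the next ones
        if table[slot] is None or table[slot] == key:
            table[slot] = key
            return slot
    raise RuntimeError("table full: enlarge it and rehash all keys")


table = [None] * 8
# 3, 11 and 19 share home slot 3; 4 and 12 share home slot 4.
slots = [linear_probe_insert(table, k) for k in (3, 11, 19, 4, 12)]
# slots == [3, 4, 5, 6, 7]
```

Key 4 does not collide with anything at the hash level, yet it lands in slot 6 because the run started by the keys that hash to 3 has already covered its home slot. That growing run is the cluster.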



• Open addressing with double hashing
  – Use a second hash function.
  – The probe sequence is: (h(k) + i*h2(k)) mod m, with i = 0, 1, 2, ...
  – Good performance
    • Since we use a second function, keys that originally collide will usually follow different probe sequences afterwards.
  – No clustering
  – A good choice for h2(k) is p - (k mod p), where p is a prime less than m
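
The probe sequence can be sketched as below for integer keys. The choices h(k) = k mod m, m = 11 and p = 7 are assumptions for this example. Taking m prime guarantees that every step size h2(k) is coprime to m, so each sequence visits all slots.

```python
def probe_sequence(k, m, p):
    """Slots visited by double hashing for integer key k."""
    h1 = k % m
    h2 = p - (k % p)          # step size in 1..p, never zero
    return [(h1 + i * h2) % m for i in range(m)]


# Keys 3 and 14 collide on the first probe (both start at slot 3) ...
seq_a = probe_sequence(3, 11, 7)    # starts 3, 7, 0, ...
seq_b = probe_sequence(14, 11, 7)   # starts 3, 10, 6, ...
# ... but then diverge, because their step sizes (4 and 7) differ.
```

Compare this with linear probing, where two keys with the same home slot follow exactly the same path.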


Scope issues

• Block-structured languages allow nested name scopes.

• Usual visibility rules
  – Only names created in the current or enclosing scopes are visible
  – When there is a conflict, the innermost declaration takes precedence.


Scope issues

• One idea is to have a global hash table and save the scope information for each entry.

• When a scope is exited, scan the table and remove the entries that belong to it
  – We may even link all same-scope entries together for easier removal.

• Careful: deleting from a hash table that uses open addressing is tricky
  – We must mark a slot as Deleted, rather than Empty, otherwise a later lookup may stop at the emptied slot and miss keys stored beyond it.
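
The Deleted marker can be sketched as follows, using linear probing for brevity. The class name `ProbedTable` and the sentinel objects are assumptions for this example. The test keys are small integers, for which Python's `hash` returns the integer itself.

```python
EMPTY = object()     # slot never used
DELETED = object()   # slot used once, then vacated (tombstone)


class ProbedTable:
    def __init__(self, m=8):
        self.keys = [EMPTY] * m

    def _probe(self, key):
        m = len(self.keys)
        start = hash(key) % m
        for i in range(m):
            yield (start + i) % m

    def insert(self, key):
        first_free = None
        for s in self._probe(key):
            cur = self.keys[s]
            if cur is EMPTY:
                if first_free is None:
                    first_free = s
                break                      # key cannot be further along
            if cur is DELETED:
                if first_free is None:
                    first_free = s         # reusable, but keep searching
            elif cur == key:
                return                     # already present
        if first_free is None:
            raise RuntimeError("table full: enlarge it and rehash")
        self.keys[first_free] = key

    def contains(self, key):
        for s in self._probe(key):
            cur = self.keys[s]
            if cur is EMPTY:
                return False               # only a true Empty ends the search
            if cur is not DELETED and cur == key:
                return True
        return False

    def delete(self, key):
        for s in self._probe(key):
            cur = self.keys[s]
            if cur is EMPTY:
                return
            if cur is not DELETED and cur == key:
                self.keys[s] = DELETED     # mark, do not reset to EMPTY
                return
```

With m = 8, keys 3 and 11 share home slot 3, so 11 ends up in slot 4. After deleting 3, a lookup of 11 still succeeds because the search steps over the Deleted slot instead of stopping there.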


Scope issues

• Another idea is to maintain a separate, local hash table for each scope.

• We may store the tables in a tree or a stack (that mirrors the stack frames).
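
A stack of per-scope tables can be sketched with Python dictionaries standing in for the local hash tables. The class and method names are assumptions for this example. Lookup walks from the innermost scope outward, which implements the usual visibility rules.

```python
class ScopedSymbolTable:
    def __init__(self):
        self.scopes = [{}]            # bottom of the stack: global scope

    def enter_scope(self):
        self.scopes.append({})        # fresh table for the new block

    def exit_scope(self):
        if len(self.scopes) == 1:
            raise RuntimeError("cannot exit the global scope")
        self.scopes.pop()             # drops every name of the block at once

    def declare(self, name, info):
        if name in self.scopes[-1]:
            raise KeyError("redeclaration of " + name)
        self.scopes[-1][name] = info

    def lookup(self, name):
        for scope in reversed(self.scopes):   # innermost declaration wins
            if name in scope:
                return scope[name]
        return None


st = ScopedSymbolTable()
st.declare("x", "int")
st.enter_scope()
st.declare("x", "float")      # shadows the outer x
inner = st.lookup("x")        # "float"
st.exit_scope()
outer = st.lookup("x")        # "int"
```

Leaving a scope is a single pop, so no scan of a global table and no per-entry deletion is needed.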


Structure tables

• Where should we store struct field names?
  – Separate mini symbol table for each struct
    • Conceptually easy
  – Separate table for all struct field names
    • We need to somehow uniquely map each name to its structure (e.g. by concatenating the field name with the struct name)
  – No special storage
    • struct field names are stored in the regular symbol table.
    • Again we need to be able to map each name to its structure.
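
For the single shared field table, one way to map each name to its structure is to key the table on the pair (struct name, field name). This sketch uses a tuple key instead of string concatenation. That is a substitution made for this example, to avoid ambiguous joined names such as "ab" + "c" versus "a" + "bc". All names below are hypothetical.

```python
field_table = {}   # one table shared by all structs


def declare_field(struct_name, field_name, info):
    field_table[(struct_name, field_name)] = info


def lookup_field(struct_name, field_name):
    return field_table.get((struct_name, field_name))


declare_field("point", "x", "int")
declare_field("pixel", "x", "float")   # same field name, different struct
```

The same field name can then appear in several structs without conflict, since the struct name is part of the key.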
