data structures in a compiler(46 slides)

46
Joseph Bergin 1/12/99 1 Data Layouts Data Structures For a Simple Compiler

Upload: vonga

Post on 10-Feb-2017

241 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 1

Data Layouts

Data Structures For a Simple Compiler

Page 2: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 2

Symbol Tables

Information about user defined names

Page 3: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 3

Symbol Table

● Symbol Tables are organized for fast lookup.È Items are typically entered once and

then looked up several times.È Hash Tables and Balanced Binary

Search Trees are commonly used.È Each record contains a ÒnameÓ

(symbol) and information describing it.

Page 4: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 4

Simple Hash Table

● Hasher translates ÒnameÓ into an integer in a fixed range- the hash value.

● Hash Value indexes into an array of lists.È Entry with that symbol is in that list

or is not stored at all.È Items with same hash value = bucket.

Page 5: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 5

Simple Hash Table

0

max

anObject

hasher

index buckets

Page 6: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 6

Self Organizing Hash Table

● Can achieve constant average time lookup if buckets have bounded average length.

● Can guarantee this if we periodically double number of hash buckets and re-hash all elements.È Can be done so as to minimize

movement of items.

Page 7: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 7

Self Organizing Hash Table

0

2 * max

newhasher

index

0

max

anObject

hasher

index

n n

n + max

Page 8: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 8

Balanced Binary Search Tree

● Binary search trees work if they are kept balanced.

● Can achieve logarithmic lookup time.

● Algorithms are somewhat complex.È Red-black trees and AVL trees are

used. È No leaf is much farther from root

than any other

Page 9: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 9

Balanced Binary Search Tree

Page 10: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 10

Symbol Tables + Blocks

● If a language is block structured then each block (scope) needs to be represented separately in the symbol table.

● If the hash table buckets are Òstack-likeÓ this is automatic.

● Can use a stack of balanced trees with one entry per scope.

Page 11: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 11

Special Cases

● Some languages partition names into different classes- keywords, variable&function names, struct names.. .

● Separate symbol tables can then be used for each kind of name. The different symbol tables might have different characteristics. È hashtable-sortedlist-binarytree.. .

Page 12: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 12

Parsing Information

Page 13: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 13

Parse Trees

● The structure of a modern computer language is tree-like

● Trees represent recursion well. ● A gramatical structure is a node

with its parts as child nodes.● Interior nodes are nonterminals.● The tokens of the language are

leaves.

Page 14: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 14

Parse Trees

<statement> ::= <variable> Ò:=Ò <expression>x := a + 5

statement

variable := expression

x a + 5

Page 15: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 15

Parse Trees

● There are different node types in the same tree.

● Variant records or type unions are typically used. Object-orientation is also useful here.

● Each node has a tag that distinguishes it, permitting testing on node type.

Page 16: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 16

Parse Stack

● Parsing is often accomplished with a stack. (Not in this version of GCL)

● The stack holds values representing tokens, nonterminals and semantic symbols from the grammar.

Ð It can either hold what is expected next (LL parsing) or what has already been seen (LR parsing)

Page 17: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 17

Parse Stack

● A stack is used because most languages and their grammars are recursive. Stacks can accomplish much of what trees can.

● The contents of the stack are usually numeric encodings of the symbols for compactness of representation and speed of processing.

Page 18: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 18

Parse Stack

<statement> ::= <variable> Ò:=Ó <expression> #doAssign

max := max + 1;

<var>

Ò:=Ó

<expr>

#doAs

...

Example being scanned:

G rammar fragment

Page 19: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 19

Stack vs Parameters

● In recursive descent parsing, no stack is needed.

● This is because the semantic records can be passed directly to the semantic routines as parameters.

● Semantic records can also be returned from the parsing functions.

Page 20: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 20

Tokens

Information produced by the Scanner

Page 21: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 21

Token Records

● Token records pass information about symbols scanned. This varies by token type.

● Variant records or type unions are typically used.

● Each value contains a tag - the token type - and additional information.È The tag is usually an integer.

Page 22: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 22

Token Examples

● Simple tokens● No additional info● Only the tag field

È e n d N u m

● Others are more complex

● Tag plus other info

È numera lNumÈ 3 5

Page 23: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 23

Handling Strings

● Strings are variable length and therefore present some problems.

● In C we can allocate a free-store object to hold the spelling--BUT, allocation is expensive in time.

● In Pascal, allocating fixed length strings is wasteful.

● Spell buffers are an alternative.

Page 24: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 24

Strings in the Free Store

write ÒThe answer is: Ò, x;

The answer is:\0

The string is represented by the value of the pointer which can bepassed around the compiler.

strval = new char[16];

Page 25: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 25

Strings in a Spell Buffer

write ÒThe answer is: Ò, x;

before

N a m e T h e a n s w e r i s : 18

3 N a m e

after

The string is represented as (3,15) = (start, length)

Page 26: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 26

Semantic Information

Page 27: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 27

Semantic Information

● Parsing and semantic routines need to share information.

● This information can be passed as function parameters or a semantic stack can be used.

● There are different kinds of semantic information.È Variant Records/Type Unions/Objects

Page 28: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 28

Semantic Records

● Each record needs a tag to distinguish its kind. We need to test the tag types.

● Depending on the tag there will be additional information.

● Sometimes the additional information must itself be a tagged union/variant record.

Page 29: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 29

Simple Semantic Records

identifiermaximum

7

addoperator+

reloperator<=

ifentryJ35J36

Page 30: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 30

Complex Semantic Records

typeentry

integer2

exprentryconst

33

* see types (later)

exprentryvariable

0, 6false

Page 31: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 31

Semantic StackIn some compilers semantic recordsare stored in a semantic stack. In others, they are passed as parameters.

typeentryinteger

2

identifiermaximum

7

identifiervalue

5

stacktop

Page 32: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 32

Type Information

Page 33: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 33

Type Information

● Type information must be maintained for variables and parameters.

● There are different kinds of typesÈ Variant Records/Type Unions/Objects

● There are different typing rules in different languages. È Pointers to records/structs are a

simple representation.

Page 34: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 34

Type Information

● Types describe variables.

È size of a variable of this type(in bytes)È kind (tag)È additional information for some

types.

● There are also recursive types.

Page 35: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 35

Simple Types

integer2

Boolean2

The tag and the size are enough.

character1

Page 36: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 36

Tuple Type

[integer, Boolean]

tuple4

integer2

Boolean2

Page 37: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 37

Recursive Types

[integer, [integer, Boolean]]

tuple6

integer2

tuple4

...

...

Page 38: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 38

Range Types

integer range[1..10]

range2

1, 10...

integer2

Page 39: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 39

Array Types

Boolean array[1..10][0..4]

array100

1, 10

array10

0, 4

Boolean2

Page 40: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 40

Array Types (alternate)

Boolean array [range1] [range2]

array100

array10

Boolean2

range2

1, 10

range2

0, 4

integer2

integer2

Page 41: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 41

Record Typesrecord [integer x, boolean y ]

record4

x y

integer2

Boolean2

Note similarity to tuple types.

Page 42: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 42

Pointer Types

pointer [integer, Boolean]

tuple4

integer2

Boolean2

pointer2

Page 43: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 43

Procedure Types

integer2

proc2... Boolean

2

proc (integer, Boolean)

Note: Not all languages have procedure typeseven when they have procedures.

Page 44: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 44

Function Types

func (integer returns [integer, Boolean])

tuple4

integer2

Boolean2

integer2

func2...

Note: Not all languages have function typeseven when they have functions.

Page 45: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 45

Self Recursive TypesSome languages (Java, Modula-3) permit a type to reference itself:

class node{ int value;

node next;}

class8 value next

int4

The internal representation is a pointer (4 bytes)

Page 46: Data Structures in a Compiler(46 slides)

Joseph Bergin 1/12/99 46

Recursive Types Again

[ record [integer array[0..4] x, Boolean y] ,integer range [1..10] , pointer [integer, integer] ,func(integer, Boolean returns integer array[1..5])

]

Left as an exercise. :-)