Lecture 8 Semantic Analysis

Upload: scot-barrett

Post on 18-Jan-2018


The Compiler So Far

Lexical analysis
  » Detects inputs with illegal tokens
Parsing
  » Detects inputs with ill-formed parse trees
Semantic analysis
  » Last “front end” phase
  » Catches all remaining errors

What’s Wrong?

Example 1
  int y = x + 3;

Example 2
  String y = "abc"; y++;

Why a Separate Semantic Analysis?

Parsing cannot catch some errors
Some language constructs are not context-free
  » Example: all used variables must have been declared (i.e. scoping)
  » ex: { int x { .. { .. x ..} ..} ..}
  » Example: a method must be invoked with arguments of the proper type (i.e. typing)
  » ex: int f(int, int) {…} called by f('a', 2.3, 1)

More problems require semantic analysis

1. Is x a scalar, an array, or a function?
2. Is x declared before it is used?
3. Is x defined before it is used?
4. Are any names declared but not used?
5. Which declaration of x does this reference?
6. Is an expression type-consistent?
7. Does the dimension of a reference match the declaration?
8. Where can x be stored? (heap, stack, ...)
9. Does *p reference the result of a malloc()?
10. Is an array reference in bounds?
11. Does function foo produce a constant value?

Why is semantic analysis hard?

need non-local information
answers depend on values, not on syntax
answers may involve computation

How can we answer these questions?

1. use context-sensitive grammars (CSG)
  » general problem is PSPACE-complete
2. use attribute grammars (AG)
  » augment context-free grammar with rules
  » calculate attributes for grammar symbols
3. use ad hoc techniques
  » augment grammar with arbitrary code
  » execute code at corresponding reduction
  » store information in attributes, symbol tables

Attribute Grammars

A generalization of CFG
An attribute grammar is a context-free grammar with associated attributes and semantic rules
Each grammar symbol is associated with a set of attributes
Each production is associated with a set of semantic rules for computing attributes
Also called a syntax-directed definition in the Dragon book.

Dependences between attributes

Attribute values are computed from constants & other attributes
synthesized attribute
  » value computed from children [& constants]
inherited attribute
  » value computed from siblings & parent [& constants]
These rules induce a dependency graph among the attributes of parse tree nodes.


Synthesized Attributes


Inherited Attributes


Example attribute grammar

A grammar to evaluate signed binary numbers

    Production            Evaluation rules
1   NUM → SIGN LIST       LIST.pos = 0
                          NUM.val = SIGN.neg ? -LIST.val : LIST.val
2   SIGN → +              SIGN.neg = false
3   SIGN → -              SIGN.neg = true
4   LIST → BIT            BIT.pos = LIST.pos
                          LIST.val = BIT.val
5   LIST0 → LIST1 BIT     LIST1.pos = LIST0.pos + 1
                          BIT.pos = LIST0.pos
                          LIST0.val = LIST1.val + BIT.val
6   BIT → 0               BIT.val = 0
7   BIT → 1               BIT.val = 2^BIT.pos
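
The evaluation rules above can be traced with a small recursive evaluator. This is only a sketch, assuming the input arrives as a string such as "-101"; the function names eval_num, eval_list and eval_bit are invented for illustration. The inherited attribute pos is passed down as a parameter and the synthesized attribute val comes back as the return value.

```python
def eval_bit(ch, pos):
    """Rules 6 and 7: BIT.val is computed from the inherited BIT.pos."""
    if ch not in ("0", "1"):
        raise ValueError(f"not a binary digit: {ch!r}")
    return 0 if ch == "0" else 2 ** pos


def eval_list(bits, pos):
    """Rules 4 and 5: pos flows down (inherited), val flows up (synthesized)."""
    if len(bits) == 1:
        return eval_bit(bits, pos)                      # LIST -> BIT
    # LIST0 -> LIST1 BIT: the rightmost bit keeps LIST0.pos,
    # the remaining prefix gets LIST0.pos + 1.
    return eval_list(bits[:-1], pos + 1) + eval_bit(bits[-1], pos)


def eval_num(text):
    """Rule 1: NUM -> SIGN LIST, with LIST.pos = 0."""
    sign, bits = text[:1], text[1:]
    if sign not in ("+", "-") or not bits:
        raise ValueError("expected a sign followed by at least one bit")
    neg = sign == "-"                                   # rules 2 and 3
    val = eval_list(bits, 0)
    return -val if neg else val


print(eval_num("-101"))   # -5
```

Passing inherited attributes as arguments and returning synthesized ones is a common way to hand-code an attribute grammar, though it is not the only evaluation strategy.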


Annotated Parse Tree

Annotated Parse Tree and Dependency graph

• val and neg are synthesized attributes
• pos is an inherited attribute

Two Notations for AGs

Syntax-Directed Definition
Syntax-directed Translation [Scheme]

Syntax-directed definition

Each grammar production A → α is associated with a set of semantic rules of the form
  b := f(c1, c2, …, ck)
where f is a function and
  1. b is a synthesized attribute of A and c1, c2, …, ck are attributes of grammar symbols in α, or
  2. b is an inherited attribute of one of the grammar symbols in α and c1, c2, …, ck are attributes of A or grammar symbols in α

Dependencies of Attributes

• In the semantic rule b := f(c1, c2, …, ck) we say b depends on c1, c2, …, ck
• The semantic rule for b must be evaluated after the semantic rules for c1, c2, …, ck
• The dependencies of attributes can be represented by a directed graph called a dependency graph


Dependency Graph

Evaluation Order

Apply topological sort on dependency graph

  a1 := float
  a2 := a1
  addtype(a3, a2) /* a4 */
  a5 := a2
  addtype(a6, a5) /* a7 */
  a8 := a5
  addtype(a9, a8) /* a10 */
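
As a rough illustration of the topological sort, the sketch below feeds the dependencies read off the listing above to Python's standard graphlib module. The edge set is an assumption: a3, a6 and a9 are taken to be the id.entry attributes of the three identifiers, and a4, a7 and a10 stand for the three addtype calls.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each attribute maps to the set of attributes it depends on.
depends_on = {
    "a2": {"a1"},
    "a4": {"a3", "a2"},    # addtype(a3, a2)
    "a5": {"a2"},
    "a7": {"a6", "a5"},    # addtype(a6, a5)
    "a8": {"a5"},
    "a10": {"a9", "a8"},   # addtype(a9, a8)
}

# static_order() yields every node after all of its predecessors.
order = list(TopologicalSorter(depends_on).static_order())
position = {name: i for i, name in enumerate(order)}
print(order)
```

Any order produced this way is a valid evaluation order; the one on the slide is just one of several.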

S-attributed Grammar

S-attributed grammar:
  » all attributes are synthesized attributes.
  » can be evaluated in one pass
  » useful in many contexts (e.g. calculator)
  » a good match for LR parsing.
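
A calculator is the usual example of an S-attributed definition. The sketch below is an assumption-laden illustration rather than a parser: it takes an already built expression tree (numbers as leaves, tuples of the form (op, left, right) as inner nodes) and computes the single synthesized attribute val in one bottom-up pass, which is the same order in which an LR parser performs its reductions.

```python
def val(node):
    """Return the synthesized attribute val of an expression node."""
    if isinstance(node, (int, float)):
        return node                      # leaf: the value is the number itself
    op, left, right = node
    l, r = val(left), val(right)         # children are evaluated first
    if op == "+":
        return l + r
    if op == "-":
        return l - r
    if op == "*":
        return l * r
    raise ValueError(f"unknown operator {op!r}")


tree = ("+", 3, ("*", 4, 5))             # 3 + 4 * 5
print(val(tree))                         # 23
```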


Example

L-attributed Grammar

A syntax-directed definition is L-attributed if each attribute in each semantic rule for each production A → X1 X2 … Xn is a synthesized attribute, or an inherited attribute of Xj, 1 ≤ j ≤ n, depending only on
  1. the attributes of X1, X2, …, Xj-1 and/or
  2. the inherited attributes of A


Example


Counter Example

Pros and cons of AGs

Advantages
  » clean formalism
  » automatic generation of evaluator
  » high-level specification
Disadvantages
  » efficiency determined by evaluation strategy
  » increased space requirements
  » circularity testing
  » handling non-local information

Syntax-directed Translation [scheme]

allow arbitrary actions
can have global data structures
can place actions within productions

The definition

A translation scheme is an attribute grammar in which semantic rules are enclosed between braces { and }, and are inserted within the right sides of productions
The value of an attribute must be available when a semantic rule refers to it

Examples
  » YACC
  » A ::= B C { $$ = concat($1,$2); }
  » CUP
  » A ::= B:m C:p { RESULT = concat(m,p); }
Typical uses
  » build abstract syntax tree & symbol table
  » perform error/type checking

Example

D → T {L.in := T.type} L
T → int {T.type := integer}
T → float {T.type := float}
L → {L1.in := L.in} L1 ',' id {addtype(id.entry, L.in)}
L → id {addtype(id.entry, L.in)}
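
The scheme above can be imitated by hand for a declaration such as "float a, b, c". The sketch below makes several assumptions: the input is already a token list, the symbol table is a plain dict, and the left-recursive L is handled with a loop instead of recursion. The names declare and TYPE_NAMES are invented for illustration.

```python
TYPE_NAMES = {"int": "integer", "float": "float"}   # T.type for each keyword


def declare(tokens):
    """Process D -> T L and return a dict playing the role of the symbol table."""
    if not tokens or tokens[0] not in TYPE_NAMES:
        raise SyntaxError("expected a type name")
    l_in = TYPE_NAMES[tokens[0]]          # {L.in := T.type}
    rest = tokens[1:]
    if len(rest) % 2 == 0:
        raise SyntaxError("expected id { ',' id }")
    table = {}
    for i, tok in enumerate(rest):
        if i % 2 == 1:
            if tok != ",":
                raise SyntaxError(f"expected ',' but found {tok!r}")
        elif tok == ",":
            raise SyntaxError("expected an identifier")
        else:
            table[tok] = l_in             # addtype(id.entry, L.in)
    return table


print(declare(["float", "a", ",", "b", ",", "c"]))
```

The point of the sketch is that the inherited attribute L.in is set before the list is processed, so every identifier sees the declared type.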


Example


Restriction on translation scheme

An inherited attribute for a symbol on the right side must be computed in a semantic rule before that symbol

A semantic rule must not refer to a synthesized attribute for a symbol to its right

A synthesized attribute for the symbol on the left can be computed after all attributes it depends on have been computed

From L-Attributed Definitions to Translation Schemes

S → {B.ps := 10} B {S.ht := B.ht}
B → {B1.ps := B.ps} B1 {B2.ps := B.ps} B2 {B.ht := max(B1.ht, B2.ht)}
B → {B1.ps := B.ps} B1 sub {B2.ps := shrink(B.ps)} B2 {B.ht := disp(B1.ht, B2.ht)}
B → text {B.ht := text.ht × B.ps}

Abstract syntax tree

An abstract syntax tree is a condensed form of parse tree useful for representing constructs.
  » usually a parse tree with the nodes for most non-terminal symbols removed.
Example: the tree that represents “x - 2 * y”.
can use a linearized (operator) form of the tree.
  » x 2 y * - in postfix form.
A popular intermediate representation.
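
A minimal sketch of the idea, assuming a two-class node representation (Leaf and BinOp are names chosen here, not taken from the lecture): the tree for x - 2 * y is built by hand and linearized with a post-order walk.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    name: str                 # identifier or literal text


@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Leaf, BinOp]


def postfix(node):
    """Linearize the tree: operands first, then the operator."""
    if isinstance(node, Leaf):
        return [node.name]
    return postfix(node.left) + postfix(node.right) + [node.op]


ast = BinOp("-", Leaf("x"), BinOp("*", Leaf("2"), Leaf("y")))
print(" ".join(postfix(ast)))   # x 2 y * -
```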

Scope

Matching identifier declarations with uses
  » Important static analysis step in most languages

Scope (Cont.)

The scope of an identifier is the portion of a program in which that identifier is accessible
The same identifier may refer to different things in different parts of the program
  » Different scopes for same name don’t overlap
An identifier may have restricted scope

Static vs. Dynamic Scope

Most languages have static scope
  » Scope depends only on the program text, not run-time behavior
  » C and Java have static scope
A few languages are dynamically scoped
  » Lisp, SNOBOL
  » Lisp has changed to mostly static scoping
  » Scope depends on execution of the program

Static Scoping Example

  { int x = 0;
    x++;
    { int x = 1;
      x++; }
    x++;
  }

Uses of x refer to closest enclosing definition

Dynamic Scope

A dynamically-scoped variable refers to the closest enclosing binding in the execution of the program
Example
  (defun g (y) (let ((a 4)) (f 3)))
  (defun f (x) a)
  » (g 5) => 4
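
Python itself is statically scoped, so the sketch below simulates dynamic scoping with an explicit run-time binding stack. The stack, lookup and the Python versions of f and g are illustrative assumptions that mirror the Lisp example: f has no binding for a, so it finds the one g established most recently.

```python
bindings = []   # run-time stack of (name, value) pairs, innermost last


def lookup(name):
    """Return the most recent binding of name that is still active."""
    for bound_name, value in reversed(bindings):
        if bound_name == name:
            return value
    raise NameError(name)


def f(x):
    return lookup("a")            # f does not bind a itself


def g(y):
    bindings.append(("a", 4))     # corresponds to (let ((a 4)) ...)
    try:
        return f(3)
    finally:
        bindings.pop()            # the binding ends when the let body exits


print(g(5))   # 4
```

Calling f directly, outside g, raises NameError in this sketch because no binding of a is active at that point in the execution.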

Symbol Tables

Consider the block: B : { int x = 0; Es }
Idea:
  » Before processing Es, add definition of x to current definitions, overriding any other definition of x
  » After processing Es, remove definition of x and restore old definition of x
A symbol table is a data structure that tracks the current bindings of identifiers

Symbol tables

A symbol table associates values or attributes (e.g., types and values) with names.
What should be in a symbol table?
  » variable and procedure names
  » literal constants and strings

Symbol Tables

What information might the compiler need?
  textual name
  data type
  declaring procedure
  lexical level of declaration
  if array, number and size of dimensions
  if procedure, number and type of parameters

Symbol Tables

Implementation
  » usually implemented as hash tables
How to handle nested lexical scoping?
  » when we ask about a name, we want the closest lexical declaration
One solution
  » use one symbol table per scope
  » tables chained to enclosing scopes
  » insert names in table for current scope
  » name lookup starts in current table; if needed, checks enclosing scopes in order
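
The "one table per scope" solution can be sketched in a few lines. The class name Scope and its methods are assumptions made for this illustration; each scope owns a dict (a hash table) and a link to its enclosing scope.

```python
class Scope:
    """One symbol table per scope, chained to the enclosing scope."""

    def __init__(self, parent=None):
        self.symbols = {}         # hash table for names declared in this scope
        self.parent = parent      # enclosing scope, or None at the outermost level

    def declare(self, name, info):
        if name in self.symbols:
            raise KeyError(f"duplicate declaration of {name!r} in this scope")
        self.symbols[name] = info

    def lookup(self, name):
        scope = self
        while scope is not None:  # current table first, then enclosing scopes
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        return None


outer = Scope()
outer.declare("x", "int")
outer.declare("y", "float")
inner = Scope(parent=outer)
inner.declare("x", "char")        # shadows the outer x

print(inner.lookup("x"), inner.lookup("y"), outer.lookup("x"))   # char float int
```

Leaving a scope needs no cleanup in this design: dropping the reference to the inner table restores the outer bindings.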

[Figure: scopes A, B1, B2 and C, and their symbol tables TA, TB1, TB2 and TC]

Imperative-style Symbol Table

beginScope()          start a new nested scope
endScope()            exit current scope
addSymbol(x, value)   add a symbol x to the table
Object lookup(x)      finds current x (or null)
checkScope(x)         true if x defined in current scope (needed for checking duplicate declarations in the same scope)

class SymbolTable

    class SymbolTable {
      static class Entry {
        Symbol id; Object value; int scope;
        Entry(…) {…}
      }
      private Map map = new HashMap();
      private int curScope;
      …
      private Stack stack = new Stack();

      Object lookup(Symbol s) {
        Entry e = (Entry) map.get(s);
        return (e == null) ? null : e.value;
      }

      boolean checkScope(Symbol s) {
        Entry e = (Entry) map.get(s);
        // compare with ==; a single = would assign instead of test
        return e != null && e.scope == curScope;
      }

      void beginScope() { curScope++; stack.push(null); }  // null marks the scope boundary

      void endScope() {
        // pop and unmap every symbol added since the matching beginScope()
        for (Symbol s = (Symbol) stack.pop(); s != null; s = (Symbol) stack.pop())
          map.remove(s);
        curScope--;
      }

      void addSymbol(Symbol x, Object v) {
        assert lookup(x) == null;  // this simple version does not allow shadowing
        stack.push(x);
        map.put(x, new Entry(x, v, curScope));  // assumes the elided constructor takes (id, value, scope)
      }
      ……
    }

Types

What is a type?
  » The notion varies from language to language
Consensus
  » A set of values
  » A set of operations on those values
Classes are one instantiation of the modern notion of type


Why Do We Need Type Systems?

Consider the assembly language fragment

addi r1, r2, r3

What are the types of r1, r2, r3?

Types and Operations

Certain operations are legal for values of each type
  » It doesn’t make sense to add a function pointer and an integer in C
  » It does make sense to add two integers
  » But both have the same assembly language implementation!

Type Systems

A language’s type system specifies which operations are valid for which types
The goal of type checking is to ensure that operations are used with the correct types
  » Enforces intended interpretation of values, because nothing else will!
Type systems provide a concise formalization of the semantic checking rules

What Can Types do For Us?

Can detect certain kinds of errors:
  » "abc"++; x = ar["abc"]; int x = "abc";
Memory errors:
  » Reading from an invalid pointer, etc.
  » int x[50]; x[50] = 3;
expressiveness (overloading, polymorphism)
  » help determine which methods/constructors would be invoked.
  » Ex: add(Complex, Complex), add(int, int), add(String, String), ..
  » add(23, 14) => add(int, int) invoked
provide information for code generation
  » ex: memory size

Type Checking Overview

Three kinds of languages:
  » Statically typed: All or almost all checking of types is done as part of compilation (C, Java, Cool)
  » Dynamically typed: Almost all checking of types is done as part of program execution (Scheme)
  » Untyped: No type checking (machine code)

Pros and cons

Static typing:
  » catches many programming errors at compile time
  » avoids overhead of runtime type checks
Dynamic typing:
  » static type systems are restrictive
  » rapid prototyping is easier in a dynamic type system

The practice

Most code is written in statically typed languages with an “escape” mechanism
  » Unsafe casts in C, Java
  » (Person *) malloc(1024) // C
  » Object x = …;
  » Person p = (Person) x; // Java
It’s debatable whether this compromise represents the best or worst of both worlds

Type systems

Types
  » A type is a set of values that share a set of common properties.
  » e.g.: int, InputStream, String, …
  » Defined by language (built-in types) and/or programmer (user-defined types)
Types are represented by type expressions
Type system
  » 1. a set of types in a programming language, and
  » 2. a collection of rules for assigning types to the various parts of a program
A type checker enforces a type system on users’ programs

Type Systems (continued)

Example type rules
  » E1 and E2 are of the type int ==> E1+E2 is of the type int.
  » x = E ==> x and E must have the same type.

Type checking

Type checker
  » enforces rules of type system
  » may be strong/weak, static/dynamic
Static type checking
  » performed at compile time
  » early detection, no run-time overhead
  » not always possible (e.g., A[i])
Dynamic type checking
  » performed at run time
  » more flexible, rapid prototyping
  » overhead to check run-time type tags

Type expressions

Type expressions (vs. value expressions)
  » used to represent the type of a language construct
  » describe both language and programmer types
Examples
  » basic types: int, float, char, ...
  » constructed types: arrays, records, pointers, functions, ...

Type Expressions

A basic type is a type expression
  » boolean, char, integer, real, void, typeError
A type constructor applied to type expressions is a type expression
  » array: array(I, T)
  » record: T1 x T2 x … x Tn
  » pointer: pointer(T)
  » function: D → R

Type Expressions

Constructing new types: if T, T1, …, Tn are type expressions and I is a non-negative integer, then
  » array(I, T) // array type
  » T1 x T2 x … x Tn // record type
  » pointer(T) // pointer
  » T1 x T2 x … x Tn → T // function
are type expressions
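
One possible in-memory representation of these type expressions is sketched below. The class names are assumptions chosen for this illustration; frozen dataclasses are used so that two type expressions compare equal exactly when they have the same structure.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Basic:
    name: str                      # e.g. "integer", "char", "typeError"


@dataclass(frozen=True)
class Array:
    size: int                      # I in array(I, T)
    elem: object                   # T in array(I, T)


@dataclass(frozen=True)
class Pointer:
    target: object                 # T in pointer(T)


@dataclass(frozen=True)
class Record:
    fields: Tuple[object, ...]     # T1 x T2 x ... x Tn


@dataclass(frozen=True)
class Function:
    params: Tuple[object, ...]     # T1 x T2 x ... x Tn
    result: object                 # T


INTEGER = Basic("integer")
CHAR = Basic("char")

buffer_type = Array(256, CHAR)                        # array(256, char)
fn_type = Function((INTEGER, CHAR), Pointer(INTEGER))
print(buffer_type == Array(256, Basic("char")))       # True
```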

A simple type checker

Using a synthesized attribute grammar, we will describe a type checker for arrays, pointers, statements, and functions.
Grammar for source language:
  » P → D ; S
  » D → D ; D | id : T
  » T → char | integer | array [num] of T | * T
  » E → literal | num | id | E mod E | E[E] | * E
  » S → id = E; | if E then S | while E do S | S ; S

Basic types: char, integer, typeError
Assume all arrays start at 1, e.g., array [256] of char results in the type expression array(256, char)
* builds a pointer type, so * integer results in the type expression pointer(integer)

Type Declarations

D → id : T { addType(id.entry, T.type) }
T → char { T.type = char }
T → integer { T.type = integer }
T → * T1 { T.type = pointer(T1.type) }
T → array [num] of T1 { T.type = array(num.val, T1.type) }

Type checking of expressions

E → literal { E.type = char }
E → num { E.type = integer }
E → id { E.type = lookup(id.entry) }
E → E1 mod E2 { E.type = (E1.type == integer and E2.type == integer) ? integer : typeError }
E → E1 [ E2 ] { E.type = (E1.type == array(s, t) and E2.type == integer) ? t : typeError }
E → * E1 { E.type = (E1.type == pointer(t)) ? t : typeError }
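
These expression rules translate almost line for line into a recursive function. The sketch below rests on several assumptions: expressions are nested tuples tagged "literal", "num", "id", "mod", "index" and "deref"; types are the strings "char" and "integer" or tuples ("array", size, elem) and ("pointer", target); the symbol table is a dict; and an undeclared identifier yields typeError, which the rules themselves do not specify.

```python
TYPE_ERROR = "typeError"


def type_of(expr, symtab):
    """Compute the synthesized attribute E.type for an expression tuple."""
    kind = expr[0]
    if kind == "literal":
        return "char"
    if kind == "num":
        return "integer"
    if kind == "id":
        return symtab.get(expr[1], TYPE_ERROR)        # lookup(id.entry)
    if kind == "mod":
        t1, t2 = type_of(expr[1], symtab), type_of(expr[2], symtab)
        return "integer" if t1 == t2 == "integer" else TYPE_ERROR
    if kind == "index":                               # E1 [ E2 ]
        t1, t2 = type_of(expr[1], symtab), type_of(expr[2], symtab)
        if isinstance(t1, tuple) and t1[0] == "array" and t2 == "integer":
            return t1[2]
        return TYPE_ERROR
    if kind == "deref":                               # * E1
        t1 = type_of(expr[1], symtab)
        if isinstance(t1, tuple) and t1[0] == "pointer":
            return t1[1]
        return TYPE_ERROR
    raise ValueError(f"unknown expression kind {kind!r}")


symtab = {"a": ("array", 256, "char"), "p": ("pointer", "integer"), "i": "integer"}
print(type_of(("index", ("id", "a"), ("id", "i")), symtab))   # char
```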

Type checking of Statements

P → D ";" S
S → id "=" E { S.type = (lookup(id.entry) == E.type) ? void : typeError }
S → if E then S1 { S.type = (E.type == boolean) ? S1.type : typeError }
S → while E do S1 { S.type = (E.type == boolean) ? S1.type : typeError }
S → S1 ";" S2 { S.type = (S1.type == void and S2.type == void) ? void : typeError }

Type Checking of Functions

T → T1 "→" T2 { T.type = T1.type → T2.type }
E → E1 ( E2 ) { E.type = (E1.type == s → t and E2.type == s) ? t : typeError }
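
The application rule can be sketched as a single check. The representation is an assumption made for this illustration: a function type s → t is the tuple ("->", s, t), and apply_type is an invented helper name.

```python
TYPE_ERROR = "typeError"


def apply_type(fn_type, arg_type):
    """E -> E1 ( E2 ): return t if E1 has type s -> t and E2 has type s."""
    if isinstance(fn_type, tuple) and len(fn_type) == 3 and fn_type[0] == "->":
        _, s, t = fn_type
        if arg_type == s:
            return t
    return TYPE_ERROR


char_to_int = ("->", "char", "integer")
print(apply_type(char_to_int, "char"))       # integer
print(apply_type(char_to_int, "integer"))    # typeError
```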