programming language concepts

90
1 Programming Language Concepts Chapter 3 Describing Syntax and Semantics

Upload: kellan

Post on 07-Jan-2016

60 views

Category:

Documents


3 download

DESCRIPTION

Programming Language Concepts. Chapter 3 Describing Syntax and Semantics. Review. Syntax is the description of which strings of symbols are meaningful expressions in a language It takes more than syntax to understand a language; need meaning ( semantics ) too Syntax is the entry point. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Programming Language Concepts

1

Programming Language Concepts

Chapter 3

Describing Syntax and Semantics

Page 2: Programming Language Concepts

2

Review

• Syntax is the description of which strings of symbols are meaningful expressions in a language

• It takes more than syntax to understand a language; need meaning (semantics) too

• Syntax is the entry point

Page 3: Programming Language Concepts

3

Features of a Good Syntax

• Readable• Writeable• Lack of ambiguity• Suggestive of correct meaning• Ease of translation

Page 4: Programming Language Concepts

4

Elements of Syntax

• Character set – typically ASCII• Keywords – usually reserved• Special constants – cannot be assigned to• Identifiers – can be assigned to• Operator symbols (+, -, *)• Delimiters (parenthesis, braces, brackets,)• Blanks (white space)

Page 5: Programming Language Concepts

5

Elements of Syntax

• Expressions• Type expressions• Declarations• Statements (in imperative languages)• Subprograms (subroutines)

Page 6: Programming Language Concepts

6

Elements of Syntax

• Modules• Interfaces• Classes (for object-oriented languages)• Libraries

Page 7: Programming Language Concepts

7

Grammars

• Grammars are formal descriptions of which strings over a given character set are in a particular language

• Language designers write grammar• Language implementers use grammar

to know what programs to accept• Language users use grammar to know

how to write legitimate programs

Page 8: Programming Language Concepts

8

Sample Grammar

• Language: Parenthesized sums of 0’s and 1’s

• <Sum> ::= 0 • <Sum >::= 1 • <Sum> ::= <Sum> + <Sum>• <Sum> ::= (<Sum>)

Page 9: Programming Language Concepts

9

BNF Grammars

• Start with a set of characters, a,b,c,…• We call these terminals

• Add a set of different characters, X,Y,Z,…• We call these non-terminals

• One special non-terminal S called start symbol

Page 10: Programming Language Concepts

10

BNF Grammars

• BNF rules (productions) have form

X ::= y

where X is any non-terminal and y is a string of terminals and non-terminals

• BNF grammar is a set of BNF rules such that every non-terminal appears on the left of some rule

Page 11: Programming Language Concepts

11

Sample Grammar

• Terminals: 0 1 + ( )• Non-terminals: <Sum>• Start symbol = <Sum>

<Sum> ::= 0 <Sum >::= 1 <Sum> ::= <Sum> + <Sum><Sum> ::= (<Sum>)Can be abbreviated as <Sum> ::= 0 | 1 | <Sum> + <Sum> | (<Sum>)

Page 12: Programming Language Concepts

12

BNF Derivations

• Given rules

X::= yZw and Z::=v

we may replace Z by v to say

X => yZw => yvw

Page 13: Programming Language Concepts

13

BNF Derivations

• Start with the start symbol:

<Sum> =>

Page 14: Programming Language Concepts

14

BNF Derivations

• Pick a non-terminal

<Sum> =>

Page 15: Programming Language Concepts

15

• Pick a rule and substitute:• <Sum> ::= <Sum> + <Sum>

<Sum> => <Sum> + <Sum >

BNF Derivations

Page 16: Programming Language Concepts

16

• Pick a non-terminal:

<Sum> => <Sum> + <Sum >

BNF Derivations

Page 17: Programming Language Concepts

17

• Pick a rule and substitute:• <Sum> ::= ( <Sum> )

<Sum> => <Sum> + <Sum >

=> ( <Sum> ) + <Sum>

BNF Derivations

Page 18: Programming Language Concepts

18

• Pick a non-terminal:

<Sum> => <Sum> + <Sum >

=> ( <Sum> ) + <Sum>

BNF Derivations

Page 19: Programming Language Concepts

19

• Pick a rule and substitute:• <Sum> ::= <Sum> + <Sum>

<Sum> => <Sum> + <Sum >

=> ( <Sum> ) + <Sum>

=> ( <Sum> + <Sum> ) + <Sum>

BNF Derivations

Page 20: Programming Language Concepts

20

• Pick a non-terminal:

<Sum> => <Sum> + <Sum >

=> ( <Sum> ) + <Sum>

=> ( <Sum> + <Sum> ) + <Sum>

BNF Derivations

Page 21: Programming Language Concepts

21

• Pick a rule and substitute:• <Sum >::= 1

<Sum> => <Sum> + <Sum >

=> ( <Sum> ) + <Sum>

=> ( <Sum> + <Sum> ) + <Sum>

=> ( <Sum> + 1 ) + <Sum>

BNF Derivations

Page 22: Programming Language Concepts

22

• Pick a non-terminal:

<Sum> => <Sum> + <Sum >

=> ( <Sum> ) + <Sum>

=> ( <Sum> + <Sum> ) + <Sum>

=> ( <Sum> + 1 ) + <Sum>

BNF Derivations

Page 23: Programming Language Concepts

23

• Pick a rule and substitute:• <Sum >::= 0

<Sum> => <Sum> + <Sum >

=> ( <Sum> ) + <Sum>

=> ( <Sum> + <Sum> ) + <Sum>

=> ( <Sum> + 1 ) + <Sum>

=> ( <Sum> + 1 ) + 0

BNF Derivations

Page 24: Programming Language Concepts

24

• Pick a non-terminal:

<Sum> => <Sum> + <Sum >

=> ( <Sum> ) + <Sum>

=> ( <Sum> + <Sum> ) + <Sum>

=> ( <Sum> + 1 ) + <Sum>

=> ( <Sum> + 1 ) + 0

BNF Derivations

Page 25: Programming Language Concepts

25

• Pick a rule and substitute• <Sum> ::= 0

<Sum> => <Sum> + <Sum >

=> ( <Sum> ) + <Sum>

=> ( <Sum> + <Sum> ) + <Sum>

=> ( <Sum> + 1 ) + <Sum>

=> ( <Sum> + 1 ) 0

=> ( 0 + 1 ) + 0

BNF Derivations

Page 26: Programming Language Concepts

26

• ( 0 + 1 ) + 0 is generated by grammar

<Sum> => <Sum> + <Sum >

=> ( <Sum> ) + <Sum>

=> ( <Sum> + <Sum> ) + <Sum>

=> ( <Sum> + 1 ) + <Sum>

=> ( <Sum> + 1 ) + 0

=> ( 0 + 1 ) + 0

BNF Derivations

Page 27: Programming Language Concepts

27

BNF Semantics

• The meaning of a BNF grammar is the set of all strings consisting only of terminals that can be derived from the Start symbol

Page 28: Programming Language Concepts

28

• Graphical representation of a derivation• Each node labeled with either a non-terminal

or a terminal• If node is labeled with a terminal, then it is a

leaf (no sub-trees)• If node is labeled with a terminal, then it has

one branch for each character in the right-hand side of rule used to substitute for it

Remember Parse Trees

Page 29: Programming Language Concepts

29

Example

• Consider grammar:

<exp> ::= <factor>

| <factor> + <factor>

<factor> ::= <bin>

| <bin> * <exp>

<bin> ::= 0 | 1

• Build parse tree for 1 * 1 + 0 as an <exp>

Page 30: Programming Language Concepts

30

Example cont.

• 1 * 1 + 0: <exp>

<exp> is the start symbol for this parse tree

Page 31: Programming Language Concepts

31

Example cont.

• 1 * 1 + 0: <exp>

<factor>

Use rule: <exp> ::= <factor>

Page 32: Programming Language Concepts

32

Example cont.

• 1 * 1 + 0: <exp>

<factor>

<bin> * <exp>

Use rule: <factor> ::= <bin> * <exp>

Page 33: Programming Language Concepts

33

Example cont.

• 1 * 1 + 0: <exp>

<factor>

<bin> * <exp>

1 <factor> + <factor>

Use rules: <bin> ::= 1 and

<exp> ::= <factor> + <factor>

Page 34: Programming Language Concepts

34

Example cont.

• 1 * 1 + 0: <exp>

<factor>

<bin> * <exp>

1 <factor> + <factor>

<bin> <bin>

Use rule: <factor> ::= <bin>

• 1 * 1 + 0: <exp>

<factor>

<bin> * <exp>

1 <factor> + <factor>

<bin> <bin>

Page 35: Programming Language Concepts

35

Example cont.

• 1 * 1 + 0: <exp>

<factor>

<bin> * <exp>

1 <factor> + <factor>

<bin> <bin>

1 0

Use rules: <bin> ::= 1 | 0

Page 36: Programming Language Concepts

36

Example cont.

• 1 * 1 + 0: <exp>

<factor>

<bin> * <exp>

1 <factor> + <factor>

<bin> <bin>

1 0Fringe of tree is string generated by grammar

Page 37: Programming Language Concepts

37

Where Syntax Meets Semantics

Page 38: Programming Language Concepts

38

Three “Equivalent” Grammars

G1: <subexp> ::= a | b | c | <subexp> - <subexp>

G2: <subexp> ::= <var> - <subexp> | <var><var> ::= a | b | c

G3: <subexp> ::= <subexp> - <var> | <var><var> ::= a | b | c

These grammars all define the same language: thelanguage of strings that contain one or more as, bs or cs separated by minus signs.

Page 39: Programming Language Concepts

39

<subexp>

<subexp> - <var>

a

<var>

<subexp> - <var>

b

c G3 parse tree:

<subexp>

<var> - <subexp>

c

<var>

<var> - <subexp>

b

a G2 parse tree:

Page 40: Programming Language Concepts

40

Why Parse Trees Matter

• We want the structure of the parse tree to correspond to the semantics of the string it generates

• This makes grammar design much harder: we’re interested in the structure of each parse tree, not just in the generated string

• Parse trees are where syntax meets semantics

Page 41: Programming Language Concepts

41

Outline

• Operators• Precedence• Associativity• Ambiguities• Cluttered grammars• Parse trees and EBNF• Abstract syntax trees

Page 42: Programming Language Concepts

42

Operators

• Special syntax for frequently-used simple operations like addition, subtraction, multiplication and division

• The word operator refers both to the token used to specify the operation (like + and *) and to the operation itself

• Usually predefined, but not always• Usually a single token, but not always

Page 43: Programming Language Concepts

43

Operator Terminology

• Operands are the inputs to an operator, like 1 and 2 in the expression 1+2

• Unary operators take one operand: -1• Binary operators take two: 1+2• Ternary operators take three: a?b:c

Page 44: Programming Language Concepts

44

More Operator Terminology

• In most programming languages, binary operators use an infix notation: a + b

• Sometimes you see prefix notation: + a b• Sometimes postfix notation: a b +• Unary operators, similarly:

• (Can’t be infix, of course)• Can be prefix, as in -1• Can be postfix, as in a++

Page 45: Programming Language Concepts

45

Outline

• Operators• Precedence• Associativity• Ambiguities• Cluttered grammars• Parse trees and EBNF• Abstract syntax trees

Page 46: Programming Language Concepts

46

Working Grammar

G4: <exp> ::= <exp> + <exp> | <exp> * <exp> | (<exp>) | a | b | c

This generates a language of arithmetic expressionsusing parentheses, the operators + and *, and thevariables a, b and c

Page 47: Programming Language Concepts

47

Issue #1: Precedence

Our grammar generates this tree for a+b*c. In this tree,the addition is performed before the multiplication,which is not the usual convention for operator precedence.

<exp>

<exp> * <exp>

a

<exp> + <exp>

b

c

Page 48: Programming Language Concepts

48

Operator Precedence

• Applies when the order of evaluation is not completely decided by parentheses

• Each operator has a precedence level, and those with higher precedence are performed before those with lower precedence, as if parenthesized

• Most languages put * at a higher precedence level than +, so that

a+b*c = a+(b*c)

Page 49: Programming Language Concepts

49

Precedence Examples

• C (15 levels of precedence—too many?)

• Pascal (5 levels—not enough?)

• Smalltalk (1 level for all binary operators)

a = b < c ? * p + b * c : 1 << d ()

a <= 0 or 100 <= a

a + b * c

Error!

Page 50: Programming Language Concepts

50

Precedence In The Grammar

To fix the precedence problem, we modify the grammar so that it is forced to put * below + in the parse tree.

G5: <exp> ::= <exp> + <exp> | <mulexp><mulexp> ::= <mulexp> * <mulexp>

| (<exp>) | a | b | c

G4: <exp> ::= <exp> + <exp> | <exp> * <exp> | (<exp>) | a | b | c

Page 51: Programming Language Concepts

51

Correct Precedence

Our new grammar generates this tree for a+b*c. It generates the same language as before, but no longer generates parse trees with incorrect precedence.

<exp>

<exp> + <exp>

c

<mulexp> * <mulexp>

b

a

G5 parse tree: <mulexp> <mulexp>

Page 52: Programming Language Concepts

52

Outline

• Operators• Precedence• Associativity• Ambiguities• Cluttered grammars• Parse trees and EBNF• Abstract syntax trees

Page 53: Programming Language Concepts

53

Issue #2: Associativity

Our grammar G5 generates both these trees for a+b+c. The first one is not the usual convention for operator associativity.

<exp>

<exp> + <exp>

c

<mulexp>

<exp> + <exp>

b

<mulexp> a

<mulexp>

<exp>

<exp> + <exp>

b

<mulexp>

<exp> + <exp>

a

<mulexp> c

<mulexp>

Page 54: Programming Language Concepts

54

Operator Associativity

• Applies when the order of evaluation is not decided by parentheses or by precedence

• Left-associative operators group left to right: a+b+c+d = ((a+b)+c)+d

• Right-associative operators group right to left: a+b+c+d = a+(b+(c+d))

• Most operators in most languages are left-associative, but there are exceptions

Page 55: Programming Language Concepts

55

Associativity Examples

• C

• ML

• Fortran

a<<b<<c — most operators are left-associative a=b=0 — right-associative (assignment)

3-2-1 — most operators are left-associative1::2::nil — right-associative (list builder)

a/b*c — most operators are left-associativea**b**c — right-associative (exponentiation)

Page 56: Programming Language Concepts

56

Associativity In The Grammar

To fix the associativity problem, we modify the grammar tomake trees of +s grow down to the left (and likewise for *s)

G5: <exp> ::= <exp> + <exp> | <mulexp><mulexp> ::= <mulexp> * <mulexp>

| (<exp>) | a | b | c

G6: <exp> ::= <exp> + <mulexp> | <mulexp><mulexp> ::= <mulexp> * <rootexp> | <rootexp><rootexp> ::= (<exp>)

| a | b | c

Page 57: Programming Language Concepts

57

Correct Associativity

Our new grammar generates this tree for a+b+c. It generates the same language as before, but no longer generates trees with incorrect associativity.

<exp>

<exp> + <mulexp>

b

<rootexp>

<exp> + <mulexp>

<mulexp> c

<rootexp>

a

<rootexp>

Page 58: Programming Language Concepts

58

Outline

• Operators• Precedence• Associativity• Ambiguities• Cluttered grammars• Parse trees and EBNF• Abstract syntax trees

Page 59: Programming Language Concepts

59

Ambiguous Grammars and Languages

• A BNF grammar is ambiguous if its language contains strings for which there is more than one parse tree

• If all BNF’s for a language are ambiguous then the language is inherently ambiguous

Page 60: Programming Language Concepts

60

Example: Ambiguous Grammar

• 0 + 1 + 0

<Sum> <Sum>

<Sum> + <Sum> <Sum> + <Sum>

<Sum> + <Sum> 0 0 <Sum> + <Sum>

0 1 1 0

Page 61: Programming Language Concepts

61

Example

• What is the result for:

3 + 4 * 5 + 6

Page 62: Programming Language Concepts

62

Example

• What is the result for:

3 + 4 * 5 + 6• Possible answers:

• 41 = ((3 + 4) * 5) + 6• 47 = 3 + (4 * (5 + 6))• 29 = (3 + (4 * 5)) + 6 = 3 + ((4 * 5) + 6)• 77 = (3 + 4) * (5 + 6)

Page 63: Programming Language Concepts

63

Example

• What is the value of:

7 – 5 – 2

Page 64: Programming Language Concepts

64

Example

• What is the value of:7 – 5 – 2

• Possible answers:• In Pascal, C++, SML assoc. left 7 – 5 – 2 = (7 – 5) – 2 = 0• In APL, associate to right 7 – 5 – 2 = 7 – (5 – 2) = 4

Page 65: Programming Language Concepts

65

Two Major Sources of Ambiguity

• Lack of determination of operator precedence

• Lack of determination of operator associatively

• Not the only sources of ambiguity

Page 66: Programming Language Concepts

66

Issue #3: Ambiguity

• G4 was ambiguous: it generated more than one parse tree for the same string

• Fixing the associativity and precedence problems eliminated all the ambiguity

• This is usually a good thing: the parse tree corresponds to the meaning of the program, and we don’t want ambiguity about that

• Not all ambiguity stems from confusion about precedence and associativity...

Page 67: Programming Language Concepts

67

Dangling Else In Grammars

<stmt> ::= <if-stmt> | s1 | s2<if-stmt> ::= if <expr> then <stmt> else <stmt> | if <expr> then <stmt><expr> ::= e1 | e2

This grammar has a classic “dangling-else ambiguity.” Thestatement we want derived is

if e1 then if e2 then s1 else s2

and the next slide shows two different parse trees for it...

Page 68: Programming Language Concepts

Most languages that havethis problem choose thisparse tree: else goes withnearest unmatched then

<if-stmt>

if <exp> then <stmt> else <stmt>

e1 s2

if <exp> then <stmt>

<if-stmt>

e2 s1

<if-stmt>

if <exp> then <stmt>

e1

s2

if <exp> then <stmt> else <stmt>

<if-stmt>

e2 s1

Page 69: Programming Language Concepts

69

Eliminating The Ambiguity

We want to insist that if this expands into an if, that if mustalready have its own else. First, we make a new non-terminal<full-stmt> that generates everything <stmt> generates, exceptthat it can’t generate if statements with no else:

<stmt> ::= <if-stmt> | s1 | s2<if-stmt> ::= if <expr> then <stmt> else <stmt> | if <expr> then <stmt><expr> ::= e1 | e2

<full-stmt> ::= <full-if> | s1 | s2<full-if> ::= if <expr> then <full-stmt> else <full-stmt>

Page 70: Programming Language Concepts

70

Eliminating The Ambiguity

Then we use the new non-terminal here.

The effect is that the new grammar can match an else partwith an if part only if all the nearer if parts are already matched.

<stmt> ::= <if-stmt> | s1 | s2<if-stmt> ::= if <expr> then <full-stmt> else <stmt> | if <expr> then <stmt><expr> ::= e1 | e2

Page 71: Programming Language Concepts

71

Correct Parse Tree

<if-stmt>

if <exp> then <stmt>

e1

s2

if <exp> then <full-stmt> else <stmt>

<if-stmt>

e2 s1

Page 72: Programming Language Concepts

72

Dangling Else

• The grammar trouble reflects a problem with the language, which we did not change

• A chain of if-then-else constructs can be very hard for people to read

• Especially true if some but not all of the else parts are present

Page 73: Programming Language Concepts

73

For Example

int a=0;if (0==0) if (0==1) a=1;else a=2;

What is the value of a after this fragment executes?

Answer: a=2, (read both ways)

Page 74: Programming Language Concepts

74

Clearer Styles

int a=0;if (0==0) if (0==1) a=1; else a=2;

int a=0;if (0==0) { if (0==1) a=1; else a=2;}

Better: correct indentation

Even better: use of a blockreinforces the structure

Page 75: Programming Language Concepts

75

Languages That Don’t Dangle

• Some languages define if-then-else in a way that forces the programmer to be more clear

• Algol does not allow the then part to be another if statement – though it can be a block containing an if statement

• Ada requires each if statement to be terminated with an end if

Page 76: Programming Language Concepts

76

Outline

• Operators• Precedence• Associativity• Ambiguities• Cluttered grammars• Parse trees and EBNF• Abstract syntax trees

Page 77: Programming Language Concepts

77

Clutter

• The new if-then-else grammar is harder for people to read than the old one

• It has a lot of clutter: more productions and more non-terminals

• Same with G4, G5 and G6: we eliminated the ambiguity but made the grammar harder for people to read

• This is not always the right trade-off

Page 78: Programming Language Concepts

78

Reminder: Multiple Audiences

• Grammars have multiple audiences:• Novices want to find out what legal programs

look like• Experts—advanced users and language system

implementers—want an exact, detailed definition

• Tools—parser and scanner generators—want an exact, detailed definition in a particular, machine-readable form

• Tools often need ambiguity eliminated, while people often prefer a more readable grammar

Page 79: Programming Language Concepts

79

Options

• Rewrite grammar to eliminate ambiguity

• Leave ambiguity but explain in accompanying text how things like associativity, precedence, and the dangling else should be parsed

• Do both in separate grammars

Page 80: Programming Language Concepts

80

Outline

• Operators• Precedence• Associativity• Ambiguities• Cluttered grammars• Parse trees and EBNF• Abstract syntax trees

Page 81: Programming Language Concepts

81

EBNF and Parse Trees

• You know that {x} means "zero or more repetitions of x" in EBNF

• So <exp> ::= <mulexp> {+ <mulexp>} should mean a <mulexp> followed by zero or more repetitions of "+ <mulexp>"

• But what then is the associativity of that + operator? What kind of parse tree would be generated for a+a+a?

Page 82: Programming Language Concepts

82

Two Camps

• Some people use EBNF loosely:• Use {} anywhere it helps

• Other people use EBNF strictly:• Use <exp> ::= <mulexp> {+ <mulexp>}

only for left-associative operators• Use recursive rules for right-associative

operators: <expa> ::= <expb> [ = <expa> ]

Page 83: Programming Language Concepts

83

About Syntax Diagrams

• Similar problem: what parse tree is generated?

• As in loose EBNF applications, add a paragraph of text dealing with ambiguities, associativity, precedence, and so on

Page 84: Programming Language Concepts

84

Outline

• Operators• Precedence• Associativity• Ambiguities• Cluttered grammars• Parse trees and EBNF• Abstract syntax trees

Page 85: Programming Language Concepts

85

Full-Size Grammars

• In any realistically large language, there are many non-terminals

• Especially true when in the cluttered but unambiguous form needed by parsing tools

• Extra non-terminals guide construction of unique parse tree

• Once parse tree is found, such non-terminals are no longer of interest

Page 86: Programming Language Concepts

86

Abstract Syntax Tree

• Language systems usually store an abbreviated version of the parse tree called the abstract syntax tree

• Details are implementation-dependent

• Usually, there is a node for every operation, with a subtree for every operand

Page 87: Programming Language Concepts

87

<exp>

<exp> + <mulexp>

b

<rootexp>

<exp> + <mulexp>

<mulexp> c

<rootexp>

a

<rootexp> +

c

a

+

b

parse tree

abstract syntax tree

Page 88: Programming Language Concepts

88

Parsing, Revisited

• When a language system parses a program, it goes through all the steps necessary to find the parse tree

• But it usually does not construct an explicit representation of the parse tree in memory

• Most systems construct an abstract syntax tree (AST) instead

Page 89: Programming Language Concepts

89

Conclusion

• Grammars define syntax, and more• They define not just a set of legal

programs, but a parse tree for each program

• The structure of a parse tree corresponds to the order in which different parts of the program are to be executed

• Therefore, grammars contribute to the definition of semantics

Page 90: Programming Language Concepts

90

End of Syntax and Semantics