Chapter 4. Syntax Analysis (1)

Upload: sydney

Post on 07-Jan-2016



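Application of a production with left side A in a derivation step (from sentential form i to sentential form i+1)

[Figure: the nonterminal A in sentential form i is rewritten by the production to give sentential form i+1.]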


Formal grammars (1/3)

Example: Let G1 have N = {A, B, C}, T = {a, b, c}, and the set of productions

Σ → A          CB → BC
A → aABC       bB → bb
A → abC        bC → bc
               cC → cc

The reader should verify that the word a^k b^k c^k is in L(G1) for all k ≥ 1 and that only these words are in L(G1). That is,

L(G1) = { a^k b^k c^k | k ≥ 1 }.
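The claim about L(G1) can be spot-checked (not proved) by brute force. The following is a minimal sketch that enumerates sentential forms breadth-first and collects the terminal words up to a length bound. The function name `words` and the ASCII letter `S` standing in for the start symbol Σ are assumptions of this sketch. Pruning by length is safe here only because no production of G1 shortens a sentential form.

```python
from collections import deque

# Productions of G1 as (left side, right side); "S" stands in for the start symbol.
G1 = [("S", "A"), ("CB", "BC"),
      ("A", "aABC"), ("bB", "bb"),
      ("A", "abC"), ("bC", "bc"),
      ("cC", "cc")]

def words(productions, start, max_len, terminals="abc"):
    """Return all terminal words of length <= max_len derivable from start."""
    seen = {start}
    queue = deque([start])
    found = set()
    while queue:
        form = queue.popleft()
        if all(ch in terminals for ch in form):
            found.add(form)          # no nonterminals left: a word of the language
            continue
        for lhs, rhs in productions:
            pos = form.find(lhs)
            while pos != -1:         # try the production at every occurrence of lhs
                new = form[:pos] + rhs + form[pos + len(lhs):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                pos = form.find(lhs, pos + 1)
    return sorted(found, key=len)

print(words(G1, "S", 9))   # ['abc', 'aabbcc', 'aaabbbccc']
```

The same function should not be trusted for a contracting grammar such as one containing bC → b, because a long sentential form may later shrink below the bound.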


Formal grammars (2/3)

Example: Grammar G2 is a modification of G1:

G2:  Σ → A          CB → BC
     A → aABC       bB → bb
     A → abC        bC → b

The reader may verify that L(G2) = { a^k b^k | k ≥ 1 }. Note that the last rule, bC → b, erases all the C's from the derivation, and that only this production removes the nonterminal C from sentential forms.


Formal grammars (3/3)

Example: A simpler grammar that generates { a^k b^k | k ≥ 1 } is the grammar G3:

G3:  Σ → S
     S → aSb
     S → ab

A derivation of a^3 b^3 is

Σ ⇒ S ⇒ aSb ⇒ aaSbb ⇒ aaabbb

The reader may verify that L(G3) = { a^k b^k | k ≥ 1 }.
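The derivation pattern generalizes to any k: apply S → aSb exactly k − 1 times, then S → ab once. A small sketch, assuming a hypothetical helper `derive` that starts from S (the initial step from the start symbol to S is omitted):

```python
def derive(k):
    """Return the sentential forms of the G3 derivation of a^k b^k, starting at S."""
    if k < 1:
        raise ValueError("k must be at least 1")
    forms = ["S"]
    for _ in range(k - 1):                        # apply S -> aSb (k - 1) times
        forms.append(forms[-1].replace("S", "aSb"))
    forms.append(forms[-1].replace("S", "ab"))    # finish with S -> ab
    return forms

print(" => ".join(derive(3)))   # S => aSb => aaSbb => aaabbb
```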


Type  Format of productions            Remarks
0     φAψ → φωψ                        Unrestricted substitution rules (contracting)
1     φAψ → φωψ, ω ≠ λ;  Σ → λ         Context sensitive (noncontracting)
2     A → ω, ω ≠ λ;  Σ → λ             Context free (noncontracting)
3     A → aB, A → a;  Σ → λ            Right linear (regular, noncontracting)
      A → Ba, A → a;  Σ → λ            Left linear (regular, noncontracting)

The four types of formal grammars


Context-Sensitive Grammars (Type 1)

Definition: A context-sensitive grammar G = (N, T, P, Σ) is a formal grammar in which all productions are of the form

φAψ → φωψ, ω ≠ λ

The grammar may also contain the production Σ → λ. If G is a context-sensitive (type 1) grammar, then L(G) is a context-sensitive (type 1) language.

Unrestricted Grammars (Type 0)


Context-Free Grammars (Type 2)

Definition: A context-free grammar G = (N, T, P, Σ) is a formal grammar in which all productions are of the form

A → ω,   where A ∈ N ∪ {Σ} and ω ∈ (N ∪ T)* − {λ}

The grammar may also contain the production Σ → λ. If G is a context-free (type 2) grammar, then L(G) is a context-free (type 2) language.


Regular Grammars (Type 3) (1/2)

Definition: A production of the form

A → aB or A → a,   where A ∈ N ∪ {Σ}, B ∈ N, a ∈ T,

is called a right linear production. A production of the form

A → Ba or A → a,   where A ∈ N ∪ {Σ}, B ∈ N, a ∈ T,

is a left linear production. A formal grammar is right linear if it contains only right linear productions, and is left linear if it contains only left linear productions. Either kind of grammar may also contain the production Σ → λ. Left and right linear grammars are also known as regular grammars. If G is a regular (type 3) grammar, then L(G) is a regular (type 3) language.


Regular Grammars (Type 3) (2/2)

Example: A left linear grammar G1 and a right linear grammar G2 have productions as follows:

G1:              G2:
Σ → B1           Σ → 1B
Σ → 1            Σ → 1
A → B1           A → 1B
B → A0           B → 0A
A → 1            A → 1

The reader may verify that

L(G1) = (10)*1 = 1(01)* = L(G2)
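The equality of the two regular expressions (10)*1 and 1(01)* can be spot-checked by comparing them on every binary string up to some length. This sketch uses Python's standard `re` module; the helper name `agree_up_to` is an assumption, and the check is evidence rather than a proof.

```python
import re
from itertools import product

left = re.compile(r"(10)*1")    # language of the left linear grammar
right = re.compile(r"1(01)*")   # language of the right linear grammar

def agree_up_to(max_len):
    """True if both expressions accept exactly the same strings of length <= max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            s = "".join(bits)
            if bool(left.fullmatch(s)) != bool(right.fullmatch(s)):
                return False
    return True

print(agree_up_to(12))   # True
```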


Ambiguity (1/2)

Example: Consider the context-free grammar

G:  Σ → S
    S → SS
    S → ab

We see that the derivations correspond to different tree diagrams. The grammar G is ambiguous with respect to the sentence ababab: if the tree diagrams were used as the basis for assigning meaning to the derived string, mistaken interpretation could result.


Ambiguity (2/2)

Definition: A context-free grammar is ambiguous if and only if it generates some sentence by two or more distinct leftmost derivations.
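For the grammar with productions S → SS and S → ab, the degree of ambiguity can be counted directly, since each parse tree corresponds to exactly one leftmost derivation. The sketch below counts the parse trees of (ab)^n by splitting the string between the two children of S → SS; the function name `count_trees` is an assumption.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_trees(n):
    """Number of parse trees for (ab)^n rooted at S, with S -> SS | ab."""
    if n == 1:
        return 1                      # only S -> ab applies
    # S -> SS: the left S derives i copies of ab, the right S derives n - i.
    return sum(count_trees(i) * count_trees(n - i) for i in range(1, n))

print([count_trees(n) for n in range(1, 6)])   # [1, 1, 2, 5, 14]
```

The value 2 for n = 3 matches the two distinct tree diagrams for ababab; the counts are the Catalan numbers.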


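Fig. 4.1. Position of parser in compiler model.

[Figure: source program → lexical analyzer → parser → rest of front end → intermediate representation. The parser sends "get next token" requests to the lexical analyzer and receives a token in return; it passes a parse tree to the rest of the front end. The lexical analyzer and the parser both use the symbol table.]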


Syntax Error Handling (1/2)

Probable errors:

– lexical, such as misspelling an identifier, keyword, or operator

– syntactic, such as an arithmetic expression with unbalanced parentheses

– semantic, such as an operator applied to an incompatible operand

– logical, such as an infinitely recursive call


Syntax Error Handling (2/2)

The error handler in a parser has simple-to-state goals:

– It should report the presence of errors clearly and accurately.

– It should recover from each error quickly enough to be able to detect subsequent errors.

– It should not significantly slow down the processing of correct programs.


Error-Recovery Strategies

– panic mode

– phrase level

– error productions

– global correction


Example 4.2

The grammar with the following productions defines simple arithmetic expressions.

expr → expr op expr
expr → ( expr )
expr → - expr
expr → id
op → +
op → -
op → *
op → /
op → ↑


Notational Conventions (1/2)

1. These symbols are terminals:

i) Lower-case letters early in the alphabet such as a, b, c.

ii) Operator symbols such as +, -, etc.

iii) Punctuation symbols such as parentheses, comma, etc.

iv) The digits 0, 1, . . . , 9.

v) Boldface strings such as id or if.

2. These symbols are nonterminals:

i) Upper-case letters early in the alphabet such as A, B, C.

ii) The letter S, which, when it appears, is usually the start symbol.

iii) Lower-case italic names such as expr or stmt.

3. Upper-case letters late in the alphabet, such as X, Y, Z, represent grammar symbols, that is, either nonterminals or terminals.


Notational Conventions (2/2)

4. Lower-case letters late in the alphabet, chiefly u, v, . . . , z, represent strings of terminals.

5. Lower-case Greek letters, α, β, γ, for example, represent strings of grammar symbols. Thus, a generic production could be written as A → α, indicating that there is a single nonterminal A on the left of the arrow (the left side of the production) and a string of grammar symbols α to the right of the arrow (the right side of the production).

6. If A → α1, A → α2, . . . , A → αk are all productions with A on the left (we call them A-productions), we may write A → α1 | α2 | . . . | αk. We call α1, α2, . . . , αk the alternatives for A.

7. Unless otherwise stated, the left side of the first production is the start symbol.


Derivations

We say that αAβ ⇒ αγβ if A → γ is a production and α and β are arbitrary strings of grammar symbols. If α1 ⇒ α2 ⇒ . . . ⇒ αn, we say α1 derives αn. The symbol ⇒ means “derives in one step”. Often we wish to say “derives in zero or more steps”. For this purpose we can use the symbol ⇒*. Thus,

1. α ⇒* α for any string α, and

2. If α ⇒* β and β ⇒ γ, then α ⇒* γ.


Fig. 4.3. Building the parse tree from derivation (4.4).

[Figure: the sequence of parse trees built step by step, one for each sentential form of derivation (4.4).]

(Derivation 4.4)  E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)


Eliminating Ambiguity

Ambiguous grammar:

stmt → if expr then stmt
     | if expr then stmt else stmt
     | other

Unambiguous grammar:

stmt → matched_stmt
     | unmatched_stmt

matched_stmt → if expr then matched_stmt else matched_stmt
             | other

unmatched_stmt → if expr then stmt
               | if expr then matched_stmt else unmatched_stmt


Elimination of Left Recursion

No matter how many A-productions there are, we can eliminate immediate left recursion from them by the following technique. First, we group the A-productions as

A → Aα1 | Aα2 | . . . | Aαm | β1 | β2 | . . . | βn

where no βi begins with an A. Then, we replace the A-productions by

A → β1A' | β2A' | . . . | βnA'

A' → α1A' | α2A' | . . . | αmA' | ε

where ε denotes the empty string.
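The transformation can be sketched in a few lines. The representation below (right sides as tuples of symbols, the empty tuple for the empty string, the new nonterminal named by appending a prime) and the function name are assumptions of this sketch; it also assumes that no recursive alternative is simply A → A.

```python
EPSILON = ()   # the empty right side

def eliminate_immediate_left_recursion(a, alternatives):
    """Rewrite the A-productions so that none begins with A itself."""
    a_prime = a + "'"
    # Right sides of the form A alpha_i: keep only alpha_i.
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == a]
    # Right sides beta_j that do not begin with A.
    betas = [alt for alt in alternatives if not alt or alt[0] != a]
    if not alphas:
        return {a: list(alternatives)}        # nothing to eliminate
    return {
        a: [beta + (a_prime,) for beta in betas],
        a_prime: [alpha + (a_prime,) for alpha in alphas] + [EPSILON],
    }

# E -> E + T | T   becomes   E -> T E',  E' -> + T E' | (empty)
print(eliminate_immediate_left_recursion("E", [("E", "+", "T"), ("T",)]))
```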


Left Factoring

In general, if A → αβ1 | αβ2 are two A-productions, and the input begins with a nonempty string derived from α, we do not know whether to expand A to αβ1 or to αβ2. However, we may defer the decision by expanding A to αA'. Then, after seeing the input derived from α, we expand A' to β1 or to β2. That is, left-factored, the original productions become

A → αA'
A' → β1 | β2
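One left-factoring step can be sketched as follows. Right sides are tuples of symbols and the empty tuple stands for the empty string; the helper names `common_prefix` and `left_factor_once` are assumptions, and the sketch factors only the first group of alternatives that share a leading symbol.

```python
def common_prefix(seqs):
    """Longest prefix shared by all the given tuples."""
    prefix = []
    for column in zip(*seqs):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return tuple(prefix)

def left_factor_once(a, alternatives):
    """Factor the first group of alternatives of `a` that share a leading symbol."""
    groups = {}
    for alt in alternatives:
        groups.setdefault(alt[:1], []).append(alt)
    for first, group in groups.items():
        if first and len(group) > 1:
            alpha = common_prefix(group)           # the shared prefix
            a_prime = a + "'"
            rest = [alt for alt in alternatives if alt not in group]
            return {
                a: [alpha + (a_prime,)] + rest,
                a_prime: [alt[len(alpha):] for alt in group],
            }
    return {a: list(alternatives)}                 # nothing to factor

# S -> iEtS | iEtSeS | a   becomes   S -> iEtS S' | a,  S' -> (empty) | eS
stmt = [("i", "E", "t", "S"), ("i", "E", "t", "S", "e", "S"), ("a",)]
print(left_factor_once("S", stmt))
```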

Example 4.12.

The language L2 = { a^n b^m c^n d^m | n ≥ 1 and m ≥ 1 }


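Fig. 4.9. Steps in top-down parse.

[Figure: three parse-tree snapshots. (a) S with children c, A, d. (b) The same tree with A expanded to a b. (c) The same tree with A expanded to a alone.]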


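Fig. 4.10. Transition diagrams for grammar (4.11).

E:   0 --T--> 1 --E'--> 2
E':  3 --+--> 4 --T--> 5 --E'--> 6,   3 --ε--> 6
T:   7 --F--> 8 --T'--> 9
T':  10 --*--> 11 --F--> 12 --T'--> 13,   10 --ε--> 13
F:   14 --(--> 15 --E--> 16 --)--> 17,   14 --id--> 17

(Grammar 4.11)

E  → T E'
E' → + T E' | ε
T  → F T'
T' → * F T' | ε
F  → ( E ) | id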


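Fig. 4.11. Simplified transition diagrams.

[Figure: four panels. (a) and (b) show the diagram for E' (states 3, 4, 5, 6, edges labelled + and T) being simplified; (c) and (d) show the result substituted into the diagram for E (states 0, 3, 4, 6, edges labelled T and +).]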


Fig. 4.12. Simplified transition diagrams for arithmetic expressions.

E:   0 --T--> 3,   3 --+--> 0,   3 --ε--> 6
T:   7 --F--> 8,   8 --*--> 7,   8 --ε--> 13
F:   14 --(--> 15 --E--> 16 --)--> 17,   14 --id--> 17


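Fig. 4.13. Model of a nonrecursive predictive parser.

[Figure: an INPUT buffer holding a + b $, a STACK holding X Y Z $ (X on top), the predictive parsing program, which reads both and consults the parsing table M, and an OUTPUT stream.]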


Nonrecursive Predictive Parsing

Let X be the symbol on top of the stack and a the current input symbol.

1. If X = a = $, the parser halts and announces successful completion of parsing.

2. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol.

3. If X is a nonterminal, the program consults entry M[X, a] of the parsing table M. This entry will be either an X-production of the grammar or an error entry. If, for example, M[X, a] = {X → UVW}, the parser replaces X on top of the stack by WVU (with U on top). As output, we shall assume that the parser just prints the production used; any other code could be executed here. If M[X, a] = error, the parser calls an error recovery routine.
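The three cases above can be sketched as a table-driven loop. The table below encodes the parsing table for the expression grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id; the dictionary layout, the names `TABLE` and `predictive_parse`, and the choice to raise `SyntaxError` instead of calling a recovery routine are assumptions of this sketch.

```python
# (nonterminal, input symbol) -> right side; [] is an empty right side,
# a missing key is an error entry.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def predictive_parse(tokens, start="E"):
    """Return the productions used, in order; raise SyntaxError on an error entry."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]                  # the top of the stack is the end of the list
    pos = 0
    output = []
    while True:
        x, a = stack[-1], tokens[pos]
        if x == "$" and a == "$":         # case 1: X = a = $
            return output
        if x not in NONTERMINALS:         # X is a terminal or $
            if x != a:
                raise SyntaxError(f"expected {x!r}, found {a!r}")
            stack.pop()                   # case 2: X = a, not $
            pos += 1
        else:                             # case 3: consult M[X, a]
            rhs = TABLE.get((x, a))
            if rhs is None:
                raise SyntaxError(f"no entry for M[{x}, {a}]")
            stack.pop()
            stack.extend(reversed(rhs))   # push the right side, leftmost symbol on top
            output.append(f"{x} -> {' '.join(rhs) or 'epsilon'}")

for production in predictive_parse(["id", "+", "id", "*", "id"]):
    print(production)
```

On the input id + id * id the loop emits eleven productions, beginning with E -> T E' and ending with E' -> epsilon.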


Fig. 4.15. Parsing table M for grammar (4.11).

NONTERMINAL   INPUT SYMBOL
              id          +             *             (           )          $
E             E → TE'                                 E → TE'
E'                        E' → +TE'                               E' → ε     E' → ε
T             T → FT'                                 T → FT'
T'                        T' → ε        T' → *FT'                 T' → ε     T' → ε
F             F → id                                  F → (E)

Fig. 4.16. Moves made by predictive parser on input id + id * id.

STACK        INPUT             OUTPUT
$E           id + id * id$
$E'T         id + id * id$     E → TE'
$E'T'F       id + id * id$     T → FT'
$E'T'id      id + id * id$     F → id
$E'T'        + id * id$
$E'          + id * id$        T' → ε
$E'T+        + id * id$        E' → +TE'
$E'T         id * id$
$E'T'F       id * id$          T → FT'
$E'T'id      id * id$          F → id
$E'T'        * id$
$E'T'F*      * id$             T' → *FT'
$E'T'F       id$
$E'T'id      id$               F → id
$E'T'        $
$E'          $                 T' → ε
$            $                 E' → ε


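Fig. 4.17. Parsing table M for grammar (4.13).

NONTERMINAL   INPUT SYMBOL
              a          b          e            i               t          $
S             S → a                              S → iEtSS'
S'                                  S' → ε                                  S' → ε
                                    S' → eS
E                        E → b

(Grammar 4.13)

S → iEtS | iEtSeS | a
E → b

The table uses the left-factored form of this grammar: S → iEtSS' | a, S' → eS | ε, E → b.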


Fig. 4.18. Synchronizing tokens added to parsing table of Fig. 4.15.

NONTERMINAL   INPUT SYMBOL
              id          +             *             (           )          $
E             E → TE'                                 E → TE'     synch      synch
E'                        E' → +TE'                               E' → ε     E' → ε
T             T → FT'     synch                       T → FT'     synch      synch
T'                        T' → ε        T' → *FT'                 T' → ε     T' → ε
F             F → id      synch         synch         F → (E)     synch      synch
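The synch entries for E, T and F in Fig. 4.18 coincide with the FOLLOW sets of those nonterminals. The sketch below computes FIRST and FOLLOW for the expression grammar by fixed-point iteration; the grammar encoding, the string "eps" as the marker for the empty string, and the function names are assumptions of this sketch.

```python
EPS = "eps"   # marker for the empty string
GRAMMAR = {   # the expression grammar; [] is an empty right side
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_of(symbols, first):
    """FIRST of a string of grammar symbols, given FIRST of each nonterminal."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})      # FIRST of a terminal is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)                            # every symbol can derive the empty string
    return result

def first_and_follow(grammar, start):
    first = {a: set() for a in grammar}
    follow = {a: set() for a in grammar}
    follow[start].add("$")
    changed = True
    while changed:                             # repeat until nothing new is added
        changed = False
        for a, alternatives in grammar.items():
            for rhs in alternatives:
                new_first = first_of(rhs, first)
                if not new_first <= first[a]:
                    first[a] |= new_first
                    changed = True
                for i, sym in enumerate(rhs):
                    if sym not in grammar:     # only nonterminals have FOLLOW sets
                        continue
                    tail = first_of(rhs[i + 1:], first)
                    new_follow = tail - {EPS}
                    if EPS in tail:            # what follows can vanish
                        new_follow |= follow[a]
                    if not new_follow <= follow[sym]:
                        follow[sym] |= new_follow
                        changed = True
    return first, follow

first, follow = first_and_follow(GRAMMAR, "E")
for a in ("E", "T", "F"):
    print(a, sorted(follow[a]))
```

For E' and T' the FOLLOW columns already hold ε-productions in the table, which is why no synch entries appear in those rows.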

Fig. 4.19. Parsing and error recovery moves made by predictive parser.

STACK        INPUT             OUTPUT
$E           ) id * + id$      error, skip )
$E           id * + id$        id is in FIRST(E)
$E'T         id * + id$
$E'T'F       id * + id$
$E'T'id      id * + id$
$E'T'        * + id$
$E'T'F*      * + id$
$E'T'F       + id$             error, M[F, +] = synch
$E'T'        + id$             F has been popped
$E'          + id$
$E'T+        + id$
$E'T         id$
$E'T'F       id$
$E'T'id      id$
$E'T'        $
$E'          $
$            $