Chapter 4.
Syntax Analysis (1)
Application of a production A → ω in a derivation step: the occurrence of A in the i-th sentential form is replaced by ω to give the (i+1)-th sentential form.
Formal grammars (1/3)
Example : Let G1 have N = {A, B, C}, T = {a, b, c} and the set of productions
Σ → A        CB → BC
A → aABC     bB → bb
A → abC      bC → bc
             cC → cc
The reader should convince himself that the word aᵏbᵏcᵏ is in L(G1) for all k ≥ 1 and that only these words are in L(G1). That is,
L(G1) = { aᵏbᵏcᵏ | k ≥ 1 }.
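The claim about L(G1) can be checked for short words by exhaustive search. The following is a minimal sketch, not part of the original slides: it treats the productions as string-rewriting rules, uses the letter "S" as a stand-in for the start symbol Σ (an assumption made for typing convenience), and relies on the fact that no production of G1 shortens a sentential form, so forms longer than the bound can be discarded. The function name `words_up_to` is hypothetical.

```python
from collections import deque

# Productions of G1 as (left side, right side) rewriting rules.
# "S" stands in for the start symbol (an assumption of this sketch).
G1 = [("S", "A"), ("A", "aABC"), ("A", "abC"),
      ("CB", "BC"), ("bB", "bb"), ("bC", "bc"), ("cC", "cc")]

def words_up_to(rules, start, max_len):
    """Return all terminal words of length <= max_len derivable from start."""
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        if form.islower():              # only terminals left: a word of L(G1)
            words.add(form)
            continue
        for lhs, rhs in rules:
            pos = form.find(lhs)
            while pos != -1:            # try every occurrence of the left side
                nxt = form[:pos] + rhs + form[pos + len(lhs):]
                # No rule of G1 shortens a form, so longer forms are pruned.
                if len(nxt) <= max_len and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
                pos = form.find(lhs, pos + 1)
    return sorted(words, key=len)

print(words_up_to(G1, "S", 9))   # ['abc', 'aabbcc', 'aaabbbccc']
```

The length-based pruning is only valid for noncontracting grammars such as G1; a grammar with an erasing rule would need a different bound.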
Formal grammars (2/3)
Example : Grammar G2 is a modification of G1:
G2: Σ → A        CB → BC
    A → aABC     bB → bb
    A → abC      bC → b
The reader may verify that L(G2) = { aᵏbᵏ | k ≥ 1 }. Note that the last rule, bC → b, erases all the C's from the derivation, and that only this production removes the nonterminal C from sentential forms.
Formal grammars (3/3)
Example : A simpler grammar that generates { aᵏbᵏ | k ≥ 1 } is the grammar G3 :
G3: Σ → S
    S → aSb
    S → ab
A derivation of a³b³ is
Σ ⇒ S ⇒ aSb ⇒ aaSbb ⇒ aaabbb
The reader may verify that L(G3) = { aᵏbᵏ | k ≥ 1 }.
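As an illustration of how G3 produces its words, the sketch below lists the sentential forms obtained by applying S → aSb a total of k − 1 times and then S → ab once. It is a hypothetical helper (the name `derive` is not from the slides) and it starts from S rather than from the start symbol.

```python
def derive(k):
    """Sentential forms of the derivation of a^k b^k in G3, for k >= 1."""
    forms = ["S"]
    for _ in range(k - 1):
        forms.append(forms[-1].replace("S", "aSb"))   # apply S -> aSb
    forms.append(forms[-1].replace("S", "ab"))        # apply S -> ab
    return forms

print(" => ".join(derive(3)))   # S => aSb => aaSbb => aaabbb
```

Every form contains exactly one S, so `str.replace` rewrites a single occurrence, which mirrors one derivation step.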
| Type | Format of Productions | Remarks |
|---|---|---|
| 0 | φAψ → φωψ | Unrestricted substitution rules; contracting |
| 1 | φAψ → φωψ, ω ≠ λ; Σ → λ | Context sensitive; noncontracting |
| 2 | A → ω, ω ≠ λ; Σ → λ | Context free; noncontracting |
| 3 | A → aB, A → a, Σ → λ (right linear); A → Ba, A → a, Σ → λ (left linear) | Regular; noncontracting |

The four types of formal grammars
Context-Sensitive Grammars (Type 1)
Definition : A context-sensitive grammar G = (N, T, P, Σ) is a formal grammar in which all productions are of the form
φAψ → φωψ, ω ≠ λ
The grammar may also contain the production Σ → λ. If G is a context-sensitive (type 1) grammar, then L(G) is a context-sensitive (type 1) language.
Unrestricted Grammars (Type 0)
Context-Free Grammars (Type 2)
Definition : A context-free grammar G = (N, T, P, Σ) is a formal grammar in which all productions are of the form
A → ω, where A ∈ N ∪ {Σ} and ω ∈ (N ∪ T)* − {λ}.
The grammar may also contain the production Σ → λ. If G is a context-free (type 2) grammar, then L(G) is a context-free (type 2) language.
Regular Grammars (Type 3) (1/2)
Definition : A production of the form
A → aB or A → a, where A ∈ N ∪ {Σ}, B ∈ N, a ∈ T,
is called a right linear production. A production of the form
A → Ba or A → a, where A ∈ N ∪ {Σ}, B ∈ N, a ∈ T,
is a left linear production. A formal grammar is right linear if it contains only right linear productions, and is left linear if it contains only left linear productions. Either kind of grammar may also contain the production Σ → λ. Left and right linear grammars are also known as regular grammars. If G is a regular (type 3) grammar, then L(G) is a regular (type 3) language.
Regular Grammars (Type 3) (2/2)
Example: A left linear grammar G1 and a right linear grammar G2 have productions as follows:
G1 :            G2 :
Σ → B1          Σ → 1B
Σ → 1           Σ → 1
A → B1          A → 1B
B → A0          B → 0A
A → 1           A → 1
The reader may verify that
L(G1) = (10)*1 = 1(01)* = L(G2)
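The equality of the two regular expressions, and the claim about the right linear grammar, can be spot-checked on all short binary strings. The sketch below is an illustration under stated assumptions: "S" stands in for the start symbol Σ, the grammar is the right linear one with productions of the form A → aB and A → a, and the helper names `generate` and `matching` are hypothetical. A finite check of this kind is evidence, not a proof.

```python
import re
from itertools import product

# Right linear grammar from the example; "S" replaces the start symbol.
G2 = {"S": ["1B", "1"], "A": ["1B", "1"], "B": ["0A"]}

def generate(grammar, start, max_len):
    """All terminal words of length <= max_len derivable in a right linear grammar."""
    words, frontier = set(), [("", start)]
    while frontier:
        prefix, nonterminal = frontier.pop()
        for rhs in grammar[nonterminal]:
            terminal, rest = rhs[0], rhs[1:]
            word = prefix + terminal
            if len(word) > max_len:
                continue
            if rest:                     # form A -> aB: continue from B
                frontier.append((word, rest))
            else:                        # form A -> a: the derivation ends
                words.add(word)
    return words

def matching(pattern, max_len):
    """All binary strings of length <= max_len fully matched by pattern."""
    return {"".join(bits)
            for n in range(max_len + 1)
            for bits in product("01", repeat=n)
            if re.fullmatch(pattern, "".join(bits))}

n = 9
print(generate(G2, "S", n) == matching(r"1(01)*", n) == matching(r"(10)*1", n))  # True
```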
Ambiguity (1/2)
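Example : Consider the context-free grammar
G: Σ → S
   S → SS
   S → ab
The sentence ababab has two distinct leftmost derivations from S:
S ⇒ SS ⇒ SSS ⇒ abSS ⇒ ababS ⇒ ababab
S ⇒ SS ⇒ abS ⇒ abSS ⇒ ababS ⇒ ababab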
We see that the derivations correspond to different tree diagrams. The grammar G is ambiguous with respect to the sentence ababab: if the tree diagrams were used as the basis for assigning meaning to the derived string, mistaken interpretation could result.
Ambiguity (2/2)
Definition: A context-free grammar is ambiguous if and only if it generates some sentence by two or more distinct leftmost derivations.
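Since each parse tree corresponds to exactly one leftmost derivation, ambiguity can be detected by counting parse trees. The sketch below does this for the specific grammar S → SS | ab of the preceding example; the function name `count_trees` is hypothetical and the code is tied to this one grammar rather than being a general ambiguity test (no such general test exists for context-free grammars).

```python
from functools import lru_cache

def count_trees(w):
    """Number of distinct parse trees for w under S -> SS | ab."""
    @lru_cache(maxsize=None)
    def trees(i, j):
        # Count the ways S can derive the substring w[i:j].
        total = 1 if w[i:j] == "ab" else 0      # S -> ab
        for k in range(i + 1, j):               # S -> SS, split at position k
            total += trees(i, k) * trees(k, j)
        return total
    return trees(0, len(w))

print(count_trees("ababab"))   # 2
```

A count greater than one for any sentence shows that the grammar is ambiguous; here ababab already has two trees.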
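Fig. 4.1. Position of parser in compiler model.
[Figure: source program → lexical analyzer → parser → rest of front end → intermediate representation. The lexical analyzer returns a token to the parser on each "get next token" request, the parser passes a parse tree to the rest of the front end, and the phases share the symbol table.]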
Syntax Error Handling (1/2)
Probable Errors
– lexical, such as misspelling an identifier, keyword, or operator
– syntactic, such as an arithmetic expression with unbalanced parentheses
– semantic, such as an operator applied to an incompatible operand
– logical, such as an infinitely recursive call
Syntax Error Handling (2/2)
The error handler in a parser has simple-to-state goals:
– It should report the presence of errors clearly and accurately.
– It should recover from each error quickly enough to be able to detect subsequent errors.
– It should not significantly slow down the processing of correct programs.
Error-Recovery Strategies
– panic mode
– phrase level
– error productions
– global correction
Example 4.2
The grammar with the following productions defines simple arithmetic expressions.
expr → expr op expr
expr → ( expr )
expr → - expr
expr → id
op → +
op → -
op → *
op → /
op → ↑
Notational Conventions (1/2)
1. These symbols are terminals:
i) Lower-case letters early in the alphabet such as a, b, c.
ii) Operator symbols such as +, -, etc.
iii) Punctuation symbols such as parentheses, comma, etc.
iv) The digits 0, 1, . . . , 9.
v) Boldface strings such as id or if.
2. These symbols are nonterminals:
i) Upper-case letters early in the alphabet such as A, B, C.
ii) The letter S, which, when it appears, is usually the start symbol.
iii) Lower-case italic names such as expr or stmt.
3. Upper-case letters late in the alphabet, such as X, Y, Z, represent grammar symbols, that is, either nonterminals or terminals.
Notational Conventions (2/2)
4. Lower-case letters late in the alphabet, chiefly u, v, . . . , z, represent strings of terminals.
5. Lower-case Greek letters, α, β, γ, for example, represent strings of grammar symbols. Thus, a generic production could be written as A → α, indicating that there is a single nonterminal A on the left of the arrow (the left side of the production) and a string of grammar symbols α to the right of the arrow (the right side of the production).
6. If A → α₁, A → α₂, . . . , A → αₖ are all productions with A on the left (we call them A-productions), we may write A → α₁ | α₂ | . . . | αₖ. We call α₁, α₂, . . . , αₖ the alternatives for A.
7. Unless otherwise stated, the left side of the first production is the start symbol.
Derivations
We say that αAβ ⇒ αγβ if A → γ is a production and α and β are arbitrary strings of grammar symbols. If α₁ ⇒ α₂ ⇒ . . . ⇒ αₙ, we say α₁ derives αₙ. The symbol ⇒ means “derives in one step”. Often we wish to say “derives in zero or more steps”. For this purpose we can use the symbol ⇒*. Thus,
1. α ⇒* α for any string α, and
2. If α ⇒* β and β ⇒ γ, then α ⇒* γ.
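Fig. 4.3. Building the parse tree from derivation (4.4)
[Figure: the sequence of parse trees for the successive sentential forms of the derivation, starting with the single node E and ending with the tree whose leaves read - ( id + id ).]
(Derivation 4.4) E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)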
Eliminating Ambiguity
stmt → if expr then stmt
     | if expr then stmt else stmt
     | other

stmt → matched_stmt
     | unmatched_stmt
matched_stmt → if expr then matched_stmt else matched_stmt
     | other
unmatched_stmt → if expr then stmt
     | if expr then matched_stmt else unmatched_stmt
Elimination of Left Recursion
No matter how many A-productions there are, we can eliminate immediate left recursion from them by the following technique. First, we group the A-productions as
A → Aα₁ | Aα₂ | . . . | Aαₘ | β₁ | β₂ | . . . | βₙ
where no βᵢ begins with an A. Then, we replace the A-productions by
A → β₁A' | β₂A' | . . . | βₙA'
A' → α₁A' | α₂A' | . . . | αₘA' | ε
where ε denotes the empty string.
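The technique above can be sketched in a few lines. This is an illustrative implementation, not code from the slides: the function name and the representation (each alternative is a list of grammar symbols, with the empty list standing for the empty string) are assumptions, and the sketch assumes that no alternative is just the nonterminal itself (no production A → A).

```python
def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Split the A-productions into the A and A' productions described above.

    Each alternative is a list of grammar symbols; [] denotes the empty string.
    """
    new_name = nonterminal + "'"
    # Alternatives of the form A alpha contribute alpha; the rest are the betas.
    alphas = [alt[1:] for alt in alternatives if alt[:1] == [nonterminal]]
    betas = [alt for alt in alternatives if alt[:1] != [nonterminal]]
    if not alphas:                       # no immediate left recursion
        return {nonterminal: alternatives}
    return {
        nonterminal: [beta + [new_name] for beta in betas],
        new_name: [alpha + [new_name] for alpha in alphas] + [[]],
    }

# E -> E + T | T  becomes  E -> T E'  and  E' -> + T E' | (empty)
result = eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]])
print(result)
```

Applied to E → E + T | T, the sketch yields the E and E' productions that appear in the non-left-recursive expression grammar used for predictive parsing.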
Left Factoring
In general, if A → αβ₁ | αβ₂ are two A-productions, and the input begins with a nonempty string derived from α, we do not know whether to expand A to αβ₁ or to αβ₂. However, we may defer the decision by expanding A to αA'. Then, after seeing the input derived from α, we expand A' to β₁ or to β₂. That is, left-factored, the original productions become
A → αA'
A' → β₁ | β₂
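One left-factoring step can be sketched as follows. The representation (alternatives as lists of symbols, the empty list for the empty string) and the names `common_prefix` and `left_factor` are assumptions made for this illustration; the sketch factors only the first group of alternatives that share a first symbol and would have to be repeated to factor a grammar completely.

```python
def common_prefix(x, y):
    """Longest common prefix of two symbol lists."""
    n = 0
    while n < len(x) and n < len(y) and x[n] == y[n]:
        n += 1
    return x[:n]

def left_factor(nonterminal, alternatives):
    """Perform one left-factoring step; alternatives are lists of symbols."""
    groups = {}
    for alt in alternatives:             # group alternatives by first symbol
        groups.setdefault(tuple(alt[:1]), []).append(alt)
    for first, group in groups.items():
        if first and len(group) > 1:
            alpha = group[0]
            for alt in group[1:]:        # alpha = prefix shared by the group
                alpha = common_prefix(alpha, alt)
            new_name = nonterminal + "'"
            others = [alt for alt in alternatives if alt not in group]
            return {
                nonterminal: [alpha + [new_name]] + others,
                new_name: [alt[len(alpha):] for alt in group],
            }
    return {nonterminal: alternatives}   # nothing to factor

# S -> iEtS | iEtSeS | a
result = left_factor("S", [list("iEtS"), list("iEtSeS"), ["a"]])
print(result)
```

For the conditional-statement grammar S → iEtS | iEtSeS | a, the step produces S → iEtSS' | a and S' → ε | eS.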
Example 4.12.
The language L2 = { aⁿbᵐcⁿdᵐ | n ≥ 1 and m ≥ 1 }
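Fig. 4.9. Steps in top-down parse.
[Figure: three parse trees. (a) S with children c, A, d; (b) the same tree with A expanded to a b; (c) the same tree with A expanded to a.]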
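Fig. 4.10. Transition diagrams for grammar (4.11).
[Figure: one transition diagram per nonterminal.
E:  0 --T--> 1 --E'--> 2
E': 3 --+--> 4 --T--> 5 --E'--> 6, and 3 --ε--> 6
T:  7 --F--> 8 --T'--> 9
T': 10 --*--> 11 --F--> 12 --T'--> 13, and 10 --ε--> 13
F:  14 --(--> 15 --E--> 16 --)--> 17, and 14 --id--> 17]
(Grammar 4.11)
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id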
Fig. 4.11. Simplified transition diagrams.
[Figure: four panels, (a) to (d), showing the step-by-step simplification of the transition diagrams for E' and E.]
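Fig. 4.12. Simplified transition diagrams for arithmetic expressions.
[Figure: simplified diagrams for E (states 0, 3, 6; edges labeled T and +), T (states 7, 8, 13; edges labeled F and *), and F (states 14 to 17; edges labeled (, E, ) and id).]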
Fig. 4.13. Model of a nonrecursive predictive parser.
[Figure: the predictive parsing program reads the INPUT buffer (a + b $), uses a STACK (X Y Z $, with X on top), consults the parsing table M, and produces OUTPUT.]
Nonrecursive Predictive Parsing
Let X be the symbol on top of the stack and a the current input symbol. These two symbols determine the action of the parser:
1. If X = a = $, the parser halts and announces successful completion of parsing.
2. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol.
3. If X is a nonterminal, the program consults entry M[X, a] of the parsing table M. This entry will be either an X-production of the grammar or an error entry. If, for example, M[X, a] = {X → UVW}, the parser replaces X on top of the stack by WVU (with U on top). As output, we shall assume that the parser just prints the production used; any other code could be executed here. If M[X, a] = error, the parser calls an error recovery routine.
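The three cases can be turned into a small table-driven loop. The sketch below is an illustration, not the slides' own code: it hard-codes the parsing table for the expression grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id, represents an ε right side as an empty list, and raises `SyntaxError` instead of calling an error recovery routine. The names `TABLE`, `NONTERMINALS` and `predictive_parse` are hypothetical.

```python
# Parsing table M; a right side of [] stands for the empty string.
TABLE = {
    ("E", "id"): ["T", "E'"],       ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],  ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],       ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"],            ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def predictive_parse(tokens, start="E"):
    """Return the list of productions used, or raise SyntaxError."""
    tokens = tokens + ["$"]
    stack = ["$", start]                # top of the stack is the end of the list
    pos, output = 0, []
    while True:
        top, a = stack[-1], tokens[pos]
        if top == a == "$":             # case 1: accept
            return output
        if top not in NONTERMINALS:
            if top != a:
                raise SyntaxError(f"expected {top!r}, found {a!r}")
            stack.pop()                 # case 2: match a terminal
            pos += 1
        else:                           # case 3: expand a nonterminal
            rhs = TABLE.get((top, a))
            if rhs is None:
                raise SyntaxError(f"no entry M[{top}, {a}]")
            stack.pop()
            stack.extend(reversed(rhs)) # leftmost symbol ends up on top
            output.append(f"{top} -> {' '.join(rhs) or 'ε'}")

for production in predictive_parse(["id", "+", "id", "*", "id"]):
    print(production)
```

On the input id + id * id the sketch prints eleven productions, beginning with E -> T E' and ending with E' -> ε, which is a leftmost derivation of the input.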
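Fig. 4.15. Parsing table M for grammar (4.11).

| NONTERMINAL | id | + | * | ( | ) | $ |
|---|---|---|---|---|---|---|
| E | E → TE' | | | E → TE' | | |
| E' | | E' → +TE' | | | E' → ε | E' → ε |
| T | T → FT' | | | T → FT' | | |
| T' | | T' → ε | T' → *FT' | | T' → ε | T' → ε |
| F | F → id | | | F → (E) | | |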
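Fig. 4.16. Moves made by predictive parser on input id + id * id.

| STACK | INPUT | OUTPUT |
|---|---|---|
| $E | id + id * id$ | |
| $E' T | id + id * id$ | E → T E' |
| $E' T' F | id + id * id$ | T → F T' |
| $E' T' id | id + id * id$ | F → id |
| $E' T' | + id * id$ | |
| $E' | + id * id$ | T' → ε |
| $E' T + | + id * id$ | E' → + T E' |
| $E' T | id * id$ | |
| $E' T' F | id * id$ | T → F T' |
| $E' T' id | id * id$ | F → id |
| $E' T' | * id$ | |
| $E' T' F * | * id$ | T' → * F T' |
| $E' T' F | id$ | |
| $E' T' id | id$ | F → id |
| $E' T' | $ | |
| $E' | $ | T' → ε |
| $ | $ | E' → ε |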
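Fig. 4.17. Parsing table M for grammar (4.13).

| NONTERMINAL | a | b | e | i | t | $ |
|---|---|---|---|---|---|---|
| S | S → a | | | S → iEtSS' | | |
| S' | | | S' → ε, S' → eS | | | S' → ε |
| E | | E → b | | | | |

(Grammar 4.13)
S → iEtS | iEtSeS | a
E → b
The table is built from the left-factored form S → iEtSS' | a, S' → eS | ε, E → b. The entry M[S', e] contains two productions, so the table is multiply defined there.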
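Fig. 4.18. Synchronizing tokens added to parsing table of Fig. 4.15.

| NONTERMINAL | id | + | * | ( | ) | $ |
|---|---|---|---|---|---|---|
| E | E → TE' | | | E → TE' | synch | synch |
| E' | | E' → +TE' | | | E' → ε | E' → ε |
| T | T → FT' | synch | | T → FT' | synch | synch |
| T' | | T' → ε | T' → *FT' | | T' → ε | T' → ε |
| F | F → id | synch | synch | F → (E) | synch | synch |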
Fig. 4.19. Parsing and error recovery moves made by predictive parser.

| STACK | INPUT | OUTPUT |
|---|---|---|
| $E | ) id * + id$ | error, skip ) |
| $E | id * + id$ | id is in FIRST(E) |
| $E' T | id * + id$ | |
| $E' T' F | id * + id$ | |
| $E' T' id | id * + id$ | |
| $E' T' | * + id$ | |
| $E' T' F * | * + id$ | |
| $E' T' F | + id$ | error, M[F, +] = synch |
| $E' T' | + id$ | F has been popped |
| $E' | + id$ | |
| $E' T + | + id$ | |
| $E' T | id$ | |
| $E' T' F | id$ | |
| $E' T' id | id$ | |
| $E' T' | $ | |
| $E' | $ | |
| $ | $ | |
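The recovery moves of Fig. 4.19 can be reproduced with a small extension of a table-driven parser. The sketch below is one possible reading, not the slides' algorithm: a blank entry skips the input symbol, a synch entry pops the nonterminal, and an unmatched terminal on the stack is popped. One rule is an assumption added to match the first row of the figure, where ) is skipped although M[E, )] is synch: a nonterminal that is the only symbol above $ is not popped unless the input is exhausted. The names `TABLE`, `SYNCH` and `parse_with_recovery` are hypothetical.

```python
TABLE = {
    ("E", "id"): ["T", "E'"],       ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],  ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],       ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"],            ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}
# Synchronizing entries of Fig. 4.18.
SYNCH = {("E", ")"), ("E", "$"), ("T", "+"), ("T", ")"), ("T", "$"),
         ("F", "+"), ("F", "*"), ("F", ")"), ("F", "$")}

def parse_with_recovery(tokens, start="E"):
    """Parse tokens and return the list of error remarks (empty if none)."""
    tokens = tokens + ["$"]
    stack, pos, remarks = ["$", start], 0, []
    while True:
        top, a = stack[-1], tokens[pos]
        if top == a == "$":
            return remarks
        if top in NONTERMINALS:
            if (top, a) in TABLE:
                stack.pop()
                stack.extend(reversed(TABLE[(top, a)]))
            elif (top, a) in SYNCH and (len(stack) > 2 or a == "$"):
                remarks.append(f"error, M[{top}, {a}] = synch")
                stack.pop()                 # pop the nonterminal
            else:                           # blank entry (or assumed rule): skip input
                remarks.append(f"error, skip {a}")
                pos += 1
        elif top == a:                      # matching terminal
            stack.pop()
            pos += 1
        elif top == "$":                    # stack exhausted but input remains
            remarks.append(f"error, skip {a}")
            pos += 1
        else:                               # unmatched terminal on the stack
            remarks.append(f"error, pop unmatched {top}")
            stack.pop()

print(parse_with_recovery([")", "id", "*", "+", "id"]))
# ['error, skip )', 'error, M[F, +] = synch']
```

Every iteration either advances the input, pops the stack, or expands a nonterminal of a non-left-recursive grammar, so the loop terminates on any finite input.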