introduction to programmingsharif.edu/~sani/courses/compiler/05-introduction-to...alexaiken...
TRANSCRIPT
Alex Aiken
Intro to Parsing
• Regular languages
– The weakest formal languages widely used
– Many applications
Alex Aiken
Intro to Parsing
What can regular languages express?
• Languages requiring counting modulo a fixed
integer
• Intuition: A finite automaton that runs long enough
must repeat states
• Finite automaton can’t remember # of times it has
visited a particular state
Alex Aiken
Intro to Parsing
• Input: sequence of tokens from lexer
• Output: parse tree of the program
(But some parsers never produce a parse tree …)
Intro to Parsing
• Cool
if x = y then 1 else 2fi
• Parser input
IF ID = ID THEN INT ELSE INT FI
• Parser output
=
ID ID
INT
IF-THEN-ELSE
INT
Alex Aiken
Alex Aiken
Intro to Parsing
Phase Input Output
Lexer String of
characters
String of tokens
Parser String of tokens Parse tree(may be implicit)
• Some compilers combine Lexer and Parser
Alex Aiken
CFGs
• Not all strings of tokens are programs . ..
• . . . parser must distinguish between valid andinvalid strings of tokens
• We need
– Alanguage for describing valid strings of tokens
– A method for distinguishing valid frominvalid strings of tokens
Alex Aiken
CFGs
• Programming language constructs have recursive
structure
• An EXPRis
if EXPR then EXPR else EXPRfi
while EXPR loop EXPR pool
…
• Context-free grammars are a natural notation for this recursive structure
Alex Aiken
CFGs
• A CFG consists of
– A set of terminals T
– A set of non-terminals N
– A start symbol S (a non-terminal)
– A set of productions
Alex Aiken
• In these lecture notes
– Non-terminals are written upper-case
– Terminals are written lower-case
– The start symbol is the left-hand side of the first
production
CFGs
Alex Aiken
1. Begin with a string with only the start symbol S
2. Replace any non-terminal X in the string by the
right-hand side of some production
X Y1…Yn
3. Repeat (2) until there are nonon-terminals in the
string
CFGs
Alex Aiken
Let G be a context-free grammar with start symbolS.
Then the language of G is:
CFGs
L(G) =
Let G be a context-free grammar with start symbolS.
Then Sentential Forms of Gare:
CFGs
SF(G) = S and (T N)*
Therefore:
L(G) SF(G)
Alex Aiken
• Terminals are so-called because there are no rules
for replacing them
• Once generated, terminals are permanent
• Terminals ought to be tokens of thelanguage
CFGs
Alex Aiken
CFGs
Some elements of the language:
id
if id then id else id fi
while id loop idpool
if while id loop id pool then id else id
if if id then id else id fi then id else id fi
CFGs
S aXa
X
| bY
Y
| cXc
abcba
acca
aba
abcbcba
Which of the strings are in
the language of the given
CFG?
CFGs
S aXa
X
| bY
Y
| cXc
abcba
acca
aba
abcbcba
Which of the strings are in
the language of the given
CFG?
Alex Aiken
CFGs
The idea of a CFG is a big step. But:
• Membership in a language is “yes” or “no”; - We also need a parse tree of the input
• Must handle errors gracefully
• Need an implementation of CFG’s (e.g.,Bison)
Alex Aiken
CFGs
• Form of the grammar isimportant
– Many grammars generate the same language
– Tools are sensitive to the grammar
AlexAiken
Derivations
A derivation is a sequence ofproductions
S … … … … …
A derivation can be drawn as a tree
– Start symbol is the tree’sroot
– For a production X Y1…Yn add childrenY1…Yn
to node X
AlexAiken
Derivations
• A parse treehas
– Terminals at the leaves
– Non-terminals at the interiornodes
• An in-order traversal of the leaves is the original input
• The parse tree shows the association of operations, the
input string doesnot
AlexAiken
• The example is a left-most
derivation
– At each step, replace the left-
most non-terminal
• There is an equivalent
notion of a right-most
derivation
Derivations
AlexAiken
Derivations
• Note that right-most and left-most derivations
havethe same parse tree
• The difference is the order in which branches
are added
Derivations
aXa
abYa
acXca
acca
Which of the following is a valid
derivation of the givengrammar?
S
S aXa
X | bY
Y | cXc |d
S
aXa
abYa
abcXca
abcbYca
abcbdca
S
aXa
abYa
abcXcda
abccda
S
aa
Derivations
aXa
abYa
acXca
acca
Which of the following is a valid
derivation of the givengrammar?
S
S aXa
X | bY
Y | cXc |d
S
aXa
abYa
abcXca
abcbYca
abcbdca
S
aXa
abYa
abcXcda
abccda
S
aa
Derivations
a
b
a
S
X
Y
c X c
Which of the following is a valid
parse tree for the givengrammar?
S
a X a
b Y
c X c d
a
Y
a
S
X
b
c X c d
a
b Y
a
S
X
S aXa
X | bY
Y | cXc |d
Derivations
a
b
a
S
X
Y
c X c
Which of the following is a valid
parse tree for the givengrammar?
S
a X a
b Y
c X c d
a
Y
a
S
X
b
c X c d
a
b Y
a
S
X
S aXa
X | bY
Y | cXc |d
AlexAiken
Derivations
• We are not just interested in whether s L(G)
– We need a parse tree for s
• A derivation defines a parsetree
– But one parse tree may have many derivations
• Left-most and right-most derivations areimportant in parser implementation
Alex Aiken
Ambiguity
• A grammar is ambiguous if it has more than
one parse tree for somestring
– Equivalently, there is more than one right-most
or left-most derivation for somestring
• Ambiguity is BAD
– Leaves meaning of some programs ill-defined
Ambiguity
Which of the following grammars areambiguous?
S SS| a | b
E E+ E| id
S Sa| Sb
E E’ | E’+E
E’ -E’ | id | (E)
Ambiguity
Which of the following grammars areambiguous?
S SS| a | b
E E+ E| id
S Sa| Sb
E E’ | E’+E
E’ -E’ | id | (E)
Alex Aiken
Ambiguity
• There are several ways to handleambiguity
• Most direct method is to rewrite grammar unambiguously
E E’ + E | E’
E’ id E’ | id | (E) E’| (E)
• Enforces precedence of * over+
Ambiguity
• The expression
if E1 then if E2 then E3 else E4
has two parse trees
Alex Aiken
if
E1 if
E2 E3 E4
if
E2 E3
E1 E4
if
Ambiguity
• The expression if E1 then if E2 then E3 elseE4
if
E1 if
E2 E3 E4
if
E1 if
E2 E3
E4
Alex Aiken
Ambiguity
• The expression if E1 then if E2 then E3 elseE4
if
E1 if
E2 E3 E4
if
E1 if
E2 E3
E4
Alex Aiken
Ambiguity
S Sa | Sb | S SS’
S’a | b
S Sa | SbS S | S’
S’a | b
Choose the unambiguous version
of the givenambiguous grammar: S SS| a | b |
Ambiguity
S Sa | Sb | S SS’
S’a | b
S Sa | SbS S | S’
S’a | b
Choose the unambiguous version
of the givenambiguous grammar: S SS| a | b |
Alex Aiken
Ambiguity
• Impossible to convert automatically anambiguous
grammar to an unambiguous one
• Used with care, ambiguity can simplify the grammar
– Sometimes allows more natural definitions
– We need disambiguation mechanisms
Alex Aiken
Ambiguity
• Instead of rewriting thegrammar
– Use the more natural (ambiguous) grammar
– Along with disambiguating declarations
• Most tools allow precedence andassociativity
declarations to disambiguate grammars
Ambiguity
E
+ E
intint
• Consider the grammar E E + E | int
• Ambiguous: two parse trees of int + int+ int
E E
E
E + E E
+
+
E int int E
intint
• Left associativity declaration:%left +
Alex Aiken
Ambiguity
E
+ E
intint
• Consider the grammar E E + E| int
• Ambiguous: two parse trees of int + int+ int
E E
E
E + E E
+
+
E int int E
intint
• Left associativity declaration:%left +
Alex Aiken
Ambiguity
E
* E
intint
• Consider the grammar E E + E | E* E | int
– And the string int + int * intE E
E
E * E E +
E int int E+
int int
• Precedence declarations: %left +
%left *
Alex Aiken