![Page 1: EDAN65: Compilers, Lecture03 Context-freegrammars ...fileadmin.cs.lth.se/cs/Education/EDAN65/2018/lectures/L03.pdfCourse overview Semanticanalyzer Intermediate codegenerator Optimizer](https://reader030.vdocuments.mx/reader030/viewer/2022040421/5e0eacddf37b4b6bae44f4e4/html5/thumbnails/1.jpg)
EDAN65: Compilers, Lecture 03
Context-free grammars, introduction to parsing
Görel Hedin
Revised: 2018-09-10

Course overview
[Figure: the compiler phases and their specifications. Source code (text) is read by the lexical analyzer (scanner, specified by regular expressions), which produces tokens. The syntactic analyzer (parser, specified by a context-free grammar) builds the AST (abstract syntax tree). The semantic analyzer (specified by an attribute grammar) produces the attributed AST. The intermediate code generator, optimizer, and target code generator then produce intermediate code and target code. Also shown: interpreter, virtual machine, machine, and the runtime system (stack with activation records, heap with objects, garbage collection, code and data). The label "This lecture" marks the part of the pipeline covered here.]

Analyzing program text
program text: sum = sum + k ;
tokens: ID EQ ID PLUS ID SEMI
parse tree: [Figure: a tree with the nodes AssignStmt, Exp, Add, and two further Exp nodes, built on top of the tokens.]
Non-tokens (like whitespace) are discarded: in the text sum = sum + k ; \n the blanks and the newline produce no tokens.

Recall: Generating the compiler
[Figure: each phase is generated from a specification. Regular expressions are input to the scanner generator JFlex, which produces the lexical analyzer (scanner). A context-free grammar is input to the parser generator Beaver, which produces the syntactic analyzer (parser). An attribute grammar is input to an attribute evaluator generator, which produces the semantic analyzer. Data passed along: text, tokens, tree.]
We will use a parser generator called Beaver.

Context-Free Grammars

Regular Expressions vs Context-Free Grammars
An RE can have iteration.
A CFG can also have recursion (it is possible to derive a symbol, e.g., Stmt, from itself).
Example REs:
WHILE = "while"
ID = [a-z][a-z0-9]*
LPAR = "("
RPAR = ")"
PLUS = "+"
...
Example CFG:
Stmt –> WhileStmt
Stmt –> AssignStmt
WhileStmt –> WHILE LPAR Exp RPAR Stmt
Exp –> ID
Exp –> Exp PLUS Exp
...

Elements of a Context-Free Grammar
Production rules: X –> s1 s2 … sn
where sk is a symbol (terminal or nonterminal), n >= 0
Nonterminal symbols
Terminal symbols (tokens)
Start symbol (one of the nonterminals, usually the left-hand side of the first production)
Example CFG:
Stmt –> WhileStmt
Stmt –> AssignStmt
WhileStmt –> WHILE LPAR Exp RPAR Stmt
AssignStmt –> ID EQ Exp SEMIC
…

Shorthand for alternatives
Stmt –> WhileStmt
Stmt –> AssignStmt
is equivalent to
Stmt –> WhileStmt | AssignStmt

Shorthand for repetition
Stmt*
is equivalent to
StmtList
where
StmtList –> ε | Stmt StmtList

Exercise
Construct a grammar covering this program and similar ones:
Example program:
while (k <= n) {sum = sum + k; k = k+1;}
CFG:
Stmt –> WhileStmt | AssignStmt | CompoundStmt
WhileStmt –> "while" "(" Exp ")" Stmt
AssignStmt –> ID "=" Exp ";"
CompoundStmt –> ...
Exp –> ...
LessEq –> ...
Add –> ...
(Often, simple tokens are written directly as text strings.)

Solution
Construct a grammar covering this program and similar ones:
Example program:
while (k <= n) {sum = sum + k; k = k+1;}
CFG:
Stmt –> WhileStmt | AssignStmt | CompoundStmt
WhileStmt –> "while" "(" Exp ")" Stmt
AssignStmt –> ID "=" Exp ";"
CompoundStmt –> "{" Stmt* "}"
Exp –> LessEq | Add | ID | INT
LessEq –> Exp "<=" Exp
Add –> Exp "+" Exp

Parsing
Use the grammar to derive a tree for a program:
Example program: sum = sum + k;
Start symbol: Stmt

Parse tree
Use the grammar to derive a parse tree for a program:
Example program: sum = sum + k;
[Figure: parse tree with the start symbol Stmt at the root, then AssignStmt, Exp, Add, and two Exp nodes, with the tokens sum = sum + k ; as leaves.]
Nonterminals are inner nodes.
Terminals are leaves.
A parse tree includes all the input tokens as leaves.

Corresponding abstract syntax tree (will be discussed in a later lecture)
Example program: sum = sum + k;
[Figure: abstract syntax tree with the nodes AssignStmt, Add, and three IdExp nodes.]
An abstract syntax tree is similar to a parse tree, but simpler.
It includes only some of the tokens.
(Tokens that could be generated from the tree are excluded.)

EBNF vs Canonical Form
EBNF:
Stmt –> AssignStmt | CompoundStmt
AssignStmt –> ID "=" Exp ";"
CompoundStmt –> "{" Stmt* "}"
Exp –> Add | ID
Add –> Exp "+" Exp
Canonical form:
Stmt –> ID "=" Exp ";"
Stmt –> "{" Stmts "}"
Stmts –> ε
Stmts –> Stmt Stmts
Exp –> Exp "+" Exp
Exp –> ID
(Extended) Backus-Naur Form:
• Compact, easy to read and write
• EBNF has alternatives, repetition, optionals, parentheses (like REs)
• Common notation for practical use
Canonical form:
• Core formalism for CFGs
• Useful for proving properties and explaining algorithms

Real world example: The Java Language Specification
See http://docs.oracle.com/javase/specs/jls/se8/html/index.html
• See Chapter 2 about the Java grammar notation.
• Look at some other chapters to see other syntax examples.
CompilationUnit:
  [PackageDeclaration] {ImportDeclaration} {TypeDeclaration}
PackageDeclaration:
  {PackageModifier} package Identifier {. Identifier} ;
PackageModifier:
  Annotation
…

Formal definition of CFGs

Formal definition of CFGs (canonical form)
A context-free grammar G = (N, T, P, S), where
N – the set of nonterminal symbols
T – the set of terminal symbols
P – the set of production rules, each with the form
X –> Y1 Y2 … Yn where X ∈ N, n ≥ 0, and Yk ∈ N ∪ T
S – the start symbol (one of the nonterminals). I.e., S ∈ N
So, the left-hand side X of a rule is a nonterminal.
And the right-hand side Y1 Y2 … Yn is a sequence of nonterminals and terminals.
If the rhs for a production is empty, i.e., n = 0, we write X –> ε

A grammar G defines a language L(G)
Let G = (N, T, P, S) be a context-free grammar, with N, T, P, and S as in the formal definition.
G defines a language L(G) over the alphabet T.
T* is the set of all possible sequences of T symbols.
L(G) is the subset of T* that can be derived from the start symbol S, by following the production rules P.

Exercise
G = (N, T, P, S)
P = {
Stmt –> ID "=" Exp ";",
Stmt –> "{" Stmts "}",
Stmts –> ε,
Stmts –> Stmt Stmts,
Exp –> Exp "+" Exp,
Exp –> ID
}
N = { }
T = { }
S =
L(G) = { }

Solution
G = (N, T, P, S)
P = {
Stmt –> ID "=" Exp ";",
Stmt –> "{" Stmts "}",
Stmts –> ε,
Stmts –> Stmt Stmts,
Exp –> Exp "+" Exp,
Exp –> ID
}
N = {Stmt, Exp, Stmts}
T = {ID, "=", "{", "}", ";", "+"}
S = Stmt
L(G) = {
"{" "}",
"{" "{" "}" "}",
ID "=" ID ";",
"{" ID "=" ID ";" "}",
ID "=" ID "+" ID ";",
"{" "{" "}" "{" "}" "}",
"{" "{" "{" "}" "}" "}",
"{" ID "=" ID "+" ID ";" "}",
ID "=" ID "+" ID "+" ID ";",
...
}
The sequences in L(G) are usually called sentences or strings.
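
The sets N and T in this solution can be read off mechanically from P: every left-hand side is a nonterminal, and every other symbol that occurs on a right-hand side is a terminal. The Java sketch below illustrates that reading. The class GrammarSets, the helper class Production, and the choice to write terminals as plain strings are assumptions made for this example, not part of the lecture material. The start symbol is taken to be the left-hand side of the first production, following the usual convention mentioned earlier.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class GrammarSets {
    /** One production X -> Y1 ... Yn; an empty rhs encodes X -> ε. (Hypothetical helper.) */
    static final class Production {
        final String lhs;
        final List<String> rhs;

        Production(String lhs, String... rhs) {
            this.lhs = lhs;
            this.rhs = List.of(rhs);
        }
    }

    // P from the exercise, with terminals written as plain strings.
    static final List<Production> P = List.of(
        new Production("Stmt", "ID", "=", "Exp", ";"),
        new Production("Stmt", "{", "Stmts", "}"),
        new Production("Stmts"),
        new Production("Stmts", "Stmt", "Stmts"),
        new Production("Exp", "Exp", "+", "Exp"),
        new Production("Exp", "ID"));

    /** N: every symbol that occurs as a left-hand side. */
    static Set<String> nonterminals(List<Production> p) {
        Set<String> n = new LinkedHashSet<>();
        for (Production prod : p) {
            n.add(prod.lhs);
        }
        return n;
    }

    /** T: every right-hand-side symbol that is not a nonterminal. */
    static Set<String> terminals(List<Production> p) {
        Set<String> n = nonterminals(p);
        Set<String> t = new LinkedHashSet<>();
        for (Production prod : p) {
            for (String s : prod.rhs) {
                if (!n.contains(s)) {
                    t.add(s);
                }
            }
        }
        return t;
    }

    public static void main(String[] args) {
        System.out.println("N = " + nonterminals(P));
        System.out.println("T = " + terminals(P));
        System.out.println("S = " + P.get(0).lhs);  // convention: lhs of the first production
    }
}
```

This only works when every nonterminal has at least one production; a symbol that is used but never defined would be classified as a terminal.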

Derivations

Derivation step
If we have a sequence of terminals and nonterminals, e.g.,
X a Y Y b
we can replace one of the nonterminals, applying a production rule. This is called a derivation step. (Swedish: Härledningssteg)
Suppose there is a production
Y –> X a
and we apply it for the first Y in the sequence. We write the derivation step as follows:
X a Y Y b => X a X a Y b

Derivation
A derivation is simply a sequence of derivation steps, e.g.:
γ0 => γ1 => … => γn (n ≥ 0)
where each γi is a sequence of terminals and nonterminals.
If there is a derivation from γ0 to γn, we can write this as
γ0 =>* γn
So this means it is possible to get from the sequence γ0 to the sequence γn by following the production rules.
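
A single derivation step is a plain list operation: one occurrence of a nonterminal is replaced by the right-hand side of one of its productions. The Java sketch below shows this for the step X a Y Y b => X a X a Y b from the previous slide. The class name DerivationStep and the method step are assumptions made for illustration, and symbols are represented as strings.

```java
import java.util.ArrayList;
import java.util.List;

final class DerivationStep {
    /**
     * Performs one derivation step: replaces the symbol at index pos,
     * which must be the nonterminal lhs, with the symbols in rhs.
     */
    static List<String> step(List<String> form, int pos, String lhs, List<String> rhs) {
        if (!form.get(pos).equals(lhs)) {
            throw new IllegalArgumentException("Symbol at index " + pos + " is not " + lhs);
        }
        List<String> next = new ArrayList<>(form.subList(0, pos));
        next.addAll(rhs);
        next.addAll(form.subList(pos + 1, form.size()));
        return next;
    }

    public static void main(String[] args) {
        List<String> before = List.of("X", "a", "Y", "Y", "b");
        // Apply Y -> X a to the first Y, which is at index 2.
        List<String> after = step(before, 2, "Y", List.of("X", "a"));
        System.out.println(String.join(" ", before) + " => " + String.join(" ", after));
    }
}
```

A derivation is then just repeated calls to step, each starting from the result of the previous call.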

Definition of the language L(G)
Recall that:
G = (N, T, P, S)
T* is the set of all possible sequences of T symbols.
L(G) is the subset of T* that can be derived from the start symbol S, by following the production rules P.
Using the concept of derivations, we can formally define L(G) as follows:
L(G) = { w ∈ T* | S =>* w }
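
The definition can be turned into a small search: start from S, apply productions to the leftmost nonterminal, and collect every sequence that consists of terminals only. The Java sketch below does this for the statement grammar of the earlier exercise, limited to sentences of at most maxLen tokens so that the search terminates. The class name LanguageEnumerator, the string encoding of symbols, and the hand-computed MIN_LEN table used for pruning are all assumptions made for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class LanguageEnumerator {
    // Productions of the exercise grammar; an empty right-hand side encodes ε.
    static final Map<String, List<List<String>>> P = Map.of(
        "Stmt", List.of(List.of("ID", "=", "Exp", ";"), List.of("{", "Stmts", "}")),
        "Stmts", List.of(List.<String>of(), List.of("Stmt", "Stmts")),
        "Exp", List.of(List.of("Exp", "+", "Exp"), List.of("ID")));

    // Fewest tokens each nonterminal can derive (worked out by hand for this grammar).
    static final Map<String, Integer> MIN_LEN = Map.of("Stmt", 2, "Stmts", 0, "Exp", 1);

    /** Lower bound on the length of any sentence derivable from form. */
    static int minLength(List<String> form) {
        int n = 0;
        for (String s : form) {
            n += MIN_LEN.getOrDefault(s, 1);  // a terminal counts as one token
        }
        return n;
    }

    /** Returns every w in L(G) with at most maxLen tokens, where S = Stmt. */
    static Set<String> sentences(int maxLen) {
        Set<String> result = new TreeSet<>();
        Set<List<String>> seen = new HashSet<>();
        Deque<List<String>> work = new ArrayDeque<>();
        work.add(List.of("Stmt"));
        while (!work.isEmpty()) {
            List<String> form = work.remove();
            int pos = 0;
            while (pos < form.size() && !P.containsKey(form.get(pos))) {
                pos++;  // skip terminals to find the leftmost nonterminal
            }
            if (pos == form.size()) {
                result.add(String.join(" ", form));  // only terminals left: a sentence
                continue;
            }
            for (List<String> rhs : P.get(form.get(pos))) {
                List<String> next = new ArrayList<>(form.subList(0, pos));
                next.addAll(rhs);
                next.addAll(form.subList(pos + 1, form.size()));
                if (minLength(next) <= maxLen && seen.add(next)) {
                    work.add(next);  // keep only forms that can still fit the limit
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String w : sentences(6)) {
            System.out.println(w);
        }
    }
}
```

Restricting the search to leftmost steps loses no sentences, since every sentence that can be derived at all also has a leftmost derivation.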

Exercise: Prove that a sentence belongs to a language
Prove that
INT + INT * INT
belongs to the language of the following grammar:
p1: Exp –> Exp "+" Exp
p2: Exp –> Exp "*" Exp
p3: Exp –> INT
Proof (by showing all the derivation steps from the start symbol Exp):
Exp =>

Solution: Prove that a sentence belongs to a language
Prove that
INT + INT * INT
belongs to the language of the following grammar:
p1: Exp –> Exp "+" Exp
p2: Exp –> Exp "*" Exp
p3: Exp –> INT
Proof (by showing all the derivation steps from the start symbol Exp; the production used is given in parentheses):
Exp
=> (p1) Exp "+" Exp
=> (p3) INT "+" Exp
=> (p2) INT "+" Exp "*" Exp
=> (p3) INT "+" INT "*" Exp
=> (p3) INT "+" INT "*" INT

Leftmost and rightmost derivations
In a leftmost derivation, the leftmost nonterminal is replaced in each derivation step, e.g.:
Exp
=> Exp "+" Exp
=> INT "+" Exp
=> INT "+" Exp "*" Exp
=> INT "+" INT "*" Exp
=> INT "+" INT "*" INT
In a rightmost derivation, the rightmost nonterminal is replaced in each derivation step, e.g.:
Exp
=> Exp "+" Exp
=> Exp "+" Exp "*" Exp
=> Exp "+" Exp "*" INT
=> Exp "+" INT "*" INT
=> INT "+" INT "*" INT
LL parsing algorithms use leftmost derivation. LR parsing algorithms use rightmost derivation. This will be discussed in later lectures.
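
The two derivations differ only in which occurrence of Exp is rewritten at each step. The Java sketch below replays both for the grammar with productions p1 to p3 (Exp –> Exp "+" Exp, Exp –> Exp "*" Exp, Exp –> INT). The class name Derivations, the method derive, and the numbering of productions by the integers 1 to 3 are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

final class Derivations {
    // Right-hand sides of p1, p2, p3; the only nonterminal is Exp.
    static final List<List<String>> RHS = List.of(
        List.of("Exp", "+", "Exp"),  // p1
        List.of("Exp", "*", "Exp"),  // p2
        List.of("INT"));             // p3

    /**
     * Applies the given productions (numbered 1..3) in order, each time
     * rewriting the leftmost or the rightmost Exp. Returns all sentential
     * forms of the derivation, starting with the start symbol.
     */
    static List<String> derive(boolean leftmost, int... productions) {
        List<String> form = new ArrayList<>(List.of("Exp"));
        List<String> steps = new ArrayList<>();
        steps.add(String.join(" ", form));
        for (int p : productions) {
            int pos = leftmost ? form.indexOf("Exp") : form.lastIndexOf("Exp");
            if (pos < 0) {
                throw new IllegalStateException("No nonterminal left to rewrite");
            }
            form.remove(pos);                  // remove the chosen Exp ...
            form.addAll(pos, RHS.get(p - 1));  // ... and insert the right-hand side
            steps.add(String.join(" ", form));
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" => ", derive(true, 1, 3, 2, 3, 3)));
        System.out.println(String.join(" => ", derive(false, 1, 2, 3, 3, 3)));
    }
}
```

Both calls end in the same sentence, INT + INT * INT, and both start with p1, so they describe the same parse tree in a different step order.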

A derivation corresponds to building a parse tree
Grammar:
Exp –> Exp "+" Exp
Exp –> Exp "*" Exp
Exp –> INT
Example derivation:
Exp
=> Exp "+" Exp
=> INT "+" Exp
=> INT "+" Exp "*" Exp
=> INT "+" INT "*" Exp
=> INT "+" INT "*" INT
Exercise: build the parse tree (also called derivation tree).

A derivation corresponds to building a parse tree
Grammar:
Exp –> Exp "+" Exp
Exp –> Exp "*" Exp
Exp –> INT
Example derivation:
Exp
=> Exp "+" Exp
=> INT "+" Exp
=> INT "+" Exp "*" Exp
=> INT "+" INT "*" Exp
=> INT "+" INT "*" INT
Parse tree (derivation tree), with children indented under their parent:
Exp
  Exp
    INT
  "+"
  Exp
    Exp
      INT
    "*"
    Exp
      INT

Ambiguities

Exercise: Can we do another derivation of the same sentence, that gives a different parse tree?
Grammar:
Exp –> Exp "+" Exp
Exp –> Exp "*" Exp
Exp –> INT
Another derivation:
Exp =>
Parse tree:

Solution: Can we do another derivation of the same sentence, that gives a different parse tree?
Grammar:
Exp –> Exp "+" Exp
Exp –> Exp "*" Exp
Exp –> INT
Another derivation:
Exp
=> Exp "*" Exp
=> Exp "+" Exp "*" Exp
=> INT "+" Exp "*" Exp
=> INT "+" INT "*" Exp
=> INT "+" INT "*" INT
Parse tree, with children indented under their parent:
Exp
  Exp
    Exp
      INT
    "+"
    Exp
      INT
  "*"
  Exp
    INT
Which parse tree would we prefer?

Ambiguous context-free grammars
A CFG is ambiguous if a sentence in the language can be derived by two (or more) different parse trees.
A CFG is unambiguous if each sentence in the language can be derived by only one parse tree.
(Swedish: tvetydig, otvetydig)
Note! There can be many different derivations that give the same parse tree.
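
For one fixed grammar and one fixed sentence, the number of parse trees can simply be counted. The Java sketch below does this for the expression grammar Exp –> Exp "+" Exp | Exp "*" Exp | INT by trying every operator as the root of the tree. The class name AmbiguityCheck and the method countTrees are assumptions made for this example, and the counting scheme is specific to this grammar rather than a general parsing algorithm.

```java
import java.util.List;

final class AmbiguityCheck {
    /**
     * Counts the parse trees that derive tokens[from, to) from Exp in the grammar
     * Exp -> Exp "+" Exp | Exp "*" Exp | INT.
     */
    static long countTrees(List<String> tokens, int from, int to) {
        if (to - from == 1) {
            return tokens.get(from).equals("INT") ? 1 : 0;  // Exp -> INT
        }
        long count = 0;
        // Try each operator as the root; the two sides are counted independently.
        for (int k = from + 1; k < to - 1; k++) {
            String op = tokens.get(k);
            if (op.equals("+") || op.equals("*")) {
                count += countTrees(tokens, from, k) * countTrees(tokens, k + 1, to);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> sentence = List.of("INT", "+", "INT", "*", "INT");
        System.out.println(countTrees(sentence, 0, sentence.size()));  // prints 2
    }
}
```

A count above 1 for any sentence shows that the grammar is ambiguous. A count of 1 for some sentences proves nothing about the grammar as a whole.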

How can we know if a CFG is ambiguous?
If we find an example of an ambiguity, we know the grammar is ambiguous.
There are algorithms for deciding if a CFG belongs to certain subsets of CFGs, e.g. LL, LR, etc. (See later lectures.) These grammars are unambiguous.
But in the general case, the problem is undecidable: it is not possible to construct a general algorithm that decides ambiguity for an arbitrary CFG.
Strategies for eliminating ambiguities: next lecture.

Parsing

Different parsing algorithms
[Figure: set diagram of all context-free grammars, divided into ambiguous and unambiguous grammars, with the LR grammars and the LL grammars drawn as subsets.]
LL:
Left-to-right scan
Leftmost derivation
Builds tree top-down
Simple to understand
LR:
Left-to-right scan
Rightmost derivation
Builds tree bottom-up
More powerful

LL and LR parsers: main idea
Input in both cases: ... if ID then ID = ID ; ID ...
LL(1): decides to build Assign after seeing the first token of its subtree. The tree is built top down.
[Figure: partially built tree with the nodes IfStmt, Id, CompoundStmt, and Assign above the input.]
LR(1): decides to build Assign after seeing the first token following its subtree. The tree is built bottom up.
[Figure: partially built tree with the nodes Id, Assign, Id, Id above the input.]
The token is called lookahead. LL(k) and LR(k) use k lookahead tokens.
In practice, k = 1 is usually used.

Recursive-descent parsing
A way of programming an LL(1) parser by recursive method calls
Assume a BNF grammar with exactly one production rule for each nonterminal. (Can easily be generalized to EBNF.)
Each production rule RHS is either
1. a sequence of token/nonterminal symbols, or
2. a set of nonterminal symbol alternatives
For each nonterminal, a method is constructed. The method
1. matches tokens and calls nonterminal methods, or
2. calls one of the nonterminal methods; which one depends on the lookahead token.
If the lookahead token does not match, a parsing error is reported.
A –> B | C | D
B –> e C f D
C –> ...
D –> ...

Example Java implementation: overview
statement –> assignment | compoundStmt
assignment –> ID ASSIGN expr SEMICOLON
compoundStmt –> LBRACE statement* RBRACE
...
class Parser {
  private int token; // current lookahead token
  void accept(int t) {...} // accept t and read in next token
  void error(String str) {...} // generate error message
  void statement() {...}
  void assignment() {...}
  void compoundStmt() {...}
  ...
}

Example: recursive descent methods
statement –> assignment | compoundStmt
assignment –> ID ASSIGN expr SEMICOLON
compoundStmt –> LBRACE statement* RBRACE
class Parser {
  void statement() {
    switch (token) {
      case ID: assignment(); break;
      case LBRACE: compoundStmt(); break;
      default: error("Expecting statement, found: " + token);
    }
  }
  void assignment() {
    accept(ID); accept(ASSIGN); expr(); accept(SEMICOLON);
  }
  void compoundStmt() {
    accept(LBRACE);
    while (token != RBRACE) { statement(); }
    accept(RBRACE);
  }
  ...
}

Example: Parser skeleton details
statement –> assignment | compoundStmt
assignment –> ID ASSIGN expr SEMICOLON
compoundStmt –> LBRACE statement* RBRACE
expr –> ...
class Parser {
  final static int ID = 1, WHILE = 2, DO = 3, ASSIGN = 4, ...;
  private int token; // current lookahead token
  void accept(int t) { // accept t and read in next token
    if (token == t) {
      token = nextToken();
    } else {
      error("Expected " + t + ", but found " + token);
    }
  }
  void error(String str) {...} // generate error message
  private int nextToken() {...} // read next token from scanner
  void statement() ...
  ...
}
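
The slide fragments can be assembled into a complete program that runs without a scanner. The Java sketch below fills the elided parts with assumptions that are not from the lecture: the class is called MiniParser, tokens come from an int array instead of a scanner, EOF is an extra token kind, error throws an exception, and expr is reduced to a single ID because the slides leave expr unspecified. The token numbering also differs from the slide.

```java
final class MiniParser {
    // Token kinds; the numbering and the EOF token are assumptions of this sketch.
    static final int EOF = 0, ID = 1, ASSIGN = 2, SEMICOLON = 3, LBRACE = 4, RBRACE = 5;

    private final int[] input;  // stands in for the scanner
    private int pos = 0;
    private int token;          // current lookahead token

    MiniParser(int... input) {
        this.input = input;
        this.token = nextToken();
    }

    private int nextToken() {   // read next token, or EOF at the end of the input
        return pos < input.length ? input[pos++] : EOF;
    }

    void accept(int t) {        // accept t and read in next token
        if (token == t) {
            token = nextToken();
        } else {
            error("Expected " + t + ", but found " + token);
        }
    }

    void error(String str) {    // assumption: a parsing error aborts by throwing
        throw new IllegalStateException(str);
    }

    void statement() {
        switch (token) {
            case ID: assignment(); break;
            case LBRACE: compoundStmt(); break;
            default: error("Expecting statement, found: " + token);
        }
    }

    void assignment() {
        accept(ID); accept(ASSIGN); expr(); accept(SEMICOLON);
    }

    void compoundStmt() {
        accept(LBRACE);
        while (token != RBRACE) { statement(); }  // statement() throws on EOF
        accept(RBRACE);
    }

    void expr() {               // assumption: expr -> ID, since the slides elide expr
        accept(ID);
    }

    /** Returns true if the tokens form exactly one statement. */
    static boolean parses(int... tokens) {
        try {
            MiniParser p = new MiniParser(tokens);
            p.statement();
            p.accept(EOF);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parses(LBRACE, ID, ASSIGN, ID, SEMICOLON, RBRACE));
        System.out.println(parses(ID, ASSIGN, SEMICOLON));
    }
}
```

Throwing from error keeps the sketch short. A real parser would more likely record the error and try to recover, which is outside the scope of this example.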

Are these grammars LL(1)?
expr –> name params | name   (common prefix)
expr –> expr "+" term   (left recursion)
What would happen in a recursive-descent parser?
Could they be LL(2)? LL(k)?
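
One way to explore the question for the left-recursive rule is to transcribe it naively into a recursive-descent method and watch what it does. The Java sketch below is an illustration only: the class name LeftRecursionDemo, the depth guard LIMIT, and the consumed counter are hypothetical additions so that the program stops with a message instead of overflowing the call stack.

```java
final class LeftRecursionDemo {
    static final int LIMIT = 1000;  // arbitrary guard against a real stack overflow
    static int depth = 0;
    static int consumed = 0;        // tokens consumed so far (never changes below)

    // Naive transcription of: expr -> expr "+" term
    static void expr() {
        if (++depth > LIMIT) {
            throw new IllegalStateException(
                "expr() nested " + depth + " deep with " + consumed + " tokens consumed");
        }
        expr();  // the first RHS symbol is expr itself, so no token is read first
        // accept(PLUS); term();  would come here, but is never reached
    }

    /** Returns true if expr() hit the guard without consuming any input. */
    static boolean recursesWithoutProgress() {
        depth = 0;
        consumed = 0;
        try {
            expr();
            return false;
        } catch (IllegalStateException e) {
            return consumed == 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(recursesWithoutProgress());
    }
}
```

The method calls itself before looking at any token, so extra lookahead does not change its behavior.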

Dealing with common prefix of limited length: Local lookahead
LL(2) grammar:
statement –> assignment | compoundStmt | callStmt
assignment –> ID ASSIGN expr SEMICOLON
compoundStmt –> LBRACE statement* RBRACE
callStmt –> ID LPAR expr RPAR SEMICOLON
void statement() {
  switch (token) {
    case ID:
      if (lookahead(2) == ASSIGN) {
        assignment();
      } else {
        callStmt();
      }
      break;
    case LBRACE: compoundStmt(); break;
    default: error("Expecting statement, found: " + token);
  }
}
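
The slide uses lookahead(2) without showing how it is implemented. One possible implementation keeps a small buffer of tokens that have been read from the scanner but not yet consumed. The Java sketch below shows that idea. The class name TokenBuffer, the int array standing in for the scanner, the EOF token, and the convention that lookahead(1) is the current token are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

final class TokenBuffer {
    // Token kinds; the numbering and the EOF token are assumptions of this sketch.
    static final int EOF = 0, ID = 1, ASSIGN = 2, LPAR = 3;

    private final int[] input;  // stands in for the scanner
    private int next = 0;       // index of the next token to read from input
    private final List<Integer> buffer = new ArrayList<>();  // read but not yet consumed

    TokenBuffer(int... input) {
        this.input = input;
    }

    /** Returns the k-th upcoming token without consuming it; k = 1 is the current token. */
    int lookahead(int k) {
        while (buffer.size() < k) {
            buffer.add(next < input.length ? input[next++] : EOF);
        }
        return buffer.get(k - 1);
    }

    /** Consumes the current token. */
    void advance() {
        lookahead(1);      // make sure the current token is in the buffer
        buffer.remove(0);  // drop it, so the former second token becomes current
    }

    public static void main(String[] args) {
        TokenBuffer tokens = new TokenBuffer(ID, ASSIGN, ID);
        // The choice made in statement(): an ID followed by ASSIGN starts an assignment.
        boolean isAssignment = tokens.lookahead(1) == ID && tokens.lookahead(2) == ASSIGN;
        System.out.println(isAssignment);
    }
}
```

With such a buffer, the token field of the parser corresponds to lookahead(1), and accept would call advance instead of reading directly from the scanner.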

Generating the parser:
[Figure: a context-free grammar is input to a parser generator, which produces the syntactic analyzer (parser). The parser takes tokens as input and produces a tree.]

Beaver: an LR-based parser generator
[Figure: a context-free grammar, with semantic actions in Java, is input to Beaver, which produces a parser in Java. The parser takes tokens as input and produces a tree.]

Example Beaver specification
%class "LangParser";
%package "lang";
...
%terminals LET, IN, END, ASSIGN, MUL, ID, NUMERAL;
%goal program; // The start symbol
// Context-free grammar
program = exp;
exp = factor | exp MUL factor;
factor = let | numeral | id;
let = LET id ASSIGN exp IN exp END;
numeral = NUMERAL;
id = ID;
Later on, we will extend this specification with semantic actions to build the syntax tree.

Regular Expressions vs Context-Free Grammars

| | RE | CFG |
|---|---|---|
| Typical alphabet | characters | terminal symbols (tokens) |
| Language is a set of ... | strings (char sequences) | sentences (token sequences) |
| Used for ... | tokens | parse trees |
| Power | iteration | recursion |
| Recognizer | DFA | DFA with stack |

The Chomsky hierarchy of formal grammars

| Grammar | Rule patterns | Type |
|---|---|---|
| regular | X –> aY or X –> a or X –> ε | 3 |
| context free | X –> γ | 2 |
| context sensitive | α X β –> α γ β | 1 |
| arbitrary | γ –> δ | 0 |

a – terminal symbol
α, β, γ, δ – sequences of (terminal or nonterminal) symbols
Type(3) ⊂ Type(2) ⊂ Type(1) ⊂ Type(0)
Regular grammars have the same power as regular expressions (tail recursion = iteration).
Type 2 and 3 are of practical use in compiler construction. Type 0 and 1 are only of theoretical interest.

Course overview
[Figure: the front part of the compiler pipeline. Source code (text) is read by the lexical analyzer (scanner, specified by regular expressions), which produces tokens. The syntactic analyzer (parser, specified by a context-free grammar) produces the AST (abstract syntax tree), which is passed to the semantic analyzer (specified by an attribute grammar).]
What we have covered:
Context-free grammars, derivations, parse trees
Ambiguous grammars
Introduction to parsing, recursive-descent
You can now finish assignment 1.

Summary questions
• Construct a CFG for a simple part of a programming language.
• What is a nonterminal symbol? A terminal symbol? A production? A start symbol? A parse tree?
• What is a left-hand side of a production? A right-hand side?
• Given a grammar G, what is meant by the language L(G)?
• What is a derivation step? A derivation? A leftmost derivation? A rightmost derivation?
• How does a derivation correspond to a parse tree?
• What does it mean for a grammar to be ambiguous? Unambiguous?
• Give an example of an ambiguous CFG.
• What is the difference between an LL and an LR parser?
• What is the difference between LL(1) and LL(2)? Or between LR(1) and LR(2)?
• Construct a recursive descent parser for a simple language.
• Give typical examples of grammars that cannot be handled by a recursive-descent parser.
• Explain why context-free grammars are more powerful than regular expressions.
• In what sense are context-free grammars "context-free"?