03-60-214 lexical analysis s a, b b ajlu.myweb.cs.uwindsor.ca/214/214lexer2019.pdf · 7 regular...
TRANSCRIPT
![Page 1: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/1.jpg)
1
03-60-214Lexicalanalysis S
a, b
a B
![Page 2: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/2.jpg)
2
Lexicalanalysisinperspec5ve
• LEXICALANALYZER– ScanInput– RemoveWhiteSpace,NewLine,…– Iden5fyTokens– CreateSymbolTable– InsertTokensintoSymbolTable– GenerateErrors– SendTokenstoParser
lexical analyzer parser
symbol table
source program
token
get next token
• PARSER– PerformSyntaxAnalysis
– Ac5onsDictatedbyTokenOrder
– UpdateSymbolTableEntries
– CreateAbstractRepresenta5onofSource
– GenerateErrors
• LEXICALANALYZER:Transformscharacterstreamtotokenstream– Alsocalledscanner,lexer,linearanalyzer
![Page 3: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/3.jpg)
3
Whereweare
Total=price+tax;
Total = price + tax ;
Lexical analyzer
Parser
id+id
Expr
assignment
=id
Regular expression
![Page 4: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/4.jpg)
4
Basicterminologiesinlexicalanalysis
• Token– Aclassifica5onforacommonsetofstrings– Examples:if,<identifier>, <number> …
• Pa[ern– Theruleswhichcharacterizethesetofstringsforatoken– RecallfileandOSwildcards(*.java)
• Lexeme– Actualsequenceofcharactersthatmatchespa[ernandisclassifiedby
atoken– Iden5fiers:if, price, 10.00,etc…
Regular expression
if (price + gst – rebate <= 10.00) gift := false
![Page 5: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/5.jpg)
5
Examplesoftoken,lexemeandpa[ernif (price + gst – rebate <= 10.00) gift := false
Token lexeme Informaldescrip5onofpa[ern
if if If
Lparen ( (
Iden5fier price Stringconsistsofle[ersandnumbersandstartswithale[er
operator + +
iden5fier gst Stringconsistsofle[ersandnumbersandstartswithale[er
operator - -
iden5fier rebate Stringconsistsofle[ersandnumbersandstartswithale[er
operator <= Lessthanorequalto
number 10.00 Anynumericconstant
rparen ) )
iden5fier gid Stringconsistsofle[ersandnumbersandstartswithale[er
operator := Assignmentsymbol
iden5fier false Stringconsistsofle[ersandnumbersandstartswithale[er
Regular expression
![Page 6: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/6.jpg)
6
Regularexpression• Scannerisbasedonregularexpression.• Rememberlanguageisasetofstrings.• Examplesofregularexpression
– Le[era a– Keywordif if– Allthele[ers a|b|c|...|z|A|B|C...|Z– Allthedigits 0|1|2|3|4|5|6|7|8|9– AlltheIden5fiers le[er(le[er|digit)*
• Basicopera5ons:– Setunion,e.g.,a|b– Concatena5on,e.g,ab– Kleeneclosure,e.g.,a*
Regular expression
![Page 7: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/7.jpg)
7
Regularexpression
• Regularexpression:construc5ngsequencesofsymbols(strings)fromanalphabet.
• LetΣbeanalphabet,raregularexpressionthenL(r)isthelanguagethatischaracterizedbytherulesofr
• Defini5onofregularexpression– εisaregularexpressionthatdenotesthelanguage{ε}
• Notethatitisnot{}– IfaisinΣ,aisaregularexpressionthatdenotes{a}– LetrandsberegularexpressionswithlanguagesL(r)andL(s).Then
• r|sisaregularexpressionàL(r)∪L(s)• rsisaregularexpressionàL(r)L(s)• r*isaregularexpressionà(L(r))*
• Itisaninduc5vedefini5on!• Dis5nc5onbetweenregularlanguageandregularexpression
Regular expression
![Page 8: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/8.jpg)
8
Formallanguageopera5ons
Opera5on Nota5on Defini5on ExampleL={a,b}M={0,1}
unionofLandM L∪M
L∪M={s|sisinLorsisinM}
{a,b,0,1}
concatena)onofLandM
LM
LM={st|sisinLandtisinM}
{a0,a1,b0,b1}
KleeneclosureofL
L*
L*denoteszeroormoreconcatena5onsofL
Allthestringsconsistsof“a”and“b”,plustheemptystring.{ε, a,aa,bb,ab,ba,aaa,…}
posi)veclosure L+
L+denotes“oneormoreconcatena5onsof“L
Allthestringsconsistsof“a”and“b”.
Regular expression
![Page 9: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/9.jpg)
9
Regularexpressionexamplerevisited
• Examplesofregularexpression– le[eràa|b|c|...|z|A|B|C...|Z– digità0|1|2|3|4|5|6|7|8|9– Iden5fieràle[er(le[er|digit)*
• Exercise:whyisitaregularexpression?
Regular expression
![Page 10: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/10.jpg)
10
Precedenceofoperators
• Can the following RE be simplified? (a) | ((b)*(c))
• *isofthehighestprecedence;• ts(Concatena5on)comesnext;• |lowest.
• Example– (a) | ((b)*(c))isequivalenttoa|b*c
Regular expression
![Page 11: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/11.jpg)
11
Proper5esofregularexpressions
Property Descrip5on
r|s=s|r |iscommuta5ve
r|(s|t)=(r|s)|t |isassocia5ve
(rs)t=r(st) Concatena5onisassocia5ve
r(s|t)=rs|rt(s|t)r=sr|tr
Concatena5ondistributesover|
......
What is why we can write
• either a|b or b|a
• a|b|c, or (a|b)|c
• abc, or (ab)c
Regular expression
![Page 12: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/12.jpg)
12
Nota5onalshorthandofregularexpression
• Oneormoreinstance– L+=LL*– L*=L+|ε – Example
• digitsà digit digit* • digitsàdigit+
• Zerooroneinstance– L?=L|ε– Example:
• Op5onal_frac5onà.digits|ε • op5onal_frac5onà(.digits)?
• Characterclasses– [abc]=a|b|c– [a-z]=a|b|c...|z
Regular expression
![Page 13: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/13.jpg)
REexample
• Stringsoflength5• Supposetheonlycharactersareaandb• Solu5on
(a|b)(a|b)(a|b)(a|b)(a|b)
• Simplifica5on(a|b){5}
13
![Page 14: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/14.jpg)
REexample
• Stringscontainingatmostonea b*ab* | b* Or b*(a|ε) b* | b*
• Moreconciserepresenta5onb*a?b*
14
![Page 15: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/15.jpg)
REforemailaddress
• [email protected]@[email protected]…le[er=[a-zA-Z]digit=[0-9]w=le[er|digitw+(.w+)*@w+.w+(.w+)*
15
![Page 16: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/16.jpg)
REforoddnumbers
Isthefollowingcorrect?[13579]+33343443[0-9]*[13579]
16
![Page 17: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/17.jpg)
17
Moreregularexpressionexample• REforrepresen5ngmonths
– Exampleoflegalinputs• Febcanberepresentedas02or2• Novemberisrepresentedas11
• Firsttry:(0|1)?[0-9]– Matchesalllegalinputs?Yes
• 1,2,11,12,01,02,...– Matchesnoillegalinputs?No
• 13,14,..etc• Secondtry:
(0|1)?[0-9]=(ε|(0|1))[0-9]=[0-9]|(0|1)[0-9]=[0-9]|(0[0-9]|1[0-9][0-9]|(0[0-9]|1[0-2]– Matchesalllegalinputs?Yes
• 1,2,11,12,01,02,...– Matchesnoillegalinputs?No
• 0,00
Regular expression
![Page 18: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/18.jpg)
18
Deriveregularexpressions
• Solu5on:[1-9]|(0[1-9])|(1[012])– Either1-9,or0followedby1to9,or1followedby0,1,or2.– Matchesalllegalinputs– Matchesnoillegalinputs
• Moreconcisesolu5on:0?[1-9]|1[012]– Isitequalto[1-9]|(0[1-9])|(1[012])?0?[1-9]|1[012]=(ε|0)[1-9]|1[012](byshorthandnota5on)=ε[1-9]|0[1-9]|1[012](bydistribu5onover|)=[1-9]|0[1-9]|1[012]
Regular expression
![Page 19: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/19.jpg)
19
Regularexpressionexample(realnumber)
• Realnumbersuchas0,1,2,3.14– Digit:[0-9]– Integer:[0-9]+– Firsttry:[0-9]+(.[0-9]+)?
• Wanttoallow“.25”aslegalinput?
– Secondtry:[0-9]+ | ([0-9]*.[0-9]+)
• Op5onalunaryminus:-?([0-9]+|([0-9]*.[0-9]+))-?(\d+|(\d*.\d+))
Regular expression
![Page 20: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/20.jpg)
20
Regularexpressionexercises
• Canthestringbaabecreatedfromtheregularexpressiona*b*a*b*?
• Describethelanguage(inwords)representedby(a*a)b|ab.a+b
• Writetheregularexpressionthatrepresents:– AllstringsoverΣ={a,b}thatendina.[ab]*a
(a|b)*a– AllstringsoverΣ={0,1}ofevenlength. ((0|1)(0|1))*(00|01|10|11)*
Regular expression
![Page 21: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/21.jpg)
21
Regulargrammarandregularexpression• Theyareequivalent
– Everyregularexpressioncanbeexpressedbyregulargrammar– Everyregulargrammarcanbeexpressedbyregularexpression– Differentwaystoexpressthesamething
• Whytwonota5ons– Grammarisamoregeneralconcept– REismoreconcise
• Howtotranslatebetweenthem– Useautomata– Willintroducelater
Regular expression
![Page 22: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/22.jpg)
22
Whatwelearntlastclass
• Defini5onofregularexpression– εisaregularexpressionthatdenotesthelanguage{ε}
• Notethatitisnot{}– IfaisinΣ,aisaregularexpressionthatdenotes{a}– LetrandsberegularexpressionswithlanguagesL(r)andL(s).Then
• (r)|(s)isaregularexpressionàL(r)∪L(s)• (r)(s)isaregularexpressionàL(r)L(s)• (r)*isaregularexpressionà(L(r))*
Regular expression
![Page 23: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/23.jpg)
23
Applica5onsofregularexpression
• InWindows– InwindowsyoucanuseREtosearchforfilesortextsinafile
• Inunix,therearemanyRErelevanttools,suchasGrep– StandsforGlobalRegularExpressionsandPrint(orGlobalRegular
ExpressionandParser…);– UsefulUNIXcommandtofindpa[ernsofcharactersinatextfile;
• XMLDTDcontentmodel– <!ELEMENTstudent(name,(phone|cell)*,address,course+)>
<student><name>Jianguo</name><phone>1234567</phone><phone>2345678</phone><address>401sunsetave</address><course>214</course></student>
• JavaCoreAPIhasregexpackage!• Scannergenera5on
Regular expression
![Page 24: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/24.jpg)
24
• REinXMLSchema<xsd:simpleTypename="TelephoneNumber"><xsd:restric5onbase="xsd:string"><xsd:lengthvalue="8"/><xsd:pa[ernvalue="\d{3}-\d{4}"/></xsd:restric5on></xsd:simpleType>
Regular expression
![Page 25: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/25.jpg)
25
RegularexpressionsusedinScanner,Stringetc
• Asampleproblem– Developaprogramthat,givenasinputthreepointsP1,P2,andP3onthecartesian
coordinateplane,reportswhetherP3liesonthelinecontainingP1andP2.– Inordertoinputthethreepoints,theuserenterssixintegersx1,y1,x2,y2,x3,andy3.
ThethreepointsareP1=(x1,y1),P2=(x2,y2),andP3=(x3,y3).– Theprogramshouldrepeatun5ltheuser'sinputissuchthatP1andP2arethesame
point.• Sampleinput
– Enterx1:0– Entery1:0– Enterx2:2– Entery2:5– Enterx3:1– Entery3:3
• Output– Thepoint(1,3)ISNOTonthelineconstructedfrompoints(2,5)and(0,0).
Regular expression in Java
![Page 26: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/26.jpg)
26
Howtoreadandprocesstheinput
• FirstTryScannersc=newScanner(System.in);intx1=sc.nextInt();java.u5l.InputMismatchExcep5on
• Wewanttocapturethevaluesforxandy,anddiscardeverythingelse
• DescribeeverythingelseasdelimitersScannersc=newScanner(System.in).useDelimiter(“…”);
• Thedelimiterscanbeanyregularexpression
SampleinputEnterx1:2Entery1:8Enterx2:0Entery2:0Enterx3:1Entery3:4
Regular expression in Java
![Page 27: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/27.jpg)
27
useDelimiterinScannerclass
• useDelimiter(“Enterx1:”)– Thiswillthrowaway“Enterx1:”only.Todiscard“Enterx2:”aswell,
youmaywanttoaddthefollowing
• useDelimiter(“Enterx1:|Enterx2:”)– Ver5calbarmeans“OR”—either“Enter x1:”OR“Enter x2:” – ItiscalledRegularexpression;– Nowyouknowhowtoexpandtothecaseforx3--
• useDelimiter(“Enterx1:|Enterx2:|Enterx3”)– Wecansimplifiedtheaboveusingothernota5onsinREGUALR
EXPRESSION.
Regular expression in Java
![Page 28: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/28.jpg)
28
Useregularexpressiontocapturetheinput
• useDelimiter(“Enterx\\d”)– Where\\dmeansanydigit.Nowhowtoreadinthevaluesforyaxis?
• useDelimiter(“Enter\\w\\d”)– \\wmeansanyle[erordigit
• useDelimiter(“Enter\\w{2}”)– Whatifthereareleadingandtrailingspacesaroundthesampleinput?
• useDelimiter(“(|\\t)Enter\\w{2}(|\\t)”)• useDelimiter(“\\sEnter\\w{2}\\s”)
– \\sstandsforallkindsofwhitespace.• useDelimiter(\\s*Enter\\w{2}:\\s*)
– *meansthatzeroormorespacescanoccur.
Regular expression in Java
![Page 29: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/29.jpg)
29
Codefragmenttoreaddata
//readfromfilefortes5ngpurposeScannersc=newScanner(newFile("online.txt")). useDelimiter("\\s*Enter\\w{2}:\\s*");
intx1=sc.nextInt();inty1=sc.nextInt();intx2=sc.nextInt();inty2=sc.nextInt();intx3=sc.nextInt();inty3=sc.nextInt();
Regular expression in Java
![Page 30: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/30.jpg)
30
RegexpackageinJava
• Javahasregularpackagejava.u5l.regex,• Asimpleexample:
– Pickoutthevaliddatesinastring– E.g.inthestring“finalexam2008-04-22,or2008-4-22,butnot
2008-22-04”– Validdates:2008-04-22,2008-4-22
• Firstweneedtowritetheregularexpressions.\d{4}-(0?[1-9]|1[012])-\d{2}
Regular expression in Java
![Page 31: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/31.jpg)
31
Regexpackage
• First,youmustcompilethepa[ernimport java.util.regex.*;Pattern p = Pattern.compile(“\\d{4}-(0?[1-9]|1[012])-\
\d{2}");– Notethatinjavayouneedtowrite\\dinsteadof\d
• Next,youmustcreateamatcherforaspecificpieceoftextbysendingamessagetoyourpa[ern– Matcherm=p.matcher(“…yourtextgoeshere….");
• Pointstono5ce:– Pa[ernandMatcherarebothinjava.u5l.regex– NeitherPa[ernnorMatcherhasapublicconstructor;youcreate
thesebyusingmethodsinthePa[ernclass– Thematchercontainsinforma5onaboutboththepa[erntouseand
thetexttowhichitwillbeapplied
Regular expression in Java
![Page 32: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/32.jpg)
32
Regexinjava
• Nowthatwehaveamatcherm,– m.matches()returnstrueifthepa[ernmatchestheen5retext
string,andfalseotherwise– m.lookingAt()returnstrueifthepa[ernmatchesatthe
beginningofthetextstring,andfalseotherwise– m.find()returnstrueifthepa[ernmatchesanypartofthetext
string,andfalseotherwise• Ifcalledagain,m.find()willstartsearchingfromwherethelastmatchwasfound
• m.find()willreturntrueforasmanymatchesasthereareinthestring;aderthat,itwillreturnfalse
• Whenm.find()returnsfalse,matchermwillberesettothebeginningofthetextstring(andmaybeusedagain)
Regular expression in java
![Page 33: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/33.jpg)
33
Regexexampleimportjava.u5l.regex.*;publicclassRegexTest{publicstaCcvoidmain(Stringargs[]){
Stringpa[ern="\\d{4}-(0?[1-9]|1[012])-\\d{2}";Stringtext="finalexam2008-04-22,or2008-4-22,butnot 2008-22-04";
Pa[ernp=Pa[ern.compile(pa[ern);Matcherm=p.matcher(text);while(m.find()){ System.out.println("validdate:"+text.substring(m.start(),m.end()));}}
}
Printout:– validdate:2008-04-22– validdate:2008-4-22
Regular expression in java
![Page 34: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/34.jpg)
34
Moreshorthandnota5oninspecifictools,likeregexpackageinJava
• Differentsodwaretoolshaveslightlydifferentnota5ons(e.g.regex,grep,JLEX);
• Shorthandnota5onsfromregexpackage– . anyonecharacterexceptalineterminator– \d adigit:[0-9]– \D anon-digit:[^0-9]– \s awhitespacecharacter:[\t\n\r]– \S anon-whitespacecharacter:[^\s]– \w awordcharacter:[a-zA-Z_0-9]– \W anon-wordcharacter:[^\w]
• GetfamiliarwithregularexpressionusingtheregexTesterApplet.
• NotethatStringclasssinceJava1.4providessimilarmethodsforregularexpression
Regular expression in java
![Page 35: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/35.jpg)
35
TryRegexTester
• Runningatcoursewebsiteasanapplet;– h[p://cs.uwindsor.ca/~jlu/214/regex_tester.htm
• Writeregularexpressionsandtrythematch(),find()methods;
• Whatif3q14insteadof3.14isthestringtobematched?Why?• groupsarenumberedbycoun5ngtheiropeningparenthesesfromledtoright.
– ((A)(B(C)))hasfourgroups:– 1((A)(B(C)))– 2(A)– 3(B(C))– 4(C)
Regular expression
![Page 36: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/36.jpg)
36
Prac5ceregularexpressionusinggrep
Usegreptosearchforcertainpa[erninhtmlfiles;– SearchforCanadianpostalcodeinatextfile;– SearchforOntariocarplatenumberinatextfile.
– usetcsh.Type• %tcsh
– Preparetextfile,say“test”,thatconsistsofsamplepostalcodeetc.– Type
• grep‘[a-z][0-9][a-z][0-9][a-z][0-9]’test
• grep–i‘[a-z][0-9][a-z][0-9][a-z][0-9]’test
Regular expression in unix
![Page 37: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/37.jpg)
37
Prac5cethefollowinggrepcommands:grep'cat'grepTest-youwillfindboth"cat"and"vaca5on”grep'\<cat\>'grepTest
--wordboundarygrep-i'\<cat\>'grepTest
– ignorethecasegrep'\<ega\.a[\.com\>'grepTest
-metacharactergrep'"[^"]*"'grepTest
--findquotedstringegrep'[a-z][0-9][a-z]?[0-9][a-z][0-9]'grepTest
– findpostalcode,onlyifitisinlowercaseegrep-i'[a-z][0-9][a-z]?[0-9][a-z][0-9]'grepTest
– ignorethecasegrep'q[^u]'grepTest
--won'tfind"iraq",butwillfind"iraqi"grep'^cat'grepTest
findonlylinesstartwithcat• egrepissimilartogrep,butsomepa[ernsonlyworkinegrep.
Regular expression in unix
megawatt computing ega.att.com Line with uppercase Cat and Dog. 214 cat vacation Cat Dog vacation only capital Cat the cat likes the dog "quoted string " iraq iraqi Qantas ADBB 123 ADBB12 N9B 3P5 N9B3P5 01-16-2004 14-14-2004
![Page 38: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/38.jpg)
38
Unixmachineaccount
• Applyforaunixaccount:– [email protected]
• Accessunixmachinesathome:– YouneedtouseSSH;– Oneplacetodownload:
• www.uwindsor.ca/its-->services/downloads
![Page 39: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/39.jpg)
39
REandFinitestateAutomaton(FA)• Regularexpressionisadeclara5vewaytodescribethetokens
– Itdescribeswhatisatoken,butnothowtorecognizethetoken.• FAisusedtodescribehowthetokenisrecognized
– FAiseasytobesimulatedbycomputerprograms;
• Thereisa1-1correspondencebetweenFAandregularexpression– Scannergenerator(suchasJLex)bridgesthegapbetweenregularexpression
andFA.
Scanner generator
FA RE Java scanner program
String stream
Tokens
![Page 40: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/40.jpg)
40
Insidescannergenerator
• Maincomponentsofscannergenera5on– REtoNFA– NFAtoDFA– Minimiza5on– DFAsimula5on
RE
NFA
DFA
Minimized DFA
Program
Thompson construction
Subset construction
DFA simulation Scanner generator
Minimization
![Page 41: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/41.jpg)
41
Finiteautomata
• FAalsocalledFiniteStateMachine(FSM)– Abstractmodelofacompu5ngen5ty;– Decideswhethertoacceptorrejectastring.
• TwotypesofFA:– Non-determinis5c(NFA):Hasmorethanonealterna5veac5on
forthesameinputsymbol.– Determinis5c(DFA):Hasatmostoneac5onforagiveninput
symbol.• Example:howdowewriteaprogramtorecognizejavaiden5fiers?
Start letter
letter
digit
s0 s1 S0:
if (getChar() is letter) goto S1;
S1: if (getChar() is letter or digit) goto S1;
Autom
ata
S
a, b a
B
Automaton:amechanismthatisrela5velyself-opera5ng--webster
![Page 42: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/42.jpg)
42
Non-determinis5cFiniteAutomata(FA)
• NFA(Non-determinis5cFiniteAutomaton)isa5-tuple(S,Σ, δ, S0, F):– S:asetofstates;– Σ:thesymbolsoftheinputalphabet;– δ:atransi5onfunc5on;
• move(state,symbol)àasetofstates
– S0:s0∈S,thestartstate;– F:F⊆S,asetoffinaloraccep5ngstates.
• Non-determinis5c--astateandsymbolpaircanbemappedtoasetofstates.– Itisdeterminis5ciftheresultoftransi5onconsistsofonlyonestate.
• Finite—thenumberofstatesisfinite.
Autom
ata
Start letter
letter
digit
s0 s1
![Page 43: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/43.jpg)
43
Transi5onDiagram
• FAcanberepresentedusingtransi5ondiagram.• CorrespondingtoFAdefini5on,atransi5ondiagramhas:
– States:Representedbycircles;– Σ: Alphabet,representedbylabelsonedges; – Moves:Representedbylabeleddirectededgesbetweenstates.Thelabel
istheinputsymbol;– StartState:arrowhead;– FinalState(s):representedbydoublecircles.
• Exampletransi5ondiagramtorecognize(a|b)*abb
q0 q3 a
q1 b
a, b
q2 b
Autom
ata
![Page 44: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/44.jpg)
44
SimpleexamplesofFA
• Epsilon
• a
• a*
• a+
• (a|b)*
start
a
0
start
a
1 a
0
start
a
0
b
start
a, b
0
start
1
a 0
start
1
ε 0
Autom
ata
![Page 45: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/45.jpg)
45
ProceduresofdefiningaDFA/NFA
• Defineinputalphabetandini5alstate• Drawthetransi5ondiagram• Check
– allstateshaveout-goingarcslabeledwithalltheinputsymbols(DFA).– Arethereanymissingfinalstates?– Arethereanyduplicatestates?– allstringsinthelanguagecanbeaccepted.– allstringsnotinthelanguagecannotbeaccepted.
• Nameallthestates• Define(S,Σ,δ,q0,F)
Autom
ata
![Page 46: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/46.jpg)
46
Exampleofconstruc5ngaFA
• ConstructaDFAthatacceptsalanguageLoverΣ={0,1}suchthatListhesetofallstringswithanynumberof“0”sfollowedbyanynumberof“1”s.
• Regularexpression:0*1*• Σ={0,1}• Drawini5alstateofthetransi5ondiagram
Start
Autom
ata
![Page 47: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/47.jpg)
47
Exampleofconstruc5ngaFA(cont.)
• Dradthetransi5ondiagram
Start 1
0 1
0
• Is“111”accepted?• Theledmoststatehasmissedanarcwithinput“1”
Start 1
0 1
0
1
Autom
ata
• Is“00”accepted?
![Page 48: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/48.jpg)
48
Exampleofconstruc5ngaFA(cont.)
• Is“00”accepted?• Theledmosttwostatesarealsofinalstates
– Firststatefromtheled:εisalsoaccepted– Secondstatefromtheled:
stringswith“0”sonlyarealsoaccepted
Start 1 0 1
0
1
Autom
ata
![Page 49: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/49.jpg)
49
Exampleofconstruc5ngaFA(cont.)
• Theledmosttwostatesareduplicate– theirarcspointtothesamestateswiththesamesymbols
Start 1
0 1
• Checkthattheyarecorrect– Allstringsinthelanguagecanbeaccepted
• εisaccepted• stringswith“0”s/“1”sonlyareaccepted
– Allstringsnotbelongedtothelanguagecannotbeaccepted• Nameallthestates
Start 1
0 1
q0 q1
Autom
ata
![Page 50: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/50.jpg)
50
HowdoesFAwork
• NFAdefini5onfor(a|b)*abb– S={q0,q1,q2,q3}– Σ={a,b}– Transi5ons:move(q0,a)={q0,q1},move(q0,b)={q0},....– s0=q0– F={q3}
• Transi5ondiagramrepresenta5on– Non-determinism:
• exi5ngfromonestatetherearemul5pleedgeslabeledwithsamesymbol,or• Thereareepsilonedges.
– HowdoesFAwork?Input:ababb
move(q0,a)=q1move(q1,b)=q2move(q2,a)=?(undefined)REJECT!
move(q0, a) = q0 move(q0, b) = q0 move(q0, a) = q1 move(q1, b) = q2 move(q2, b) = q3 ACCEPT !
q0 q3 a
q1 b
a,b
q2 b
Autom
ata
![Page 51: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/51.jpg)
51
FAfor(a|b)*abb
– WhatdoesitmeanthatastringisacceptedbyaFA?• AnFAacceptsaninputstringxiffthereisapathfromthestartstatetoafinalstate,suchthattheedgelabelsalongthispathspelloutx;
– Apathfor“aabb”:q0àaq0àaq1àbq2àbq3– Is“aab”acceptable?
q0àaq0àaq1àbq2q0àaq0àaq0àbq0
• Theanswerisno;• Finalstatemustbereached;• Ingeneral,therecouldbeseveralpaths.
– Is“aabbb”acceptable?q0àaq0àaq1àbq2àbq3
• Theanswerisno.• Labelsonthepathmustspellouttheen5restring.
q0 q3 a
q1 b
a,b
q2 b
Autom
ata
![Page 52: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/52.jpg)
52
ExampleofNFAwithepsilonsymbol
• NFAaccep5ngaa*|bb* – Is“aaab”acceptable?– Is“aaa”acceptable?
0
1
3 b
b
4 ε
ε
a
a
2
Autom
ata
![Page 53: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/53.jpg)
Simula5nganNFA
• Keeptrackofasetofstates,ini5allythestartstateandeverythingreachablebyε-moves.
• Foreachcharacterintheinput:– Maintainasetofnextstates,ini5allyempty.
• Foreachcurrentstate:– Followalltransi5onslabeledwiththecurrentle[er.– Addthesestatestothesetofnewstates.
• Addeverystatereachablebyanε-movetothesetofnextstates.
• Complexity:O(mn2)forstringsoflengthmandautomatawithnstates
53
q0 q3 a
q1 b
a,b
q2 b
![Page 54: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/54.jpg)
54
Transi5ontable
• Itisoneofthewaystoimplementthetransi5onfunc5on– Thereisarowforeachstate;– Thereisacolumnforeachsymbol;– Entryin(states,symbola)isthesetofstatescanbereachedfromstateson
inputa.
• Nondeterminis5c:– Theentriesaresetsinsteadofasinglestate
States Input
a b
>q0 {q0,q1} {q0}
q1 {q2}
q2 {q3}
*q3
Autom
ata
q0 q3 a
q1 b
a,b
q2 b
![Page 55: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/55.jpg)
55
DFA(Determinis5cFiniteAutomaton)• AspecialcaseofNFA
– Thetransi5onfunc5onmapsthepair(state,symbol)toonestate.• Whenrepresentedbytransi5ondiagram,foreachstateSandsymbola,there
isatmostoneedgelabeledaleavingS;• Whenrepresentedtransi5ontable,eachentryinthetableisasinglestate.
– Thereisnoε-transi5on• Example:DFAfor(a|b)*abb
• RecalltheNFA:
States Input
a b
Q0 Q1 q0
Q1 Q1 Q2
Q2 Q1 Q3
Q3 Q1 Q0
q0
q3
a
a
q1 q2
a b b
b
a b
q0 q3 a
q1 b
a,b
q2 b
Autom
ata
![Page 56: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/56.jpg)
56
DFAtoprogram
• NFAismoreconcise,butnoteasytoimplement;
• InDFA,sincetransi5ontablesdon’thaveanyalterna5veop5ons,DFAsareeasilysimulatedviaanalgorithm.
RE
NFA
DFA
Minimized DFA
Program
Thompson construction
Subset construction
DFA simulation Scanner generator
Minimization
Autom
ata simulation
![Page 57: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/57.jpg)
57
SimulateaDFA
• AlgorithmtosimulateDFAInput:Stringx,DFAD.
• Transi5onfunc5onismove(s,c);• StartstateisS0;• FinalstatesareF.
Output:“yes”ifDacceptsx;“no”otherwise;Algorithm:
currentState ← s0 currentChar ← nextchar; while currentChar ≠ eof { currentState ← move(currentState, currentChar); currentChar ← nextchar; } if currentState is in F then return “yes” else return “no”
• RuntheFAsimulator!• Writeasimulator.
Autom
ata simulation
![Page 58: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/58.jpg)
58
NFAtoDFA
• Whereweare:wearegoingtodiscussthetransla5onfromNFAtoDFA.
• Theorem:AlanguageLisacceptedbyanNFAiffitisacceptedbyaDFA
• Subsetconstruc5onisusedtoperformthetransla5onfromNFAtoDFA.
RE
NFA
DFA
Minimized DFA
Program
Thompson construction
Subset construction
DFA simulation Scanner generator
Minimization
NFA to D
FA
![Page 59: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/59.jpg)
59
Summarize
• Wehavecoveredmanyconcepts– RE,Regulargrammar,FA(NFA,DFA),
Transi5onDiagram,Transi5onTable.• Whatistherela5onshipbetweenthem?
– RE,Regulargrammar,NFA,DFA,Transi5onDiagramareallofthesameexpressivepower;
– REisadeclara5vedescrip5on,henceeasierforustowrite;
– DFAisclosertomachine;– Transi5onDiagramisagraphic
representa5onofFA;– Transi5onTableisoneofthemethodsto
implementthetransi5onfunc5onsinFA.• Whataboutregulargrammar?
– Wewillseeitsrelevanceinsyntaxanalysis.• Anotherpath:howtoderiveREfromDFA?
RE
NFA
DFA
Minimized DFA
Program
Thompson construction
Subset construction
DFA simulation
![Page 60: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/60.jpg)
60
Regularexpressionandregulargrammar
• REtoregulargrammar:– DrawanautomatafortheRE– Constructregulargrammarfromautomaton– Example
S0àle[erS1S1àle[erS1S1àdigitS1S1àε
Start letter
letter
digit
s0 s1
![Page 61: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/61.jpg)
61
Conver5ngDFAstoREs
1. Combineseriallinksbyconcatena5on2. Combineparallellinksbyalterna5on3. Removeself-loopsbyKleeneclosure4. Selectanode(otherthanini5alorfinal)forremoval.Replace
itwithasetofequivalentlinkswhosepathexpressionscorrespondtotheinandoutlinks
5. Repeatsteps1-4un5lthegraphconsistsofasinglelinkbetweentheentryandexitnodes.
![Page 62: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/62.jpg)
62
Example
0 1 2
6
4 3 d
a
b c
d
7
5 a
b d
d
b
c
0 1 2
6
4 3 d a|b|c d
7
5 a
b d
d
b|c
0 4 3 d(a|b|c)d
5 a d
b(b|c)d
![Page 63: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/63.jpg)
63
Example(cont.)
0 4 3 d(a|b|c)d
5 a d
b(b|c)da
0 4 3 d(a|b|c)d
5 a (b(b|c)da)*d
0 d(a|b|c)da(b(b|c)da)*d
5
![Page 64: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/64.jpg)
64
Issuesnotcovered
• RegularexpressiontoDFAdirectly;
RE
NFA
DFA
Minimized DFA
Program
Thompson construction
Subset construction
DFA simulation
![Page 65: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/65.jpg)
65
LexicalacceptorsandLexicalanalyzers
• DFA/NFAacceptsorrejectsastring;– Theyarecalledlexicalacceptors;
• Butthepurposeofalexicalanalyzerisnotjusttoacceptorrejectstring.Thereareseveralissues:– Mul)plematches:Oneregularexpressionmaymatchseveral
substrings.• e.g.,ID=le[er+,String=“abc”,IDcanmatchwitha, ab, abc.• Weshouldfindthelongestmatches,i.e.,longestsubstringoftheinputthatmatchestheregularexpression;
– Mul)pleREs:WhatifonestringcanmatchseveralREs?• e.g.,ID=le[er+,INT=“int”,• String“int”canbebothareservedwordINT,andaniden5fier.Howcanwedecideitisareservedwordinsteadanusualiden5fier?
– Ac)ons:Onceatokenisrecognized,wewanttoperformdifferenttasksonthem,insteadofsimplyreturnthestringrecognized.
![Page 66: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/66.jpg)
66
Longestmatch
• WhenseveralsubstringscanmatchthesameRE,weshouldreturnthelongestone.– e.g.,ID=le[er+,String=“abc”,IDcanmatchwitha,ab,abc.
• Problem:whatifalexergoespastafinalstateofashortertoken,butthendoesn’tfindanyothermatchingtokenlater?
• Example:ConsiderR=00|10|0011andinputw=0010.
• WereachstateCwithnotransi5ononinput0.• Solu5on:Keepingtrackofthelongestmatchjustmeansrememberingthelast5metheDFAwasinafinalstate;
S 0 1
0
0
1
1A B
F
C
E
D
![Page 67: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/67.jpg)
67
Longestmatch(cont.)
• Thisisdonebyintroducingthefollowingvariables:– LastFinal:finalstatemostrecentlyencountered;– InpputPosi5onAtLastFinal:mostrecentposi5onintheinputstringinwhich
theexecu5onoftheDFAwasinafinalstate;– Yytext:Textofthetokenbeingmatched,i.e.,substringbetween
ini5alInputPosi5onandinputPosi5onAtLastFinal.• Thiswayalongestmatchisrecognizedwhentheexecu5onoftheDFAreachesadead-end,i.e.,astatewithnotransi5ons.
• Each5meatokenisrecognized,theexecu5onoftheDFAresumesintheini5alstatetorecognizethenexttoken.
• Ingeneral,whenatokenisrecognized,currentInputPosi5onmaybefarbeyondinputPosi5onAtLastFinal.
S 0 1
0
0
1
1A B
F
C
E
D
![Page 68: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/68.jpg)
68
Handlingmul5pleREs
• CombinetheNFAsofalltheREsintoasinglefiniteautomaton.• WhatiftwoREsmatchesthesamestring?
– E.g.,forastring“abb”,bothREs“a*bb”and“ab*”matchesthestring.WhichREisintended?
– Itisimportantbecausedifferentac5onsmaytakedependingontheREbeingmatched;
– Solu5on:OrderREs:theREprecedeswillmatchfirst.
• Howaboutreservedwords?– Forstring“int”,shouldwereturntokenINTortokenID?– Twosolu5ons:
• Constructareservedwordtableandlookupthetableevery5meaniden5fierisencountered;
• Put“int”asanRE,andputthatREbeforetheiden5fierRE.Sowheneverthestring“int”ismet,RE“int”willbematchedfirstandthetokenINTwillbereturned(insteadofthetokenID).
![Page 69: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/69.jpg)
69
Ac5ons
• Ac5onscanbeaddedforfinalstates;• Ac5onscanbedescribedinausualprogramminglanguage.InJLex,ac5onisdescribedinJava.
![Page 70: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/70.jpg)
70
Buildascannerforasimplelanguage
• Thelanguageofassignmentstatements: LHS=RHS intLHS=RHS
…– led-handsideofassignmentisaniden5fier,withop5onaltype
declara5on;– Iden5fierisale[erfollowedbyoneormorele[ersordigits– right-handsideisoneofthefollowing:
• ID+ID• ID*ID• ID==ID
• Examplestatementintx3=x1+x2
![Page 71: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/71.jpg)
71
Step1:Definetokens
Ourlanguagehassixtypeoftokens.– theycanbedefinedbysixregularexpressions:
token Regularexpression
ASSIGN "="
ID le[er(le[er|digit)*
INT “int”
PLUS "+"
TIMES "*"
EQUALS "=="
![Page 72: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/72.jpg)
72
Step2:ConvertREstoNFAs
“=”
letter
“+”
“*”
“=”
ASSIGN:
ID:
PLUS:
TIMES:
EQUALS:
Letter, digit
“=”
Step 3: Combine the NFAs, Convert NFAs to DFAs, minimize the DFAs
INT: “n” “i” “t”
ε
![Page 73: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/73.jpg)
73
Step4:ExtendtheDFA• ModifytheDFAsothatafinalstatecanhave
– anassociatedac5on,suchas:"putbackonecharacter"or"returntokenXXX“.
• Forexample,theDFAthatrecognizesiden5fierscanbemodifiedasfollows– recallthatscanneriscalledbyaparser(onetokenisreturnedpereach
call)– henceac5onreturnputsthescannerintostateS
letter
any char except letter or digit
letter | digit action:
• put back 1 char • return ID
S
![Page 74: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/74.jpg)
74
Step5:CombinedFAforourlanguage• combine the DFAs for all of the tokens in to a single FA.
letter any char except letter or digit
letter | digit put back 1 char; return ID
S “*” ID
return EQUALS “=”
any char except “=”
put back 1 char; return ASSIGN
return TIMES
“+”
return PLUS
“=”
F1
F4
F3
F2
TMP F5
“n” “i”
“t”
SP return INT, put back one char
F6
I1 I2
I3
• It is not a DFA. Just for illustration purpose.
F7 SP
![Page 75: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/75.jpg)
75
Exampletracefor“intx3=x1+x2”
Input Lastfinalstate
Currentstate
InputposiConatlastFinalState
CurrentinputposiCon
IniCalinputposiCon
acCon
intx3=x1+x2 0 S 0 0 0
ntx3=x1+x2 0 I1 1 1 0
tx3=x1+x2 0 I2 2 2 0
[sp]x3=x1+x2 0 I3 3 3 0
x3=x1+x2 0 F6 4 4 0 Ac5on1
putBackOneChar();Yytext=substring(0,4-1);in5alInputPosi5on=3;currentSate=S;
[sp]x3=x1+x2 0 S 3 3 3
x3=x1+x2 0 SP 4 4 3 Ac5on2
Yytext=substring(3,3);ini5alInputPosi5on=4;currentState=S;
x3=x1+x2 0 S 4 4 4
3=x1+x2 0 ID 5 5 4
=x1+x2 0 ID 6 6 4
x1+x2 0 F2 7 7 4 Ac5on3
putBackOneChar;Yytext=substring(4,7-1);ini5alInputPosi5on=6;currentState=S;
=x1+x2 0 S 6 6 6
......
![Page 76: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/76.jpg)
76
Scannergenerator:history
• LEX– Alexicalanalyzergenerator,wri[enbyLeskandSchmidtatBellLabs
in1975fortheUNIXopera5ngsystem;– Itnowexistsformanyopera5ngsystems;– LEXproducesascannerwhichisaCprogram;– LEXacceptsregularexpressionsandallowsac5ons(i.e.,codeto
executed)tobeassociatedwitheachregularexpression.
• JLex– Lexthatgeneratesascannerwri[eninJava;– ItselfisalsoimplementedinJava.
• Therearemanysimilartools,formostprogramminglanguages
JLex
![Page 77: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/77.jpg)
77
Overallpicture
Tokens
Scanner generator
NFA RE Java scanner program
String stream
DFA
Minimize DFA
Simulate DFA
JLex
![Page 78: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/78.jpg)
78
Insidelexicalanalyzergenerator
• Howdoesalexicalanalyzerwork?– Getinputfromuserwhodefines
tokensintheformthatisequivalenttoregulargrammar
– TurntheregulargrammarintoaNFA
– ConverttheNFAintoDFA– Generatethecodethatsimulates
theDFA
Classes in JLex:
CAccept CAcceptAnchor CAlloc CBunch CDfa CDTrans CEmit CError CInput CLexGen CMakeNfa CMinimize CNfa CNfa2Dfa CNfaPair CSet CSimplifyNfa CSpec CUtility Main SparseBitSet ucsb
JLex
![Page 79: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/79.jpg)
79
Howscannergeneratorisused
• Writethescannerspecifica5on;• Generatethescannerprogramusingscannergenerator;• Compilethescannerprogram;• Runthescannerprogramoninputstreams,andproducesequencesoftokens.
Scanner generator (e.g., JLex)
Scanner definition (e.g., JLex spec)
Scanner program, e.g., Scanner.java
Java compiler
Scanner program (e.g., Scanner.java)
Scanner.class
Scanner Scanner.class Input stream Sequence of
tokens
JLex
![Page 80: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/80.jpg)
80
JLexspecifica5on
• JLexspecifica5onconsistsofthreeparts,separatedby“%%”
UserJavacode,tobecopiedverba5mintothescannerprogram,placedbeforethelexerclass;
%%JLexdirec5ves,macrodefini5ons,commonlyusedtospecifyle[ers,digits,whitespace;%%Regularexpressionsandac5ons:
• Specifyhowtodivideinputintotokens;• Regularexpressionsarefollowedbyac5ons;
– Printerrormessages;returntokencodes;
JLex
![Page 81: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/81.jpg)
81
FirstJLexexamplesimple.lex
• Recognizeintandiden5fiers.1. %%2. %{publicsta5cvoidmain(Stringargv[])throwsjava.io.IOExcep5on{3. MyLexeryy=newMyLexer(System.in);4. while(true){5. yy.yylex();6. }7. }8. %}9. %notunix10. %typevoid11. %classMyLexer12. %eofval{return;13. %eofval}
14. IDENTIFIER=[a-zA-Z][a-zA-Z0-9]*
15. %%
16. "int"{System.out.println("INTrecognized");}17. {IDENTIFIER}{System.out.println("IDis..."+yytext());}18. \r|\n{}19. .{}
JLex
![Page 82: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/82.jpg)
82
Codegeneratedwillbeinsimple.lex.java
classMyLexer{publicsta5cvoidmain(Stringargv[])throwsjava.io.IOExcep5on{
MyLexeryy=newMyLexer(System.in);while(true){yy.yylex();
}}publicvoidyylex(){......case5:{System.out.println("INTrecognized");}case7:{System.out.println("IDis..."+yytext());}......}}
Copied from internal code directive
JLex
![Page 83: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/83.jpg)
83
RunningtheJLexexample• StepstoruntheJLex
D:\214>javaJLex.Mainsimple.lexProcessingfirstsec5on--usercode.Processingsecondsec5on--JLexdeclara5ons.Processingthirdsec5on--lexicalrules.Crea5ngNFAmachinerepresenta5on.NFAcomprisedof22states.Workingoncharacterclasses.::::::::.NFAhas10dis5nctcharacterclasses.Crea5ngDFAtransi5ontable.WorkingonDFAstates...........MinimizingDFAtransi5ontable.9statesaderremovalofredundantstates.Outpu�nglexicalanalyzercode.D:\214>movesimple.lex.javaMyLexer.javaD:\214>javacMyLexer.javaD:\214>javaMyLexer//itiswai5ngforkeyboardinputintmyid0INTrecognizedIDis...myid0
JLex
![Page 84: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/84.jpg)
84
Exercises
• TrytomodifyJLexdirec5vesinthepreviousJLexspec,andobservewhetheritiss5llworking.Ifitisnotworking,trytounderstandthereason.– Remove“%notunix”direc5ve;– Change“return;”to“returnnull;”;– Remove“%typevoid”;– ......
• MovetheIden5fierregularexpressionbeforethe“int”RE.Whatwillhappentotheinput“int”?
• Whatifyouremovethelastline(line19,“.{}”)?
JLex
![Page 85: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/85.jpg)
85
Changesimple.lex:readinputfromfile
1. importjava.io.*;2. %%3. %{publicsta5cvoidmain(Stringargv[])throwsjava.io.IOExcep5on{4. MyLexeryy=newMyLexer(newFileReader(“input”));5. while(yy.yylex()>=0);6. }7. %}8. %integer9. %classMyLexer
10. %%
11. "int"{System.out.println("INTrecognized");}12. [a-zA-Z_][a-zA-Z0-9_]*{System.out.println("IDis..."+yytext());}13. \r|\n|.{}
• %integer:tomakethereturningtypeofyylex()asint.
JLex
![Page 86: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/86.jpg)
86
Extendtheexample:addreturninganduseclasses– Whenatokenisrecognized,inmostofthecasewewanttoreturnatokenobject,sothat
otherprogramscanuseit.classUseLexer{publicsta5cvoidmain(String[]args)throwsjava.io.IOExcep5on{Tokent;MyLexer2lexer=newMyLexer2(System.in);while((t=lexer.yylex())!=null)System.out.println(t.toString());}}classToken{Stringtype;Stringtext;intline;Token(Stringt,Stringtxt,intl){type=t;text=txt;line=l;}publicStringtoString(){returntext+""+type+""+line;}}%%%notunix%line%typeToken%classMyLexer2%eofval{returnnull;%eofval}IDENTIFIER=[a-zA-Z_][a-zA-Z0-9_]*%%"int"{return(newToken("INT",yytext(),yyline));}{IDENTIFIER}{return(newToken("ID",yytext(),yyline));}\r|\n{}.{}
JLex
![Page 87: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/87.jpg)
87
Codegeneratedfrommylexer2.lex
classUseLexer{publicsta5cvoidmain(String[]args)throwsjava.io.IOExcep5on{Tokent;MyLexer2lexer=newMyLexer2(System.in);while((t=lexer.yylex())!=null)System.out.println(t.toString());}}classToken{Stringtype;Stringtext;intline;Token(Stringt,Stringtxt,intl){type=t;text=txt;line=l;}publicStringtoString(){returntext+""+type+""+line;}}ClassMyLexer2{publicTokenyylex(){......case5:{return(newToken("INT",yytext(),yyline));}case7:{return(newToken("ID",yytext(),yyline));}
......}}
JLex
![Page 88: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/88.jpg)
88
Runningtheextendedlexspecifica5onmylexer2.lex
D:\214>javaJLex.Mainmylexer2.lexProcessingfirstsec5on--usercode.Processingsecondsec5on--JLexdeclara5ons.Processingthirdsec5on--lexicalrules.Crea5ngNFAmachinerepresenta5on.NFAcomprisedof22states.Workingoncharacterclasses.::::::::.NFAhas10dis5nctcharacterclasses.Crea5ngDFAtransi5ontable.WorkingonDFAstates...........MinimizingDFAtransi5ontable.9statesaderremovalofredundantstates.Outpu�nglexicalanalyzercode.D:\214>movemylexer2.lex.javaMyLexer2.javaD:\214>javacMyLexer2.javaD:\214>javaUseLexerintintINT0x1x1ID1
JLex
![Page 89: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/89.jpg)
89
Anotherexample
1importjava.io.IOExcep5on;2%%3%public4%classNumbers_15%typevoid6%eofval{return;8%eofval}910%line11%{publicsta5cvoidmain(Stringargs[]){12Numbers_1num=newNumbers_1(System.in);13try{14num.yylex();15}catch(IOExcep5one){System.err.println(e);}16}17%}1819%%20\r\n{System.out.println("---"+(yyline+1));}22.*\r\n{System.out.print("+++"+(yyline+1)+"\t"+yytext());}
JLex
![Page 90: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/90.jpg)
90
Usercode(firstsec5onofJLex)
• Usercodeiscopiedverba5mintothelexicalanalyzersourcefilethatJLexoutputs,atthetopofthefile.– Packagedeclara5ons;– Importsofanexternalclass– Classdefini5ons
• Generatedcodepackage declarations; import packages; Class definitions; class Yylex { ... ... }
• Yylexclassisthedefaultlexerclassname.Itcanbechangedtootherclassnameusing%classdirec5ve.
JLex
![Page 91: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/91.jpg)
91
JLexdirec5ves(Secondsec5on)
• Internalcodetolexicalanalyzerclass• Marcodefini5on• Statedeclara5on• Character/linecoun5ng• Lexicalanalyzercomponent5tle• Specifyingthereturnvalueonend-of-file• Specifyinganinterfacetoimplement
JLex
![Page 92: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/92.jpg)
92
InternalCodetoLexicalAnalyzerClass
• %{ …. %} direc5vepermitsthedeclara5onofvariablesandfunc5onsinternaltothegeneratedlexicalanalyzer
• Generalform:%{<code>%}
• Effect:<code>willbecopiedintotheLexerclass,suchasMyLexer.classMyLexer{…..<code>……}
• Examplepublicsta5cvoidmain(Stringargv[])throwsjava.io.IOExcep5on{
MyLexeryy=newMyLexer(System.in);while(true){yy.yylex();}}
• Differencewiththeusercodesec5on– Itiscopiedinsidethelexerclass(e.g.,theMyLexerclass)
JLex
![Page 93: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/93.jpg)
93
MacroDefini5on• Purpose:defineonceandusedseveral5mes;
– Amustwhenwewritelargelexspecifica5on.• Generalformofmacrodefini5on:
– <name>=<defini5on>– shouldbecontainedonasingleline– Macronameshouldbevalididen5fiers– Macrodefini5onshouldbevalidregularexpressions– Macrodefini5oncancontainothermacroexpansions,inthestandard
{<name>}formatformacroswithinregularexpressions.• Example
– Defini5on(inthesecondpartofJLexspec):IDENTIFIER=[a-zA-Z_][a-zA-Z0-9_]*ALPHA=[A-Za-z_]DIGIT=[0-9]ALPHA_NUMERIC={ALPHA}|{DIGIT}
– Use(inthethirdpart):{IDENTIFIER}{returnnewToken(ID,yytext());}
JLex
![Page 94: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/94.jpg)
94
Statedirec5ve• Samestringcouldbematchedbydifferentregularexpressions,accordingtoitssurroundingenvironment.– String“int”insidecommentshouldnotberecognizedasareservedword,not
evenasaniden5fier.
• Par5cularlyusefulwhenyouneedtoanalyzemixedlanguages;• Forexample,inJSP,JavaprogramscanbeimbeddedinsideHTMLblocks.OnceyouareinsideJavablock,youfollowtheJavasyntax.ButwhenyouareoutoftheJavablock,youneedtofollowtheHTMLsyntax.– Injava“int”shouldberecognizedasareservedword;– InHTML“int”shouldberecognizedjustasausualstring.
• StatesinsideJLex<HTMLState>%{{yybegin(JavaState);}<HTMLState>“int”{returnstring;}<JavaState>%}{yybegin(HTMLState);}<JavaState>“int”{returnkeyword;}
JLex
![Page 95: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/95.jpg)
95
StateDirec5ve(cont.)• MechanismtomixFAstatesandREs• Declaringasetof“startstates”(inthesecondpartofJLexspec)
%statestate0[,state1,state2,….]• Howtousethestate(inthethirdpartofJLexspec):
– REcanbeprefixedbythesetofstartstatesinwhichitisvalid;• Wecanmakeatransi5onfromonestatetoanotherwithinputRE
– yybegin(STATE)isthecommandtomaketransi5ontoSTATE;• YYINITIAL:implicitstartstateofyylex();
– Butwecanchangethestartstate;• Example(fromthesampleinJLexspec):
%state COMMENT %% <YYINITIAL>if {return new tok(sym.IF,”IF”);} <YYINITIAL>[a-z]+ {return new tok(sym.ID, yytext());} <YYINITIAL>”/*” {yybegin(COMMENT);} <COMMENT>”*/” {yybegin(YYINITIAL);} <COMMENT>. {}
JLex
![Page 96: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/96.jpg)
96
Characterandlinecoun5ng
• Some5mesitisusefultoknowwhereexactlythetokenisinthetext.Tokenposi5onisimplementedusinglinecoun5ngandcharcoun5ng.
• Charactercoun5ngisturnedoffbydefault,ac5vatedwiththedirec5ve“%char”– Createaninstancevariableyycharinthescanner;– zero-basedcharacterindexofthefirstcharacteronthematchedregionof
text.
• Linecoun5ngisturnedoffbydefault,ac5vatedwiththedirec5ve“%line”– Createaninstancevariableyylineinthescanner;– zero-basedlineindexatthebeginningofthematchedregionoftext.
• Example“int”{return(newYytoken(4,yytext(),yyline,yychar,yychar+3));}
JLex
![Page 97: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/97.jpg)
97
Lexicalanalyzercomponent5tles
• Changethenameofgenerated– lexicalanalyzerclass %class<name>– thetokenizingfunc5on %func5on<name>– thetokenreturntype %type<name>
• DefaultnamesclassYylex{/*lexicalanalyzerclass*/publicYytoken/*thetokenreturntype*/yylex(){…}/*thetokenizingfunc5on*/==>Yylex.yylex() returns Yytoken type
JLex
![Page 98: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/98.jpg)
98
SpecifyinganInterfacetoimplement
• Form:%implements<InterfaceName>• AllowstheusertospecifyaninterfacewhichtheYylexoryourlexerclasswillimplement.
• Thegeneratedparserclassdeclara5onwilllooklike:class MyLexer implementsInterfaceName{…….}
JLex
![Page 99: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/99.jpg)
99
Regularexpressionrules
• Generalform:regularExpression{ac5on}
• Example:{IDENTIFIER}{System.out.println("IDis..."+yytext());}
• Interpreta5on:Pa[entobematchedcodetobeexecutedwhenthepa[ernismatched
• CodegeneratedinMyLexer:“case2: {System.out.println("IDis..."+yytext());}“
JLex
![Page 100: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/100.jpg)
100
RegularExpressionRules• Specifiesrulesforbreakingtheinputstreamintotokens• RegularExpression+Ac5ons(javacode)[<states>]<expression>{<ac5on>}• Whenmatchedwithmorethanonerule,
– choosetherulethatisgivenfirstintheJlexspec.• Referthe“int”andIDENTIFIERexample.
• TherulesgiveninaJLexspecifica5onshouldmatchallpossibleinput.• Anerrorwillberaisedifthegeneratedlexerreceivesinputthatdoesnotmatchanyofitsrules
– E.g.,therulesonlylistedthecaseforIden5fiers,andsaidnothingaboutnumbers,butyourinputhasnumbers.
– Thisisthemostcommonerror(morethan50%)– putthefollowingruleatthebo[omofREspec
.{java.lang.System.out.println(“Error:” + yytext());} dot(.)willmatchanyinputexceptforthenewline.
JLex
![Page 101: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/101.jpg)
101
Availablelexicalvalueswithinac5oncode
• java.lang.Stringyytext()– matchespor5onofthecharacterinputstream;– alwaysac5ve.
• Intyychar– Zero-basedcharacterindexofthefirstcharacterinthematched
por5onoftheinputstream;– ac5vatedby%chardirec5ve.
• Intyyline– Zero-basedlinenumberofthestartofthematchedpor5onofthe
inputstream;– ac5vatedby%linedirec5ve.
JLex
![Page 102: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/102.jpg)
102
RegularexpressioninJLex•Specialcharacters:?+|()ˆ$/;.=<>[]{}”\andblank
– Ader\thespecialcharacterslosetheirspecialmeaning.– Example:\+
• Betweendoublequotes”allspecialcharactersbut\and”losetheirspecialmeaning.– Example:”+”
• Thefollowingescapesequencesarerecognized:\b\n\t\f\r.• With[]wecandescribesetsofcharacters.
– [abc]isthesameas(a|b|c).Notethatitisnotequivalenttoabc– With[ˆ]wecandescribesetsofcharacters.– [ˆ\n\”]meansanythingbutanewlineorquotes– [ˆa–z]meansanythingbutONElower-casele[er
• Wecanuse.asashortcutfor[ˆ\n]• $: denotestheendofaline.If$endsaregularexpression,theexpressionmatchedonlyattheendofaline.
JLex
![Page 103: 03-60-214 Lexical analysis S a, b B ajlu.myweb.cs.uwindsor.ca/214/214Lexer2019.pdf · 7 Regular expression • Regular expression: construc5ng sequences of symbols (strings) from](https://reader033.vdocuments.mx/reader033/viewer/2022042019/5e76f23dbbb90615203bb3e8/html5/thumbnails/103.jpg)
103
Concludingremarks• FocusedonLexicalAnalysisProcess,Including
– RegularExpressions– FiniteAutomaton– Conversion– Lex
• Regulargrammar=regularexpression• RegularexpressionàNFAàDFAàlexer• Thenextstepinthecompila5onprocessisParsing:
– Contextfreegrammar;– Top-downparsingandbo[omupparsing.
JLex