0368-3133 lecture 2: lexical analysis noam...

112
Compilation 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzky 1

Upload: haxuyen

Post on 15-Feb-2019

222 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Compilation0368-3133

Lecture2:

LexicalAnalysis

NoamRinetzky1

Page 2: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

2

Page 3: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

LexicalAnalysis

ModernCompilerDesign:Chapter2.1

3

Page 4: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

ConceptualStructureofaCompiler

Executable code

exe

Sourcetext

txt

SemanticRepresentation

Backend

Compiler

Frontend

LexicalAnalysis

Syntax AnalysisParsing

Semantic Analysis

IntermediateRepresentation

(IR)

CodeGeneration

4

Page 5: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

ConceptualStructureofaCompiler

Executable code

exe

Sourcetext

txt

SemanticRepresentation

Backend

Compiler

Frontend

LexicalAnalysis

Syntax AnalysisParsing

Semantic Analysis

IntermediateRepresentation

(IR)

CodeGeneration

words sentences 5

Page 6: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

WhatdoesLexicalAnalysisdo?

• Language:fullyparenthesizedexpressionsExpr® Num |LPExpr OpExpr RPNum® Dig|DigNumDig® ‘0’|‘1’|‘2’|‘3’|‘4’|‘5’|‘6’|‘7’|‘8’|‘9’LP® ‘(’RP® ‘)’Op® ‘+’ |‘*’

( ( 23 + 7 ) * 19 )

6

Page 7: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

WhatdoesLexicalAnalysisdo?

• Language:fullyparenthesizedexpressionsContextfreelanguage

Regularlanguages

( ( 23 + 7 ) * 19 )

Expr® Num |LPExpr OpExpr RPNum® Dig|DigNumDig® ‘0’|‘1’|‘2’|‘3’|‘4’|‘5’|‘6’|‘7’|‘8’|‘9’LP® ‘(’RP® ‘)’Op® ‘+’ |‘*’

7

Page 8: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

WhatdoesLexicalAnalysisdo?

• Language:fullyparenthesizedexpressionsContextfreelanguage

Regularlanguages

( ( 23 + 7 ) * 19 )

Expr® Num |LPExpr OpExpr RPNum® Dig|DigNumDig® ‘0’|‘1’|‘2’|‘3’|‘4’|‘5’|‘6’|‘7’|‘8’|‘9’LP® ‘(’RP® ‘)’Op® ‘+’ |‘*’

8

Page 9: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

WhatdoesLexicalAnalysisdo?

• Language:fullyparenthesizedexpressionsContextfreelanguage

Regularlanguages

( ( 23 + 7 ) * 19 )

Expr® Num |LPExpr OpExpr RPNum® Dig|DigNumDig® ‘0’|‘1’|‘2’|‘3’|‘4’|‘5’|‘6’|‘7’|‘8’|‘9’LP® ‘(’RP® ‘)’Op® ‘+’ |‘*’

9

Page 10: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

WhatdoesLexicalAnalysisdo?

• Language:fullyparenthesizedexpressionsContextfreelanguage

Regularlanguages

( ( 23 + 7 ) * 19 )

LP LP Num Op Num RP Op Num RP

Expr® Num |LPExpr OpExpr RPNum® Dig|DigNumDig® ‘0’|‘1’|‘2’|‘3’|‘4’|‘5’|‘6’|‘7’|‘8’|‘9’LP® ‘(’RP® ‘)’Op® ‘+’ |‘*’

10

Page 11: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

WhatdoesLexicalAnalysisdo?

• Language:fullyparenthesizedexpressionsContextfreelanguage

Regularlanguages

( ( 23 + 7 ) * 19 )

LP LP Num Op Num RP Op Num RPKind

Value

Expr® Num |LPExpr OpExpr RPNum® Dig|DigNumDig® ‘0’|‘1’|‘2’|‘3’|‘4’|‘5’|‘6’|‘7’|‘8’|‘9’LP® ‘(’RP® ‘)’Op® ‘+’ |‘*’

11

Page 12: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

WhatdoesLexicalAnalysisdo?

• Language:fullyparenthesizedexpressionsContextfreelanguage

Regularlanguages

( ( 23 + 7 ) * 19 )

LP LP Num Op Num RP Op Num RPKind

Value

Expr® Num |LPExpr OpExpr RPNum® Dig|DigNumDig® ‘0’|‘1’|‘2’|‘3’|‘4’|‘5’|‘6’|‘7’|‘8’|‘9’LP® ‘(’RP® ‘)’Op® ‘+’ |‘*’Token Token …

12

Page 13: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

• Partitionstheinputintostreamoftokens– Numbers– Identifiers– Keywords– Punctuation

• Usuallyrepresentedas(kind,value)pairs– (Num,23)– (Op,‘*’)

• “word”inthesourcelanguage• “meaningful”tothesyntacticalanalysis

WhatdoesLexicalAnalysisdo?

13

Page 14: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Fromscanningtoparsing((23 + 7) * x)

) ?*)7+23((RPIdOPRPNumOPNumLPLP

LexicalAnalyzer

programtext

tokenstream

ParserGrammar:Expr® ...|IdId® ‘a’|...|‘z’

Op(*)

Id(?)

Num(23) Num(7)

Op(+)

AbstractSyntaxTree

validsyntaxerror

14

Page 15: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

WhyLexicalAnalysis?

• Well,notstrictlynecessary,but …– RegularlanguagesÍ Context-Freelanguages

• Simplifiesthesyntaxanalysis(parsing)– Andlanguagedefinition

• Modularity• Reusability• Efficiency

15

Page 16: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Lecturegoals

• Understandrole&placeoflexicalanalysis

• Lexicalanalysistheory• Usingprogramgeneratingtools

16

Page 17: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

LectureOutline

üRole&placeoflexicalanalysis• Whatisatoken?• Regularlanguages• Lexicalanalysis• Errorhandling• Automaticcreationoflexicalanalyzers

17

Page 18: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Whatisatoken?(Intuitively)

• A“word”inthesourcelanguage– Anythingthatshouldappearintheinputtosyntaxanalysis• Identifiers• Values• Languagekeywords

• Usually,representedasapairof(kind,value)

18

Page 19: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

ExampleTokens

Type Examples

ID foo, n_14, lastNUM 73, 00, 517, 082 REAL 66.1, .5, 5.5e-10IF ifCOMMA ,NOTEQ !=LPAREN (RPAREN )

19

Page 20: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

ExampleNonTokens

Type Examplescomment /* ignored */preprocessordirective #include <foo.h>

#define NUMS 5.6macro NUMSwhitespace \t, \n, \b, ‘ ‘

20

Page 21: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Somebasicterminology

• Lexeme(akasymbol)- aseriesoflettersseparatedfromtherestoftheprogramaccordingtoaconvention(space,semi-column,comma,etc.)

• Pattern - arulespecifyingasetofstrings.Example:“anidentifierisastringthatstartswithaletterandcontinueswithlettersanddigits”– (Usually)aregularexpression

• Token - apairof(pattern,attributes)

21

Page 22: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Examplevoid match0(char *s) /* find a zero */

{

if (!strncmp(s, “0.0”, 3))

return 0.0 ;

}

VOID ID(match0) LPAREN CHAR DEREF ID(s) RPAREN

LBRACE

IF LPAREN NOT ID(strncmp) LPAREN ID(s) COMMA STRING(0.0) COMMA NUM(3) RPAREN RPAREN

RETURN REAL(0.0) SEMI

RBRACE

EOF 22

Page 23: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

ExampleNonTokens

Type Examplescomment /* ignored */preprocessordirective #include <foo.h>

#define NUMS 5.6macro NUMSwhitespace \t, \n, \b, ‘ ‘

• Lexemesthatarerecognizedbutgetconsumedratherthantransmittedtoparser– if– i/*comment*/f

23

Page 24: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

LectureOutline

üRole&placeoflexicalanalysisüWhatisatoken?• Regularlanguages• Lexicalanalysis• Errorhandling• Automaticcreationoflexicalanalyzers

24

Page 25: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Howcanwedefinetokens?

• Keywords– easy!– if,then,else,for,while,…

• Identifiers?• NumericalValues?• Strings?

• Characterizeunboundedsetsofvaluesusingaboundeddescription?

25

Page 26: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Regularlanguages

• Formallanguages– Σ =finitesetofletters– Word=sequenceofletter– Language=setofwords

• Regularlanguagesdefinedequivalentlyby– Regularexpressions– Finite-stateautomata

26

Page 27: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Commonformatforreg-expsBasic Patterns Matching

x Thecharacterx

. Anycharacter,usuallyexceptanewline

[xyz] Anyofthecharactersx,y,z

^x Anycharacterexceptx

RepetitionOperators

R? AnRornothing(=optionallyanR)

R* Zero ormoreoccurrencesofR

R+ OneormoreoccurrencesofR

CompositionOperators

R1R2 AnR1 followedbyR2

R1|R2 Either anR1orR2

Grouping

(R) Ritself 27

Page 28: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Examples

• ab*|cd?=• (a|b)*=• (0|1|2|3|4|5|6|7|8|9)*=

28

Page 29: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Escapecharacters

• Whatistheexpressionforoneormore+symbols?– (+)+ won’twork– (\+)+ will

• backslash\ beforeanoperatorturnsittostandard character– \*, \?, \+, a\(b\+\*, (a\(b\+\*)+, …

• backslashdoublequotessurroundstext– “a(b+*”, “a(b+*”+ 29

Page 30: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Shorthands

• Usenamesforexpressions– letter=a|b|…|z|A|B|…|Z– letter_=letter|_– digit=0|1|2|…|9– id=letter_(letter_|digit)*

• Usehyphentodenotearange– letter=a-z|A-Z– digit=0-9

30

Page 31: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Examples

• if=if• then=then• relop =<|>|<=|>=|=|<>

• digit=0-9• digits=digit+

31

Page 32: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Example

• A number is number = ( 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 )+

( e | \. ( 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 )+

( e | E ( 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 )+)

)

• Using shorthands it can be written as

number = digits (e | \.digits (e | E (e|+|-) digits ) )

32

Page 33: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Exercise1- Question

• Languageofrationalnumbersindecimalrepresentation(noleading,endingzeros)– 0– 123.757– .933333– Not007– Not0.30

33

Page 34: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Exercise1- Answer

• Languageofrationalnumbersindecimalrepresentation(noleading,endingzeros)

– Digit =1|2|…|9Digit0=0|DigitNum =DigitDigit0*Frac =Digit0*DigitPos =Num|\.Frac |0\.Frac|Num\.FracPosOrNeg =(Є|-)PosR =0|PosOrNeg

34

Page 35: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Exercise2- Question

• Equalnumberofopeningandclosingparenthesis:[n]n =[],[[]],[[[]]],…

35

Page 36: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Exercise2- Answer

• Equalnumberofopeningandclosingparenthesis:[n]n =[],[[]],[[[]]],…

• Notregular• Context-free• Grammar: S::=[] |[S]

36

Page 37: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Challenge:Ambiguity

• If=if• Id=Letter(Letter|Digit)*

• “if”isavalididentifiers…whatshoulditbe?• ‘’iffy”isalsoavalididentifier

• Solution– Longestmatchingtoken– Breaktiesusingorderofdefinitions…

• Keywordsshouldappearbeforeidentifiers37

Page 38: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Creatingalexicalanalyzer

• Givenalistoftokendefinitions(patternname,regex),writeaprogramsuchthat– Input:Stringtobeanalyzed– Output:Listoftokens

• Howdowebuildananalyzer?

38

Page 39: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

BuildingaScanner– TakeI

• Input:String

• Output:Sequenceoftokens

39

Page 40: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

BuildingaScanner– TakeIToken nextToken(){char c ;loop: c = getchar();switch (c){case ` `: goto loop ;case `;`: return SemiColumn;case `+`: c = getchar() ;switch (c) {case `+': return PlusPlus ;case '=’ return PlusEqual;default: ungetc(c); return Plus;

};case `<`: …case `w`: …

} 40

Page 41: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Theremustbeabetterway!

41

Page 42: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Abetterway

• Automatically generate ascanner

• Definetokensusingregularexpressions

• Usefinite-stateautomatafordetection

42

Page 43: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Reg-expvs.automata

• Regularexpressionsaredeclarative– Goodfor humans– Not“executable”

• Automata areoperative– Defineanalgorithm fordecidingwhetheragivenwordisinaregularlanguage

– Notanaturalnotationforhumans43

Page 44: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Overview

• Definetokensusingregularexpressions

• Constructanondeterministicfinite-stateautomaton(NFA)fromregularexpression

• Determinize theNFAintoadeterministicfinite-stateautomaton(DFA)

• DFAcanbedirectlyusedtoidentifytokens44

Page 45: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Automatatheory:abird’s-eyeview

45

Page 46: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

DeterministicAutomata(DFA)

• M=(S,Q,d,q0,F)– S - alphabet– Q– finitesetofstate– q0Î Q– initialstate– FÍ Q– finalstates– δ:Q´ Sà Q - transitionfunction

• Forawordw,Mreachsomestatex– MacceptswifxÎ F

46

Page 47: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

DFA inpictures

start

a

b,c

a,b

c

acceptingstate

startstate

transition

• Anautomatonisdefinedbystatesandtransitions

47

a,b,c a,b,c

Page 48: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

AcceptingWords

• Wordsarereadleft-to-rightcba

start

a

b

c

48

• Missingtransition=non-acceptance– “Stuckstate”

Page 49: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

• Wordsarereadleft-to-right

AcceptingWords

cba

start

a

b

c

49

Page 50: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

• Wordsarereadleft-to-right

AcceptingWords

cba

start

a

b

c

50

Page 51: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

• Wordsarereadleft-to-right

AcceptingWords

cba

start

a

b

c

51

Page 52: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

RejectingWords

cbb

start

a

b

c

52

• Wordsarereadleft-to-right

Page 53: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

start

RejectingWords

• Missingtransitionmeansnon-acceptancecbb

a

b

c

53

Page 54: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Non-deterministicAutomata(NFA)

• M=(S,Q,d,q0,F)– S - alphabet– Q– finitesetofstate– q0 ÎQ – initialstate

– FÍ Q– finalstates– δ:Q´ (S È {e})→2Q - transitionfunction

• DFA:δ:Q´ Sà Q

• Forawordw,McanreachanumberofstatesX– MacceptswifX∩M≠{}

• Possible:X={}

• Possiblee-transitions 54

Page 55: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

NFA

• Allowmultipletransitionsfromgivenstatelabeledbysameletter

start

a

a

b

c

c

b

55

Page 56: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Acceptingwords

cba

start

a

a

b

c

c

b

56

Page 57: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Acceptingwords

• Maintainsetofstates

cba

start

a

a

b

c

c

b

57

Page 58: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Acceptingwords

cba

start

a

a

b

c

c

b

58

Page 59: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Acceptingwords• Acceptwordifreachedanacceptingstate

cba

start

a

a

b

c

c

b

59

Page 60: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

NFA+Є automata

• Є transitionscan“fire”withoutreadingtheinput

Є

start a

b

c

60

Page 61: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

NFA+Є runexample

cba

Є

start a

b

c

61

Page 62: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

NFA+Є runexample• NowЄ transitioncannon-deterministicallytakeplace

cba

Є

start a

b

c

62

Page 63: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

NFA+Є runexample

cba

Є

start a

b

c

63

Page 64: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

NFA+Є runexample

cba

Є

start a

b

c

64

Page 65: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

NFA+Є runexample

cba

Є

start a

b

c

65

• Є transitionscan“fire”withoutreadingtheinput

Page 66: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

NFA+Є runexample

cba

• Wordaccepted

Є

start a

b

c

66

Page 67: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

FromregularexpressionstoNFA

• Step1:assignexpressionnamesandobtainpureregularexpressionsR1…Rm

• Step2:constructanNFAMi foreachregularexpressionRi

• Step3:combineallMi intoasingleNFA

• Ambiguityresolution:preferlongestacceptingword 67

Page 68: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Fromreg.exp.toautomata• Theorem:thereisanalgorithmtobuildanNFA+Є automatonforanyregularexpression

• Proof:byinductiononthestructureoftheregularexpression

start

68

Page 69: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

R = e

R = f

R = aa

Basicconstructs

69

Page 70: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

CompositionR = R1 | R2 e M1

M2e

e

e

R = R1R2

eM1 M2

e e

70

Page 71: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Repetition

R = R1*

eM1

e

e

e

71

Page 72: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

72

Page 73: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Naïveapproach

• Tryeachautomatonseparately

• Givenawordw:– TryM1(w)– TryM2(w)– …– TryMn(w)

• Requiresresettingaftereveryattempt73

Page 74: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Actually,wecombineautomata

1 2aa

3a

4 b 5 b 6

abb

7 8b a*b+ba

9a

10 b 11 a 12 b 13

abab

0

e

e

e

e

aabba*b+abab

combines

74

Page 75: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

CorrespondingDFA

01379

8

7

b

a

a24710

a

bb

68

5811b

12 13a b

b

abba*b+a*b+

a*b+

abab

a

Combine automata: an example.

Combine a, abb, a*b+, abab.

75#

1# 2#a#

a#

3#a#

4#b#

5#b#

6#

abb#

7# 8#b#

a*b+#b#a#

9#a#

10#b#

11#a#

12#b#

13#

abab#

0#

ε#

ε#

ε#

ε#

b

75

Page 76: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

ScanningwithDFA

• Rununtilstuck– Rememberlastacceptingstate

• Gobacktoacceptingstate• Returntoken

76

Page 77: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Ambiguityresolution

• Longestword• Tie-breakerbasedonorderofrules whenwordshavesamelength

77

Combine automata: an example.

Combine a, abb, a*b+, abab.

75#

1# 2#a#

a#

3#a#

4#b#

5#b#

6#

abb#

7# 8#b#

a*b+#b#a#

9#a#

10#b#

11#a#

12#b#

13#

abab#

0#

ε#

ε#

ε#

ε#

Page 78: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Examples

01379

8

7

b

a

a24710

a

bb

68

5811b

12 13a b

b

abba*b+a*b+

a*b+

abab

a

Combine automata: an example.

Combine a, abb, a*b+, abab.

75#

1# 2#a#

a#

3#a#

4#b#

5#b#

6#

abb#

7# 8#b#

a*b+#b#a#

9#a#

10#b#

11#a#

12#b#

13#

abab#

0#

ε#

ε#

ε#

ε#b

abaa:getsstuckafterabainstate12,backsuptostate(5811)patternisa*b+,tokenisabTokens:<a*b+,ab><a,a><a,a> 78

Page 79: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Examples

01379

8

7

b

a

a24710

a

bb

68

5811b

12 13a b

b

abba*b+a*b+

a*b+

abab

a

b

abba:stopsaftersecondbin(68),tokenisabb becauseitcomesfirstinspec79Tokens:<abb,abb><a,a>

Combine automata: an example.

Combine a, abb, a*b+, abab.

75#

1# 2#a#

a#

3#a#

4#b#

5#b#

6#

abb#

7# 8#b#

a*b+#b#a#

9#a#

10#b#

11#a#

12#b#

13#

abab#

0#

ε#

ε#

ε#

ε#

Page 80: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

SummaryofConstruction

• Describetokensasregularexpressions– Decideattributes(values)tosaveforeachtoken

• RegularexpressionsturnedintoaDFA– Also,recordswhichattributes(values)tokeep

• Lexicalanalyzersimulatestherunofanautomatawiththegiventransitiontableonanyinputstring

80

Page 81: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

AFewRemarks

• TurninganNFAtoaDFAisexpensive,but– Exponentialintheworstcase– Inpractice,worksfine

• Theconstructionisdoneonceper-language– AtCompilerconstructiontime– Not atcompilationtime

81

Page 82: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Implementation

82

Page 83: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

ImplementationbyExampleif { return IF; }[a-z][a-z0-9]* { return ID; }[0-9]+ { return NUM; }[0-9]”.”[0-9]+|[0-9]*”.”[0-9]+ { return REAL; }(\-\-[a-z]*\n)|(“ “|\n|\t) { ; }. { error(); }

83

if

xy,i,zs98

3,32,032

0.55,33.1

--comm\n\n, \t,““ ID

IF

ID error REAL

NUM REAL

error w.s.errorw.s.

01

2 3

9 10 1112

Page 84: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

int edges[][256]= { /* …, 0, 1, 2, 3, ..., -, e, f, g, h, i, j, ... */

/* state 0 */ {0, …, 0, 0, …, 0, 0, 0, 0, 0, ..., 0, 0, 0, 0, 0, 0},/* state 1 */ {13, … , 7, 7, 7, 7, …, 9, 4, 4, 4, 4, 2, 4, …, 13, 13},/* state 2 */ {0, …, 4, 4, 4, 4, ..., 0, 4, 3, 4, 4, 4, 4, …, 0, 0},/* state 3 */ {0, …, 4, 4, 4, 4, …, 0, 4, 4, 4, 4, 4, 4, , 0, 0},/* state 4 */ {0, …, 4, 4, 4, 4, …, 0, 4, 4, 4, 4, 4, 4, …, 0, 0}, /* state 5 */ {0, …, 6, 6, 6, 6, …, 0, 0, 0, 0, 0, 0, 0, …, 0, 0},/* state 6 */ {0, …, 6, 6, 6, 6, …, 0, 0, 0, 0, 0, 0, 0, ..., 0, 0},/* state 7 *//* state … */ .../* state 13 */ {0, …, 0, 0, 0, 0, …, 0, 0, 0, 0, 0, 0, 0, …, 0, 0}}; 84

ID

IF

ID error REAL

NUM REAL

error w.s.errorw.s.

01

2 3

9 10 1112

Page 85: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

PseudoCodeforScannerchar* input = … ;

Token nextToken() {lastFinal = 0; currentState = 1 ;inputPositionAtLastFinal = input; currentPosition = input; while (not(isDead(currentState))) {

nextState = edges[currentState][*currentPosition];if (isFinal(nextState)) {

lastFinal = nextState ; inputPositionAtLastFinal = currentPosition;

}currentState = nextState; advance currentPosition;

}input = inputPositionAtLastFinal + 1;return action[lastFinal];

}85

Page 86: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Example

Input:“if--not-a-com”

86

2blanks

ID

IF

ID error REAL

NUM REAL

error w.s.errorw.s.

01

2 3

9 10 1112

Page 87: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

final state input

0 1 if--not-a-com

2 2 if--not-a-com

3 3 if--not-a-com

3 0 if--not-a-comreturnIF

87

ID

IF

ID error REAL

NUM REAL

error w.s.errorw.s.

01

2 3

9 10 1112

Page 88: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

foundwhitespace

final state input

0 1 --not-a-com

12 12 --not-a-com

12 12 --not-a-com

12 0 --not-a-com

88

Page 89: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

final state input

0 1 --not-a-com

9 9 --not-a-com

9 10 --not-a-com

10 10 --not-a-com

10 10 --not-a-com

10 0 --not-a-comerror

89

ID

IF

ID error REAL

NUM REAL

error w.s.errorw.s.

01

2 3

9 10 1112

Page 90: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

final state input

0 1 -not-a-com

9 9 -not-a-com

9 0 -not-a-com

9 0 -not-a-com

9 0 -not-a-com

error

90

ID

IF

ID error REAL

NUM REAL

error w.s.errorw.s.

01

2 3

9 10 1112

Page 91: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Concludingremarks

• Efficientscanner• Minimization• Errorhandling• Automaticcreationoflexicalanalyzers

91

Page 92: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

EfficientScanners

• Efficientstaterepresentation• Inputbuffering• Usingswitchandgotosinsteadoftables

92

Page 93: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Minimization

• Createanon-deterministicautomaton(NDFA)fromeveryregularexpression

• Mergealltheautomatausingepsilonmoves(likethe|construction)

• Constructadeterministicfiniteautomaton(DFA)– Statepriority

• Minimizetheautomaton– separateacceptingstatesbytokenkinds

93

Page 94: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Exampleif { return IF; }[a-z][a-z0-9]* { return ID; }[0-9]+ { return NUM; }

94ModerncompilerimplementationinML,AndrewAppel,(c)1998,Figures2.7,2.8

IDIF

errorNUM

Page 95: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Exampleif { return IF; }[a-z][a-z0-9]* { return ID; }[0-9]+ { return NUM; }

95ModerncompilerimplementationinML,AndrewAppel,(c)1998,Figures2.7,2.8

IDIF

error

NUM

ID

NUM

ID

IDIF

errorNUM

Page 96: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Example

96

IDIF

errorNUM

IDIF

error

NUM

ID

NUM

ID

if { return IF; }[a-z][a-z0-9]* { return ID; }[0-9]+ { return NUM; }

IDIF

errorNUM

ID

ID

IF

NUM NUM

error

ModerncompilerimplementationinML,AndrewAppel,(c)1998,Figures2.7,2.8

Page 97: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Example

97

if { return IF; }[a-z][a-z0-9]* { return ID; }[0-9]+ { return NUM; }

IDIF

errorNUM

ID

ID

ID

IF

NUM NUM

error

ModerncompilerimplementationinML,AndrewAppel,(c)1998,Figures2.7,2.8

Page 98: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

ErrorHandling• Manyerrorscannotbeidentifiedatthisstage• Example:“fi(a==f(x))”.Should“fi”be“if”?Orisitaroutinename?

– Wewilldiscoverthislaterintheanalysis– Atthispoint,wejustcreateanidentifiertoken

• Sometimesthelexemedoesnotmatchanypattern– Easiest:eliminatelettersuntilthebeginningofalegitimatelexeme– Alternatives:eliminate/add/replaceoneletter,replaceorderoftwoadjacent

letters,etc.

• Goal:allowthecompilationtocontinue• Problem:errorsthatspreadallover

98

Page 99: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Automaticallygeneratedscanners

• UseofProgram-GeneratingTools– Specificationè Partofcompiler– Compiler-Compiler

Streamoftokens

JFlexregularexpressions

inputprogram scanner99

Page 100: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

UseofProgram-GeneratingTools

• Input:regularexpressionsandactions• Action=Javacode

• Output:ascannerprogramthat• Producesastreamoftokens• Invokeactionswhenpatternismatched

Streamoftokens

JFlexregularexpressions

inputprogram scanner100

Page 101: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

LineCountingExample

• Createaprogramthatcountsthenumberoflinesinagiveninputtextfile

101

Page 102: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

CreatingaScannerusingFlex

int num_lines = 0;%%\n ++num_lines;. ;%%main() {yylex();printf( "# of lines = %d\n", num_lines);

}

102

Page 103: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

CreatingaScannerusingFlex

initial

other

newline\n

^\n

int num_lines = 0;%%\n ++num_lines;. ;%%main() {yylex();printf( "# of lines = %d\n", num_lines);

}

103

Page 104: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

JFLex SpecFileUsercode:CopieddirectlytoJavafile

%%JFlex directives:macros,statenames

%%Lexicalanalysisrules:– Optionalstate,regularexpression,action– Howtobreakinputtotokens– Actionwhentokenmatched

Possiblesourceofjavac errorsdown

theroad

DIGIT=[0-9]LETTER=[a-zA-Z]

YYINITIAL

{LETTER}({LETTER}|{DIGIT})*

104

Page 105: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

CreatingaScannerusingJFlex

import java_cup.runtime.*;%%%cup%{private int lineCounter = 0;

%}

%eofval{System.out.println("line number=" + lineCounter);return new Symbol(sym.EOF);

%eofval}

NEWLINE=\n%%{NEWLINE} { lineCounter++; } [^{NEWLINE}] { }

105

Page 106: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Catchingerrors

• Whatifinputdoesn’tmatchanytokendefinition?

• Trick:Adda“catch-all”rulethatmatchesanycharacterandreportsanerror– Addafterallotherrules

106

Page 107: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

AJFlex specificationofCScannerimport java_cup.runtime.*;%%%cup%{private int lineCounter = 0;

%}Letter= [a-zA-Z_]Digit= [0-9]%%”\t” { }”\n” { lineCounter++; }“;” { return new Symbol(sym.SemiColumn);}“++” { return new Symbol(sym.PlusPlus); }“+=” { return new Symbol(sym.PlusEq); }“+” { return new Symbol(sym.Plus); }“while” { return new Symbol(sym.While); }{Letter}({Letter}|{Digit})*

{ return new Symbol(sym.Id, yytext() ); }“<=” { return new Symbol(sym.LessOrEqual); }“<” { return new Symbol(sym.LessThan); }

107

Page 108: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

Missing

• Creatingalexicalanalysisbyhand• Tablecompression• SymbolTables• NestedComments• HandlingMacros

108

Page 109: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

LexicalAnalysis:What

• Input:programtext(file)• Output:sequenceoftokens

109

Page 110: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

LexicalAnalysis:How

• Definetokensusingregularexpressions

• Constructanondeterministicfinite-stateautomaton(NFA)fromregularexpression

• Determinize theNFAintoadeterministicfinite-stateautomaton(DFA)

• DFAcanbedirectlyusedtoidentifytokens110

Page 111: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

LexicalAnalysis:Why

• Readinputfile• Identifylanguagekeywordsandstandardidentifiers• Handleincludefilesandmacros• Countlinenumbers• Removewhitespaces• Reportillegalsymbols

• [Producesymboltable]

111

Page 112: 0368-3133 Lecture 2: Lexical Analysis Noam Rinetzkymaon/teaching/2017-2018/compilation/compilation...Conceptual Structure of a Compiler Executable code exe Source text txt Semantic

TheRealAnatomyofaCompiler

Executable code

exe

Sourcetext

txtLexicalAnalysis

Sem.Analysis

Process text input

characters SyntaxAnalysistokens AST

Intermediate code

generation

Annotated AST

Intermediate code

optimizationIR Code

generationIR

Target code optimization

Symbolic Instructions

SI Machine code generation

Write executable

output

MI

112

LexicalAnalysis

SyntaxAnalysis