Compiler Construction, Dr. Naveed Ejaz, Lecture 5: Lexical Analysis


Upload: leslie-evans

Post on 05-Jan-2016


TRANSCRIPT

Page 1: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis

Compiler Construction

Dr. Naveed Ejaz

Lecture 5

Page 2: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis

Lexical Analysis

Page 3: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis

Recall: Front-End

Output of lexical analysis is a stream of tokens.

[Diagram: source code → scanner → tokens → parser → IR, with errors as a separate output]

Page 4: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis

Tokens

Example:

if( i == j )
    z = 0;
else
    z = 1;

Page 5: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis

Tokens

Input is just a sequence of characters:

if ( \b i \b = = \b j \n \t ....

Here \b stands for a blank, \n for a newline, and \t for a tab.

Page 6: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis

Tokens

Goal: partition the input string into substrings and classify them according to their role.

Page 7: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis

Tokens

A token is a syntactic category.

Natural language: “He wrote the program”

Words: “He”, “wrote”, “the”, “program”

Page 8: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis

Tokens

Programming language: “if(b == 0) a = b”

Words: “if”, “(”, “b”, “==”, “0”, “)”, “a”, “=”, “b”

Page 9: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis

Tokens

Identifiers: x y11 maxsize
Keywords: if else while for
Integers: 2 1000 -44 5L
Floats: 2.0 0.0034 1e5
Symbols: ( ) + * / { } < > ==
Strings: “enter x” “error”
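The token classes listed above can be sketched as a small classifier. This is only an illustration: the class name TokenClasses, the Kind enum, and the exact patterns are assumptions chosen to fit the slide's examples, not definitions from the lecture.

```java
import java.util.Set;

// Hypothetical names; the patterns are simplified to cover the slide's examples.
class TokenClasses {
    enum Kind { KEYWORD, IDENTIFIER, INTEGER, FLOAT, SYMBOL, STRING, UNKNOWN }

    static final Set<String> KEYWORDS = Set.of("if", "else", "while", "for");
    static final Set<String> SYMBOLS =
        Set.of("(", ")", "+", "*", "/", "{", "}", "<", ">", "==");

    // Classifies one already-separated lexeme.
    static Kind classify(String lexeme) {
        if (KEYWORDS.contains(lexeme)) return Kind.KEYWORD; // checked before identifiers
        if (SYMBOLS.contains(lexeme)) return Kind.SYMBOL;
        if (lexeme.matches("[A-Za-z_][A-Za-z0-9_]*")) return Kind.IDENTIFIER;
        if (lexeme.matches("-?[0-9]+L?")) return Kind.INTEGER;
        if (lexeme.matches("-?[0-9]+\\.[0-9]+([eE][-+]?[0-9]+)?|-?[0-9]+[eE][-+]?[0-9]+"))
            return Kind.FLOAT;
        if (lexeme.matches("\"[^\"]*\"")) return Kind.STRING;
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"maxsize", "while", "5L", "1e5", "==", "\"error\""}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

Keywords are tested before identifiers because every keyword here also fits the identifier pattern.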

Page 10: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis

Ad-hoc Lexer

Hand-write code to generate tokens.

Partition the input string by reading left-to-right, recognizing one token at a time.

Page 11: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis

Ad-hoc Lexer

Look-ahead is required to decide where one token ends and the next token begins.
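One way to picture look-ahead is with Java's PushbackReader: the scanner reads one character past the end of a token, notices it does not belong, and puts it back for the next token. The sketch below is an assumption-laden illustration (the class LookAheadDemo and its methods are hypothetical); it only separates digit runs from single other characters.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: splits text into digit runs and single other characters.
class LookAheadDemo {
    // Reads a run of digits; the first non-digit is pushed back, not consumed.
    static String readDigits(PushbackReader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && Character.isDigit(c)) {
            sb.append((char) c);
        }
        if (c != -1) in.unread(c); // this character starts the next token
        return sb.toString();
    }

    static List<String> split(String text) {
        List<String> out = new ArrayList<>();
        try (PushbackReader in = new PushbackReader(new StringReader(text))) {
            int c;
            while ((c = in.read()) != -1) {
                if (Character.isDigit(c)) {
                    in.unread(c);            // let readDigits see the first digit too
                    out.add(readDigits(in));
                } else {
                    out.add(String.valueOf((char) c));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(split("42+7")); // prints [42, +, 7]
    }
}
```

Without the unread call, the "+" that ends "42" would be lost.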

Page 12: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis

Ad-hoc Lexer

class Lexer {
    Inputstream s;
    char next; // look-ahead character

    Lexer(Inputstream _s) {
        s = _s;
        next = s.read(); // prime the look-ahead with the first character
    }


Page 17: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis

Ad-hoc Lexer

Token nextToken() {
    // idChar() also accepts digits, so test for a number first
    if( number(next) ) return readNumber();
    if( idChar(next) ) return readId();
    if( next == '"' ) return readString();
    ...
    ...


Page 21: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis

Ad-hoc Lexer

Token readId() {
    string id = "";
    while (true) {
        // next already holds the first character of the identifier
        if (idChar(next) == false)
            return new Token(TID, id);
        id = id + string(next);
        next = s.read(); // advance the look-ahead
    }
}


Page 28: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis

Ad-hoc Lexer

boolean idChar(char c) {
    if( isAlpha(c) ) return true;
    if( isDigit(c) ) return true;
    if( c == '_' ) return true;
    return false;
}

Page 29: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis

Ad-hoc Lexer

Token readNumber() {
    string num = "";
    while (true) {
        // next already holds the first digit of the number
        if( !isNumber(next) )
            return new Token(TNUM, num);
        num = num + string(next);
        next = s.read(); // advance the look-ahead
    }
}
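The slide fragments can be assembled into one runnable Java sketch. Several things here are assumptions rather than the lecture's own code: tokens are plain strings such as "TID:x1", the TSYM kind and the tokenize helper are made up, whitespace is skipped, and the look-ahead is an int so that end of input (-1) can be represented.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// A sketch only: token kinds are encoded as string prefixes; TSYM is hypothetical.
class AdHocLexer {
    private static final int EOF = -1;
    private final Reader s;
    private int next; // look-ahead; an int so that end of input (-1) fits

    AdHocLexer(Reader s) {
        this.s = s;
        advance(); // prime the look-ahead
    }

    private void advance() {
        try {
            next = s.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static boolean idChar(int c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    /** Returns the next token, or null at end of input. */
    String nextToken() {
        while (next != EOF && Character.isWhitespace(next)) advance();
        if (next == EOF) return null;
        if (Character.isDigit(next)) return readNumber(); // digits also pass idChar
        if (idChar(next)) return readId();
        String symbol = String.valueOf((char) next);
        advance();
        return "TSYM:" + symbol;
    }

    private String readId() {
        StringBuilder id = new StringBuilder();
        while (next != EOF && idChar(next)) {
            id.append((char) next);
            advance();
        }
        return "TID:" + id;
    }

    private String readNumber() {
        StringBuilder num = new StringBuilder();
        while (next != EOF && Character.isDigit(next)) {
            num.append((char) next);
            advance();
        }
        return "TNUM:" + num;
    }

    static List<String> tokenize(String text) {
        AdHocLexer lexer = new AdHocLexer(new StringReader(text));
        List<String> tokens = new ArrayList<>();
        for (String t = lexer.nextToken(); t != null; t = lexer.nextToken()) tokens.add(t);
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("z = x1 + 42"));
    }
}
```

Each read method starts from the character already held in the look-ahead and leaves the first unconsumed character there for the next call. This sketch treats every symbol as a single character, so "==" comes out as two "=" tokens.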


Page 32: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis

Ad-hoc Lexer

Problems: we do not know what kind of token we are going to read from seeing only the first character.

Page 33: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis

Ad-hoc Lexer

Problems:

If a token begins with “i”, is it an identifier “i” or the keyword “if”?

If a token begins with “=”, is it “=” or “==”?
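A hand-written lexer usually settles both questions by reading further before deciding. The sketch below shows one common approach, not necessarily the one the lecture has in mind: read the whole word and then consult a keyword table, and peek one character past "=". The class Disambiguate, its method names, and the token labels are all hypothetical.

```java
import java.util.Set;

// Hypothetical helper class; token labels such as "EQ" and "ASSIGN" are made up.
class Disambiguate {
    static final Set<String> KEYWORDS = Set.of("if", "else", "while", "for");

    // Reads the longest identifier-like word starting at pos, then decides
    // keyword versus identifier by looking the whole word up in a table.
    static String wordAt(String in, int pos) {
        int end = pos;
        while (end < in.length()
                && (Character.isLetterOrDigit(in.charAt(end)) || in.charAt(end) == '_')) {
            end++;
        }
        String word = in.substring(pos, end);
        return (KEYWORDS.contains(word) ? "KEYWORD:" : "TID:") + word;
    }

    // Assumes in.charAt(pos) is '='. One character of look-ahead separates "=" from "==".
    static String equalsAt(String in, int pos) {
        boolean twoChars = pos + 1 < in.length() && in.charAt(pos + 1) == '=';
        return twoChars ? "EQ:==" : "ASSIGN:=";
    }

    public static void main(String[] args) {
        System.out.println(wordAt("if(b == 0)", 0));   // KEYWORD:if
        System.out.println(wordAt("iffy = 1", 0));     // TID:iffy
        System.out.println(equalsAt("b == 0", 2));     // EQ:==
        System.out.println(equalsAt("a = b", 2));      // ASSIGN:=
    }
}
```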

Page 34: Compiler Construction Dr. Naveed Ejaz Lecture 5. Lexical Analysis

Ad-hoc Lexer

Need a more principled approach.

Use a lexer generator that generates an efficient tokenizer automatically.
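To give a feel for the declarative style a lexer generator accepts, the sketch below drives a tokenizer from an ordered table of (token kind, regular expression) rules, taking the longest match at each position and breaking ties in favour of the earlier rule. It is not a lexer generator: a real generator compiles such rules into an automaton ahead of time, whereas this code simply tries each java.util.regex pattern in turn. The class RuleLexer and the rule names are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical rule-driven tokenizer; rule names and patterns are illustrative.
class RuleLexer {
    // Insertion order is kept; earlier rules win ties, so keywords precede identifiers.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("KEYWORD", Pattern.compile("if|else|while|for"));
        RULES.put("TID", Pattern.compile("[A-Za-z_][A-Za-z0-9_]*"));
        RULES.put("TNUM", Pattern.compile("[0-9]+"));
        RULES.put("SYMBOL", Pattern.compile("==|[=()+*/{}<>;]"));
        RULES.put("SPACE", Pattern.compile("\\s+"));
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestKind = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input).region(pos, input.length());
                if (m.lookingAt() && m.end() > bestEnd) { // only a strictly longer match wins
                    bestKind = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestKind == null) {
                throw new IllegalArgumentException("unexpected character at position " + pos);
            }
            if (!bestKind.equals("SPACE")) { // whitespace is matched but not reported
                tokens.add(bestKind + ":" + input.substring(pos, bestEnd));
            }
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("if( i == j ) z = 0;"));
    }
}
```

With the longest-match rule, "iffy" becomes a single identifier rather than the keyword "if" followed by "fy", and "==" is preferred over two "=" tokens.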