Compiler Construction, Dr. Naveed Ejaz, Lecture 5: Lexical Analysis

Post on 05-Jan-2016

Compiler Construction

Dr. Naveed Ejaz

Lecture 5

Lexical Analysis

Recall: Front-End

Output of lexical analysis is a stream of tokens

[Figure: front-end pipeline: source code -> scanner -> tokens -> parser -> IR, with errors reported]

Tokens

Example:

if( i == j )
    z = 0;
else
    z = 1;

Tokens

Input is just a sequence of characters (\b stands for a blank):

if ( \b i \b = = \b j \n \t ....

Tokens

Goal: partition the input string into substrings, then classify them according to their role.

Tokens

A token is a syntactic category.

Natural language: “He wrote the program”

Words: “He”, “wrote”, “the”, “program”

Tokens

Programming language: “if(b == 0) a = b”

Words: “if”, “(”, “b”, “==”, “0”, “)”, “a”, “=”, “b”

Tokens

Identifiers: x y11 maxsize
Keywords: if else while for
Integers: 2 1000 -44 5L
Floats: 2.0 0.0034 1e5
Symbols: ( ) + * / { } < > ==
Strings: “enter x” “error”
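The token classes listed above can be written down as a small classifier. This is only a sketch in Java: the names TokenKind and TokenClasses, and the exact patterns, are assumptions chosen to cover the examples on the slide, not a full language definition.

```java
import java.util.Set;

enum TokenKind { IDENTIFIER, KEYWORD, INTEGER, FLOAT, SYMBOL, STRING }

final class TokenClasses {
    // Keyword and symbol sets copied from the slide; a real language has more.
    private static final Set<String> KEYWORDS = Set.of("if", "else", "while", "for");
    private static final Set<String> SYMBOLS =
        Set.of("(", ")", "+", "*", "/", "{", "}", "<", ">", "==");

    // Classify a single lexeme that has already been cut out of the input.
    static TokenKind classify(String lexeme) {
        if (KEYWORDS.contains(lexeme)) return TokenKind.KEYWORD;
        if (SYMBOLS.contains(lexeme)) return TokenKind.SYMBOL;
        if (lexeme.matches("[A-Za-z_][A-Za-z0-9_]*")) return TokenKind.IDENTIFIER;
        if (lexeme.matches("-?[0-9]+L?")) return TokenKind.INTEGER;
        // Assumed float shapes: digits.digits with optional exponent, or digits with exponent.
        if (lexeme.matches("-?[0-9]+\\.[0-9]+([eE][-+]?[0-9]+)?|-?[0-9]+[eE][-+]?[0-9]+"))
            return TokenKind.FLOAT;
        if (lexeme.matches("\"[^\"]*\"")) return TokenKind.STRING;
        throw new IllegalArgumentException("not a token: " + lexeme);
    }
}
```

Note that keywords are checked before identifiers, because every keyword on the slide also matches the identifier pattern.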

Ad-hoc Lexer

Hand-write code to generate tokens.

Partition the input string by reading left-to-right, recognizing one token at a time.

Ad-hoc Lexer

Look-ahead is required to decide where one token ends and the next token begins.
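One way to picture look-ahead is a reader that can inspect the next character without consuming it. The Java sketch below is illustrative only: the class LookAhead and its methods peek, advance and readWord are hypothetical names, and the input is assumed to be an in-memory string rather than a stream.

```java
final class LookAhead {
    private final String input;
    private int pos = 0;   // index of the next unconsumed character

    LookAhead(String input) { this.input = input; }

    // Look at the next character without consuming it; -1 at end of input.
    int peek() { return pos < input.length() ? input.charAt(pos) : -1; }

    // Consume and return the next character (caller checks peek() first).
    char advance() { return input.charAt(pos++); }

    // Read a run of letters. The loop stops when the look-ahead is not a
    // letter, and that character is left in place for the next token.
    String readWord() {
        StringBuilder sb = new StringBuilder();
        while (peek() != -1 && Character.isLetter(peek())) sb.append(advance());
        return sb.toString();
    }
}
```

For the input "else z", readWord() returns "else" and the blank that ended the word is still available through peek().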

Ad-hoc Lexer

class Lexer {
    Inputstream s;
    char next;  // look-ahead character

    Lexer(Inputstream _s) {
        s = _s;
        next = s.read();  // prime the look-ahead with the first character
    }

Ad-hoc Lexer

Token nextToken() {
    // Dispatch on the look-ahead character. Digits are tested first
    // because idChar also accepts digits.
    if( isNumber(next) ) return readNumber();
    if( idChar(next) ) return readId();
    if( next == '"' ) return readString();
    ... ...
}

Ad-hoc Lexer

Token readId() {
    string id = "";
    // next already holds the first character of the identifier
    while( idChar(next) ) {
        id = id + string(next);
        next = s.read();  // advance the look-ahead
    }
    return new Token(TID, id);
}

Ad-hoc Lexer

boolean idChar(char c) {
    if( isAlpha(c) ) return true;
    if( isDigit(c) ) return true;
    if( c == '_' ) return true;
    return false;
}

Ad-hoc Lexer

Token readNumber() {
    string num = "";
    // next already holds the first digit of the number
    while( isNumber(next) ) {
        num = num + string(next);
        next = s.read();  // advance the look-ahead
    }
    return new Token(TNUM, num);
}
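Put together, the ad-hoc lexer from these slides might look like the following runnable Java sketch. Several details are assumptions made for illustration and are not on the slides: the input is a String rather than a stream, '\0' is used as an end-of-input sentinel, whitespace is skipped, every other character becomes a one-character symbol, and the names Tok, AdHocLexer, idStart and tokenize are hypothetical. String literals (readString) are left out.

```java
import java.util.ArrayList;
import java.util.List;

// A token is a kind plus the matched text (lexeme).
final class Tok {
    final String kind;
    final String text;
    Tok(String kind, String text) { this.kind = kind; this.text = text; }
    @Override public String toString() { return kind + "(" + text + ")"; }
}

final class AdHocLexer {
    private static final char EOF = '\0';   // assumption: sentinel for end of input
    private final String input;
    private int pos = 0;
    private char next;                      // look-ahead character

    AdHocLexer(String input) {
        this.input = input;
        advance();                          // prime the look-ahead
    }

    private void advance() {
        next = pos < input.length() ? input.charAt(pos++) : EOF;
    }

    private static boolean idStart(char c) { return Character.isLetter(c) || c == '_'; }
    private static boolean idChar(char c)  { return idStart(c) || Character.isDigit(c); }

    Tok nextToken() {
        while (Character.isWhitespace(next)) advance();   // skip blanks, tabs, newlines
        if (next == EOF) return new Tok("EOF", "");
        if (idStart(next)) return readId();
        if (Character.isDigit(next)) return readNumber();
        char c = next;                                    // anything else: one-character symbol
        advance();
        return new Tok("SYM", String.valueOf(c));
    }

    private Tok readId() {
        StringBuilder id = new StringBuilder();
        while (idChar(next)) { id.append(next); advance(); }
        return new Tok("ID", id.toString());
    }

    private Tok readNumber() {
        StringBuilder num = new StringBuilder();
        while (Character.isDigit(next)) { num.append(next); advance(); }
        return new Tok("NUM", num.toString());
    }

    // Convenience helper: collect printed tokens up to end of input.
    static List<String> tokenize(String src) {
        AdHocLexer lexer = new AdHocLexer(src);
        List<String> out = new ArrayList<>();
        for (Tok t = lexer.nextToken(); !t.kind.equals("EOF"); t = lexer.nextToken()) {
            out.add(t.toString());
        }
        return out;
    }
}
```

Calling AdHocLexer.tokenize("z = 10;") yields the tokens ID(z), SYM(=), NUM(10), SYM(;). Because symbols are read one character at a time, "==" would come out as two separate "=" tokens in this sketch.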

Ad-hoc Lexer

Problems: We do not know what kind of token we are going to read from seeing only the first character.

Ad-hoc Lexer

Problems:

If a token begins with “i”, is it an identifier “i” or the keyword “if”?

If a token begins with “=”, is it “=” or “==”?
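A hand-written lexer usually handles both cases by reading further before deciding. The Java sketch below shows one common approach, which is an assumption here and not taken from the slides: read the whole word and then look it up in a keyword table, and after an "=" peek one more character. The class Disambiguate and its methods are hypothetical names.

```java
import java.util.Set;

final class Disambiguate {
    private static final Set<String> KEYWORDS = Set.of("if", "else", "while", "for");

    // Read the whole word starting at 'start', then decide: keyword or identifier.
    static String classifyWord(String src, int start) {
        int end = start;
        while (end < src.length()
                && (Character.isLetterOrDigit(src.charAt(end)) || src.charAt(end) == '_')) {
            end++;
        }
        String word = src.substring(start, end);
        return (KEYWORDS.contains(word) ? "KEYWORD" : "ID") + "(" + word + ")";
    }

    // Precondition: src.charAt(start) == '='. Peek one more character to
    // choose the longer token "==" when it is available.
    static String readEquals(String src, int start) {
        boolean doubled = start + 1 < src.length() && src.charAt(start + 1) == '=';
        return doubled ? "SYM(==)" : "SYM(=)";
    }
}
```

With this sketch, classifyWord("if(b", 0) gives KEYWORD(if), while classifyWord("i == j", 0) gives ID(i), and readEquals picks "==" only when a second "=" follows. Each new multi-character token needs more hand-written cases like these, which is part of why the ad-hoc approach does not scale well.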

Ad-hoc Lexer

Need a more principled approach.

Use a lexer generator that generates an efficient tokenizer automatically.
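To give a feel for the specification-driven style, the Java sketch below describes each token class with a regular expression and lets java.util.regex do the matching. It is not a lexer generator and not how a generated tokenizer works internally; the class RegexLexer, the group names and the token patterns are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexLexer {
    // Token specification: one named group per token class, tried in order.
    // KW comes before ID so that "if" is a keyword but "iffy" is an identifier.
    private static final Pattern SPEC = Pattern.compile(
          "(?<WS>\\s+)"
        + "|(?<KW>(?:if|else|while|for)\\b)"
        + "|(?<NUM>[0-9]+)"
        + "|(?<ID>[A-Za-z_][A-Za-z0-9_]*)"
        + "|(?<SYM>==|[=()+*/{}<>;])");

    // Token classes that are reported; whitespace (WS) is dropped.
    private static final String[] KINDS = {"KW", "NUM", "ID", "SYM"};

    static List<String> tokenize(String src) {
        List<String> out = new ArrayList<>();
        Matcher m = SPEC.matcher(src);
        int pos = 0;
        while (pos < src.length()) {
            m.region(pos, src.length());
            // lookingAt() only matches at the start of the region
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("unexpected character at index " + pos);
            }
            for (String kind : KINDS) {
                if (m.group(kind) != null) out.add(kind + "(" + m.group(kind) + ")");
            }
            pos = m.end();
        }
        return out;
    }
}
```

Calling RegexLexer.tokenize("if(b == 0) a = b") produces KW(if), SYM((), ID(b), SYM(==), NUM(0), SYM()), ID(a), SYM(=), ID(b), the same words as in the earlier programming-language example. The keyword and "==" questions are settled by the specification itself rather than by hand-written branching.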
