compiler notes


Posted on 13-Nov-2014


COMPILERS

BASIC COMPILER FUNCTIONS

A compiler accepts a program written in a high-level language as input and produces its machine-language equivalent as output. For the purpose of compiler construction, a high-level programming language is described in terms of a grammar. The grammar gives the formal description of the syntax, that is, of the legal statements in the language. Example: the assignment statement in Pascal is defined as:

<variable> := <expression>

The compiler has to match each statement written by the programmer to the structure defined by the grammar and generate the appropriate object code for each statement.

The compilation process is too complex to be implemented reasonably in one single step. It is therefore partitioned into a series of sub-processes called phases. A phase is a logically cohesive operation that takes as input one representation of the source program and produces as output another representation. The basic phases are lexical analysis, syntax analysis, and code generation.

Lexical Analysis: This is the first phase; the part of the compiler that performs it is also called the scanner. It separates the characters of the source program into groups that logically belong together. These groups are called tokens. The usual tokens are:

Keywords, such as DO or IF
Identifiers, such as x or num
Operator symbols, such as :=
Punctuation symbols

... 4. A designation of one of the non-terminals as the start symbol.

The rule <id-list> ::= id | <id-list> , id offers two possibilities, separated by the symbol |, for the syntax of an <id-list>. First, an <id-list> may consist simply of a token id (the notation id denotes an identifier that is recognized by the scanner). Second, an <id-list> may consist of another <id-list>, followed by a comma, followed by an id. Examples: ALPHA is an <id-list> consisting of the single id ALPHA; ALPHA, BETA is an <id-list> that consists of another <id-list>, ALPHA, followed by a comma, followed by the id BETA.

Tree: It is convenient to display the analysis of a source statement in terms of a grammar as a tree, called a parse tree or syntax tree. Example: READ (VALUE), with the grammar rule <read> ::= READ ( <id-list> ). Example assignment statements: SUM := 0 ; SUM := SUM + VALUE ; SUM := SUM - VALUE ;


System Software

Grammar:

<assign> ::= id := <exp>
<exp>    ::= <term> | <exp> + <term> | <exp> - <term>
<term>   ::= <factor> | <term> * <factor> | <term> DIV <factor>
<factor> ::= id | int | ( <exp> )

An <assign> consists of an id, followed by the token :=, followed by an <exp>; Fig. 4(a) shows its syntax tree. An <exp> is a sequence of <term>s connected by the operators + and - (Fig. 4(b)). A <term> is a sequence of <factor>s connected by * and DIV (Fig. 4(c)). A <factor> may consist of an identifier id, an int (which is also recognized by the scanner), or an <exp> enclosed in parentheses (Fig. 4(d)).

[Fig. 4 Parse trees: (a) <assign>, (b) <exp>, (c) <term>, (d) <factor>]
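The four rules above translate almost line for line into a recursive-descent parser. The Python sketch below is an illustration under stated assumptions, not the notes' own method: tokens are whitespace-separated words, keywords such as DIV are upper case, and the helper names (tokenize, Parser, parse_assign, show) and the nested-tuple tree format are hypothetical. The left-recursive rules (<exp> ::= <exp> + <term>, and so on) are handled with loops that wrap the tree built so far as the left child, which yields the same left-leaning tree a left-recursive derivation produces.

```python
import re

# Assumed token shapes for this sketch: ":=", one-character operators,
# unsigned integers, and words (identifiers or the keyword DIV).
TOKEN_RE = re.compile(r"\s*(:=|[-+*()]|\d+|[A-Za-z][A-Za-z0-9]*)")


def tokenize(text):
    """Split an assignment such as 'SUM := SUM + VALUE' into tokens."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character {text[pos]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    """Each method mirrors one grammar rule and returns a nested tuple
    (rule-name, child, child, ...)."""

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {self.peek()!r}")
        self.pos += 1

    def identifier(self):
        tok = self.peek()
        if tok is None or not tok[0].isalpha() or tok == "DIV":
            raise SyntaxError(f"expected id, found {tok!r}")
        self.pos += 1
        return ("id", tok)

    def assign(self):  # <assign> ::= id := <exp>
        name = self.identifier()
        self.expect(":=")
        return ("<assign>", name, ":=", self.exp())

    def exp(self):  # <exp> ::= <term> | <exp> + <term> | <exp> - <term>
        node = ("<exp>", self.term())
        while self.peek() in ("+", "-"):
            op = self.peek()
            self.pos += 1
            node = ("<exp>", node, op, self.term())  # old tree becomes left <exp>
        return node

    def term(self):  # <term> ::= <factor> | <term> * <factor> | <term> DIV <factor>
        node = ("<term>", self.factor())
        while self.peek() in ("*", "DIV"):
            op = self.peek()
            self.pos += 1
            node = ("<term>", node, op, self.factor())
        return node

    def factor(self):  # <factor> ::= id | int | ( <exp> )
        tok = self.peek()
        if tok == "(":
            self.pos += 1
            inner = self.exp()
            self.expect(")")
            return ("<factor>", "(", inner, ")")
        if tok is not None and tok.isdigit():
            self.pos += 1
            return ("<factor>", ("int", tok))
        return ("<factor>", self.identifier())


def parse_assign(text):
    parser = Parser(tokenize(text))
    tree = parser.assign()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected {parser.peek()!r} after statement")
    return tree


def show(node, indent=0):
    """Print the tree one node per line; id/int leaves print as id {NAME}."""
    pad = "  " * indent
    if isinstance(node, str):  # terminal such as ':=' or '+'
        print(pad + node)
    elif node[0] in ("id", "int"):
        print(f"{pad}{node[0]} {{{node[1]}}}")
    else:
        print(pad + node[0])
        for child in node[1:]:
            show(child, indent + 1)
```

For example, show(parse_assign("VARIANCE := SUMSQ DIV 100 - MEAN * MEAN")) prints an <assign> root whose <exp> splits at the - operator, with SUMSQ DIV 100 and MEAN * MEAN each grouped under their own <term>, because * and DIV live one level lower in the grammar than + and -.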

Consider the statement VARIANCE := SUMSQ DIV 100 - MEAN * MEAN ; The complete simplified Pascal grammar is listed in Fig. 5.

 1. <prog>      ::= PROGRAM <prog-name> VAR <dec-list> BEGIN <stmt-list> END.
 2. <prog-name> ::= id
 3. <dec-list>  ::= <dec> | <dec-list> ; <dec>
 4. <dec>       ::= <id-list> : <type>
 5. <type>      ::= INTEGER
 6. <id-list>   ::= id | <id-list> , id
 7. <stmt-list> ::= <stmt> | <stmt-list> ; <stmt>
 8. <stmt>      ::= <assign> | <read> | <write> | <for>
 9. <assign>    ::= id := <exp>
10. <exp>       ::= <term> | <exp> + <term> | <exp> - <term>
11. <term>      ::= <factor> | <term> * <factor> | <term> DIV <factor>
12. <factor>    ::= id | int | ( <exp> )
13. <read>      ::= READ ( <id-list> )
14. <write>     ::= WRITE ( <id-list> )
15. <for>       ::= FOR <index-exp> DO <body>
16. <index-exp> ::= id := <exp> TO <exp>
17. <body>      ::= <stmt> | BEGIN <stmt-list> END

Fig. 5 Simplified Pascal Grammar
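A grammar like Fig. 5 is just data, so it can be checked mechanically. The hedged Python sketch below transcribes Fig. 5 into a dictionary (the layout and the names GRAMMAR, terminals and check are illustrative choices, not from the notes) and runs two sanity checks: every non-terminal used on a right-hand side has a rule, and every rule is reachable from the start symbol <prog>. The terminals it collects number 23, one for each token code in Fig. 7.

```python
# Fig. 5 as data: non-terminal -> list of alternatives, each a list of symbols.
# Symbols written <...> are non-terminals; everything else is a terminal token.
GRAMMAR = {
    "<prog>": [["PROGRAM", "<prog-name>", "VAR", "<dec-list>",
                "BEGIN", "<stmt-list>", "END."]],
    "<prog-name>": [["id"]],
    "<dec-list>": [["<dec>"], ["<dec-list>", ";", "<dec>"]],
    "<dec>": [["<id-list>", ":", "<type>"]],
    "<type>": [["INTEGER"]],
    "<id-list>": [["id"], ["<id-list>", ",", "id"]],
    "<stmt-list>": [["<stmt>"], ["<stmt-list>", ";", "<stmt>"]],
    "<stmt>": [["<assign>"], ["<read>"], ["<write>"], ["<for>"]],
    "<assign>": [["id", ":=", "<exp>"]],
    "<exp>": [["<term>"], ["<exp>", "+", "<term>"], ["<exp>", "-", "<term>"]],
    "<term>": [["<factor>"], ["<term>", "*", "<factor>"],
               ["<term>", "DIV", "<factor>"]],
    "<factor>": [["id"], ["int"], ["(", "<exp>", ")"]],
    "<read>": [["READ", "(", "<id-list>", ")"]],
    "<write>": [["WRITE", "(", "<id-list>", ")"]],
    "<for>": [["FOR", "<index-exp>", "DO", "<body>"]],
    "<index-exp>": [["id", ":=", "<exp>", "TO", "<exp>"]],
    "<body>": [["<stmt>"], ["BEGIN", "<stmt-list>", "END"]],
}
START = "<prog>"


def is_nonterminal(sym):
    return len(sym) > 2 and sym.startswith("<") and sym.endswith(">")


def terminals(grammar):
    """Every right-hand-side symbol that is not a non-terminal."""
    return {s for alts in grammar.values() for alt in alts for s in alt
            if not is_nonterminal(s)}


def check(grammar, start):
    """Return (undefined, unreachable) sets of non-terminals."""
    used = {s for alts in grammar.values() for alt in alts for s in alt
            if is_nonterminal(s)}
    undefined = used - set(grammar)
    reachable, stack = {start}, [start]
    while stack:  # depth-first walk over the rules reachable from start
        for alt in grammar.get(stack.pop(), []):
            for s in alt:
                if is_nonterminal(s) and s not in reachable:
                    reachable.add(s)
                    stack.append(s)
    return undefined, set(grammar) - reachable
```

Running check(GRAMMAR, START) on the transcription returns two empty sets; deleting, say, the <read> rule would make it report <read> as undefined.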

[Fig. 6 Parse tree for the program of Fig. 1]

The parse tree for the Pascal program in Fig. 1 is shown in Fig. 6.

1. Draw parse trees, according to the grammar in Fig. 5, for the following <id-list>s:
(a) ALPHA
(b) ALPHA, BETA, GAMMA

[Solution parse trees for (a) and (b)]
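Exercise 1 can be checked mechanically. Under the rule <id-list> ::= id | <id-list> , id, each extra identifier wraps the existing <id-list> as its left child, so the tree leans left: the last id hangs off the root and the first id sits at the bottom. The Python sketch below builds and prints such a tree; parse_id_list and render are hypothetical helper names chosen for this example.

```python
def parse_id_list(names):
    """Build the parse tree for <id-list> ::= id | <id-list> , id."""
    if not names:
        raise ValueError("an <id-list> needs at least one id")
    tree = ("<id-list>", ("id", names[0]))           # first alternative
    for name in names[1:]:
        tree = ("<id-list>", tree, ",", ("id", name))  # second alternative
    return tree


def render(node, indent=0):
    """Return the tree as indented lines, printing leaves as id {NAME}."""
    pad = "  " * indent
    if isinstance(node, str):  # the ',' terminal
        return [pad + node]
    if node[0] == "id":
        return [f"{pad}id {{{node[1]}}}"]
    lines = [pad + node[0]]
    for child in node[1:]:
        lines += render(child, indent + 1)
    return lines
```

print("\n".join(render(parse_id_list(["ALPHA", "BETA", "GAMMA"])))) prints the answer to 1(b):

    <id-list>
      <id-list>
        <id-list>
          id {ALPHA}
        ,
        id {BETA}
      ,
      id {GAMMA}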


2. Draw parse trees, according to the grammar in Fig. 5, for the following <exp>s:
(a) ALPHA + BETA
(b) ALPHA - BETA * GAMMA
(c) ALPHA DIV (BETA + GAMMA) - DELTA

[Solution parse trees for (a), (b) and (c)]


3. Suppose the rules of the grammar for <exp> and <term> are as follows:

<exp>  ::= <term> | <exp> * <term> | <exp> DIV <term>
<term> ::= <factor> | <term> + <factor> | <term> - <factor>

Draw the parse trees for the following:
(a) A1 + B1
(b) A1 - B1 * G1
(c) A1 DIV (B1 + G1) - D1

[Solution parse trees for (a), (b) and (c)]
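Exercise 3 swaps the operator levels: + and - now bind more tightly than * and DIV. One way to see what that does to the trees is to parse the same string under both grammars and print the grouping each tree implies. The Python sketch below is a hypothetical helper (make_parser is not from the notes); it handles the left-recursive rules with loops, which gives the same left-to-right grouping, and it prints only the grouping as a fully parenthesized string rather than the full tree with <exp>, <term> and <factor> nodes.

```python
import re


def tokenize(text):
    # Assumed token shapes: words (ids or DIV), integers, operators, parens.
    return re.findall(r"[A-Za-z][A-Za-z0-9]*|\d+|[-+*()]", text)


def make_parser(exp_ops, term_ops):
    """Parser for <exp> ::= <term> | <exp> op <term>      (op in exp_ops)
                  <term> ::= <factor> | <term> op <factor> (op in term_ops)
                  <factor> ::= id | int | ( <exp> )"""
    def parse(text):
        tokens, pos = tokenize(text), 0

        def peek():
            return tokens[pos] if pos < len(tokens) else None

        def advance():
            nonlocal pos
            if pos >= len(tokens):
                raise SyntaxError("unexpected end of input")
            pos += 1
            return tokens[pos - 1]

        def factor():
            tok = advance()
            if tok == "(":
                inner = exp()
                if advance() != ")":
                    raise SyntaxError("missing )")
                return inner
            if tok == "DIV" or not tok[0].isalnum():
                raise SyntaxError(f"expected id, int or (, found {tok!r}")
            return tok

        def level(lower, ops):
            # Left-recursive rule as a loop: each operator wraps the
            # grouping built so far as its left operand.
            node = lower()
            while peek() in ops:
                op = advance()
                node = f"({node} {op} {lower()})"
            return node

        def term():
            return level(factor, term_ops)

        def exp():
            return level(term, exp_ops)

        result = exp()
        if peek() is not None:
            raise SyntaxError(f"unexpected {peek()!r}")
        return result
    return parse


FIG5 = make_parser({"+", "-"}, {"*", "DIV"})       # grammar of Fig. 5
EXERCISE3 = make_parser({"*", "DIV"}, {"+", "-"})  # grammar of exercise 3
```

For (b), FIG5("A1 - B1 * G1") gives "(A1 - (B1 * G1))", while EXERCISE3("A1 - B1 * G1") gives "((A1 - B1) * G1)": under the exercise 3 rules the root of the tree is <exp> * <term>, with A1 - B1 grouped inside a single <term>.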

LEXICAL ANALYSIS

Lexical analysis involves scanning the program to be compiled. Scanners are designed to recognize keywords, operators, identifiers, integers, floating-point numbers, character strings and other items that are written as part of the source program. Such items are recognized directly as single tokens. Alternatively, these tokens could be defined as part of the grammar. Example:

<ident>  ::= <letter> | <ident> <letter> | <ident> <digit>
<letter> ::= A | B | C | ... | Z
<digit>  ::= 0 | 1 | 2 | ... | 9

In such a case the scanner would recognize as tokens the single characters A, B, ..., Z, 0, 1, ..., 9, and the parser would interpret a sequence of such characters as the language construct <ident>. However, scanners can perform this function more efficiently. There can be a significant saving in compilation time, since a large part of the source program consists of multiple-character identifiers. It is also easier to restrict the length of identifiers in a scanner than in the grammar notation.

The scanner generally recognizes both single- and multiple-character tokens directly. The scanner output consists of a sequence of tokens, each of which can be represented by a fixed-length code. Fig. 7 gives the integer code for each token of the grammar in Fig. 5. In this coding scheme, PROGRAM is represented by the integer value 1, VAR by 2, and so on.

Token     Code    Token   Code    Token   Code    Token   Code
PROGRAM   1       READ    8       :=      15      id      22
VAR       2       WRITE   9       +       16      int     23
BEGIN     3       TO      10      -       17
END       4       DO      11      *       18
END.      5       ;       12      DIV     19
INTEGER   6       :       13      (       20
FOR       7       ,       14      )       21

Fig. 7 Token Coding Scheme
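The scanner's job for this token set can be sketched in a few lines of Python. This is a hedged illustration: the notes do not say how the scanner is built, and the regular-expression approach, the names TOKEN_CODES, ID, INT and scan, and the case-insensitive keyword lookup are all assumptions made here. It emits the Fig. 7 code for each token and, for id and int tokens, also returns the identifier name or integer value (what the notes go on to call a token specifier).

```python
import re

# Token codes from Fig. 7 (keywords, operators and punctuation).
TOKEN_CODES = {
    "PROGRAM": 1, "VAR": 2, "BEGIN": 3, "END": 4, "END.": 5, "INTEGER": 6,
    "FOR": 7, "READ": 8, "WRITE": 9, "TO": 10, "DO": 11,
    ";": 12, ":": 13, ",": 14, ":=": 15, "+": 16, "-": 17, "*": 18,
    "DIV": 19, "(": 20, ")": 21,
}
ID, INT = 22, 23

# Longer alternatives come first, so "END." wins over "END" and ":=" over ":".
TOKEN_RE = re.compile(
    r"\s*(?:(END\.)|([A-Za-z][A-Za-z0-9]*)|(\d+)|(:=|[;:,+\-*()]))",
    re.IGNORECASE,
)


def scan(source):
    """Return a list of (code, specifier) pairs. The specifier is the
    spelling of an id or the value of an int, and None otherwise."""
    pos, end = 0, len(source.rstrip())
    tokens = []
    while pos < end:
        m = TOKEN_RE.match(source, pos)
        if not m:
            raise SyntaxError(f"illegal character {source[pos]!r} at {pos}")
        end_dot, word, number, op = m.groups()
        if end_dot:
            tokens.append((TOKEN_CODES["END."], None))
        elif word:
            code = TOKEN_CODES.get(word.upper())  # keyword or identifier?
            tokens.append((code, None) if code else (ID, word))
        elif number:
            tokens.append((INT, int(number)))
        else:
            tokens.append((TOKEN_CODES[op], None))
        pos = m.end()
    return tokens
```

For instance, scan("SUM := SUM + VALUE;") returns [(22, 'SUM'), (15, None), (22, 'SUM'), (16, None), (22, 'VALUE'), (12, None)]. Keywords need no specifier because the code alone identifies them; identifiers and integers share codes 22 and 23, so the specifier carries the remaining information.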

For a keyword or an operator, the token coding scheme gives sufficient information. In the case of an identifier, however, it is also necessary to supply the particular identifier name that was scanned. The same is true for integers, floating-point values, character-string constants, etc. A token specifier can be associated with the type code for such tokens. This
