Compiler Notes

Post on 13-Nov-2014





<p>COMPILERS</p> <p>BASIC COMPILER FUNCTIONS</p> <p>A compiler accepts a program written in a high-level language as input and produces its machine-language equivalent as output. For the purpose of compiler construction, a high-level programming language is described in terms of a grammar. This grammar gives a formal description of the syntax, that is, of the legal statements in the language. Example: an assignment statement in Pascal is defined as &lt;variable&gt; := &lt;expression&gt;. The compiler has to match the statements written by the programmer to the structures defined by the grammar and generate the appropriate object code for each statement.</p>
<p>The compilation process is too complex to implement reasonably in a single step. It is partitioned into a series of sub-processes called phases. A phase is a logically cohesive operation that takes one representation of the source program as input and produces another representation as output. The basic phases are lexical analysis, syntax analysis, and code generation.</p>
<p>Lexical Analysis: This is the first phase; the component that performs it is also called the scanner. It separates the characters of the source program into groups that logically belong together. These groups are called tokens. The usual tokens are keywords (such as DO or IF), identifiers (such as x or num), operator symbols, and punctuation symbols.</p>
<p>4. A designation of one of the non-terminals as the start symbol.</p>
<p>The rule for an &lt;id-list&gt; offers two possibilities, separated by the symbol |. In the first, an &lt;id-list&gt; may consist simply of a token id (the notation id denotes an identifier that is recognized by the scanner). In the second, an &lt;id-list&gt; consists of another &lt;id-list&gt;, followed by a comma, followed by an id. Example: ALPHA and ALPHA, BETA are both &lt;id-list&gt;s.</p>
<p>ALPHA, BETA is an &lt;id-list&gt; that consists of another &lt;id-list&gt; ALPHA, followed by a comma, followed by an id BETA. Tree: Also called a parse tree or syntax tree. It is convenient to display the analysis of a source statement in terms of a grammar as a tree. 
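</p>
<p>The way the rule &lt;id-list&gt; ::= id | &lt;id-list&gt; , id produces a tree can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the tokens are assumed to be already separated into a list, the tree is represented as nested tuples, and the function name parse_id_list is hypothetical rather than part of these notes.</p>

```python
def parse_id_list(tokens):
    """Build a parse tree for <id-list> ::= id | <id-list> , id.

    `tokens` is a list of strings such as ["ALPHA", ",", "BETA"].
    The tree is a nested tuple (an assumed representation).
    """
    if not tokens or not tokens[0].isidentifier():
        raise SyntaxError("expected an identifier")
    # First alternative of the rule: a single id is an <id-list>.
    tree = ("id-list", ("id", tokens[0]))
    pos = 1
    while pos < len(tokens):
        has_id = pos + 1 < len(tokens) and tokens[pos + 1].isidentifier()
        if tokens[pos] != "," or not has_id:
            raise SyntaxError("expected ', id'")
        # Second alternative: the list so far, a comma, and one more id.
        tree = ("id-list", tree, ",", ("id", tokens[pos + 1]))
        pos += 2
    return tree


tree = parse_id_list(["ALPHA", ",", "BETA"])
print(tree)
```

<p>For the tokens ALPHA , BETA the result nests the &lt;id-list&gt; for ALPHA inside the &lt;id-list&gt; for the whole list, which mirrors the left-recursive shape of the rule.</p>
<p>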
Example: READ (VALUE). Grammar: &lt;read&gt; ::= READ ( &lt;id-list&gt; ). Example: assignment statements: SUM := 0 ; SUM := + VALUE ; SUM := - VALUE ;</p>
<p>System Software</p>
<p>Grammar:</p>
<p>&lt;assign&gt; ::= id := &lt;exp&gt;</p>
<p>&lt;exp&gt; ::= &lt;term&gt; | &lt;exp&gt; + &lt;term&gt; | &lt;exp&gt; - &lt;term&gt;</p>
<p>&lt;term&gt; ::= &lt;factor&gt; | &lt;term&gt; * &lt;factor&gt; | &lt;term&gt; DIV &lt;factor&gt;</p>
<p>&lt;factor&gt; ::= id | int | ( &lt;exp&gt; )</p>
<p>An &lt;assign&gt; consists of an id, followed by the token :=, followed by an expression; Fig. 4(a) shows the syntax tree. An expression is a sequence of &lt;term&gt;s connected by the operators + and -; Fig. 4(b) shows the syntax tree. A &lt;term&gt; is a sequence of &lt;factor&gt;s connected by * and DIV (Fig. 4(c)). A &lt;factor&gt; may consist of an identifier id or an int (which is also recognized by the scanner), or of an &lt;exp&gt; enclosed in parentheses (Fig. 4(d)).</p>
<p>Fig. 4 Parse Trees: (a) &lt;assign&gt;, (b) &lt;exp&gt;, (c) &lt;term&gt;, (d) &lt;factor&gt;, for the statement VARIANCE := SUMSQ DIV 100 - MEAN * MEAN ;</p>
<p>The simplified Pascal grammar is shown in Fig. 5.</p>
<p>1. &lt;prog&gt; ::= PROGRAM &lt;prog-name&gt; VAR &lt;dec-list&gt; BEGIN &lt;stmt-list&gt; END.</p>
<p>2. &lt;prog-name&gt; ::= id</p>
<p>3. &lt;dec-list&gt; ::= &lt;dec&gt; | &lt;dec-list&gt; ; &lt;dec&gt;</p>
<p>4. &lt;dec&gt; ::= &lt;id-list&gt; : &lt;type&gt;</p>
<p>5. &lt;type&gt; ::= INTEGER</p>
<p>6. &lt;id-list&gt; ::= id | &lt;id-list&gt; , id</p>
<p>7. &lt;stmt-list&gt; ::= &lt;stmt&gt; | &lt;stmt-list&gt; ; &lt;stmt&gt;</p>
<p>8. &lt;stmt&gt; ::= &lt;assign&gt; | &lt;read&gt; | &lt;write&gt; | &lt;for&gt;</p>
<p>9. &lt;assign&gt; ::= id := &lt;exp&gt;</p>
<p>10. &lt;exp&gt; ::= &lt;term&gt; | &lt;exp&gt; + &lt;term&gt; | &lt;exp&gt; - &lt;term&gt;</p>
<p>11. &lt;term&gt; ::= &lt;factor&gt; | &lt;term&gt; * &lt;factor&gt; | &lt;term&gt; DIV &lt;factor&gt;</p>
<p>12. &lt;factor&gt; ::= id | int | ( &lt;exp&gt; )</p>
<p>13. &lt;read&gt; ::= READ ( &lt;id-list&gt; )</p>
<p>14. &lt;write&gt; ::= WRITE ( &lt;id-list&gt; )</p>
<p>15. &lt;for&gt; ::= FOR &lt;index-exp&gt; DO &lt;body&gt;</p>
<p>16. &lt;index-exp&gt; ::= id := &lt;exp&gt; TO &lt;exp&gt;</p>
<p>17. &lt;body&gt; ::= &lt;stmt&gt; | BEGIN &lt;stmt-list&gt; END</p>
<p>Fig. 5 Simplified Pascal Grammar</p>
<p>(Parse tree figure with root &lt;prog&gt;; its leaves include id {STATS}, id {SUM}, id {SUMSQ}, id {I}, id {VALUE}, id {MEAN}, id {VARIANCE}, int {0}, and int {100}.) 
</p>
<p>Fig. 6 Parse tree for Program 1</p>
<p>The parse tree for the Pascal program in Fig. 1 is shown in Fig. 6.</p>
<p>1. Draw parse trees, according to the grammar in Fig. 5, for the following &lt;id-list&gt;s:</p>
<p>(a) ALPHA. Parse tree: the root &lt;id-list&gt; derives id {ALPHA} directly.</p>
<p>(b) ALPHA, BETA, GAMMA. Parse tree: the root &lt;id-list&gt; expands to &lt;id-list&gt; , id {GAMMA}; the inner &lt;id-list&gt; expands to &lt;id-list&gt; , id {BETA}; the innermost &lt;id-list&gt; derives id {ALPHA}.</p>
<p>2. Draw parse trees, according to the grammar in Fig. 5, for the following &lt;exp&gt;s:</p>
<p>(a) ALPHA + BETA. Parse tree: the root &lt;exp&gt; expands to &lt;exp&gt; + &lt;term&gt;; the left &lt;exp&gt; derives id {ALPHA} through &lt;term&gt; and &lt;factor&gt;, and the &lt;term&gt; derives id {BETA} through &lt;factor&gt;.</p>
<p>(b) ALPHA - BETA * GAMMA. Parse tree: the root &lt;exp&gt; expands to &lt;exp&gt; - &lt;term&gt;; the left &lt;exp&gt; derives id {ALPHA}, and the &lt;term&gt; expands to &lt;term&gt; * &lt;factor&gt;, which derive id {BETA} and id {GAMMA}.</p>
<p>(c) ALPHA DIV (BETA + GAMMA) - DELTA. Parse tree: the root &lt;exp&gt; expands to &lt;exp&gt; - &lt;term&gt;; the &lt;term&gt; derives id {DELTA}; the left &lt;exp&gt; is a &lt;term&gt; of the form &lt;term&gt; DIV &lt;factor&gt;, where the &lt;term&gt; derives id {ALPHA} and the &lt;factor&gt; is ( &lt;exp&gt; ) with the inner &lt;exp&gt; deriving id {BETA} + id {GAMMA}.</p>
<p>3. Suppose the rules of the grammar for &lt;exp&gt; and &lt;term&gt; are as follows:</p>
<p>&lt;exp&gt; ::= &lt;term&gt; | &lt;exp&gt; * &lt;term&gt; | &lt;exp&gt; DIV &lt;term&gt;</p>
<p>&lt;term&gt; ::= &lt;factor&gt; | &lt;term&gt; + &lt;factor&gt; | &lt;term&gt; - &lt;factor&gt;</p>
<p>Draw the parse trees for the following:</p>
<p>(a) A1 + B1. Parse tree: the root &lt;exp&gt; derives a single &lt;term&gt;, which expands to &lt;term&gt; + &lt;factor&gt;; these derive id {A1} and id {B1}.</p>
<p>(b) A1 - B1 * G1. Parse tree: the root &lt;exp&gt; expands to &lt;exp&gt; * &lt;term&gt;; the left &lt;exp&gt; is a &lt;term&gt; of the form &lt;term&gt; - &lt;factor&gt; deriving id {A1} and id {B1}, and the right &lt;term&gt; derives id {G1}.</p>
<p>(c) A1 DIV (B1 + G1) - D1. Parse tree: the root &lt;exp&gt; expands to &lt;exp&gt; DIV &lt;term&gt;; the left &lt;exp&gt; derives id {A1}; the &lt;term&gt; expands to &lt;term&gt; - &lt;factor&gt;, where the &lt;term&gt; is the &lt;factor&gt; ( &lt;exp&gt; ) containing id {B1} + id {G1}, and the &lt;factor&gt; derives id {D1}.</p>
<p>LEXICAL ANALYSIS</p>
<p>Lexical analysis involves scanning the program to be compiled. Scanners are designed to recognize keywords, operators, identifiers, integers, floating-point numbers, character strings, and other items that are written as part of the source program. Such items are normally recognized directly as single tokens. Alternatively, these tokens could be defined as part of the grammar. Example:</p>
<p>&lt;ident&gt; ::= &lt;letter&gt; | &lt;ident&gt; &lt;letter&gt; | &lt;ident&gt; &lt;digit&gt;</p>
<p>&lt;letter&gt; ::= A | B | C | . . . | Z</p>
<p>&lt;digit&gt; ::= 0 | 1 | 2 | . . . | 9</p>
<p>In such a case the scanner would recognize as tokens the single characters A, B, . . ., Z, 0, 1, . . ., 9. The parser could then interpret a sequence of such characters as the language construct &lt;ident&gt;. Scanners can perform this function more efficiently. There can be a significant saving in compilation time, since a large part of the source program consists of multiple-character identifiers. It is also easier to restrict the length of identifiers in a scanner than in a parsing notation. The scanner therefore generally recognizes both single-character and multiple-character tokens directly.</p>
<p>The scanner output consists of a sequence of tokens. Each token can be considered to have a fixed-length code. Fig. 7 gives an integer code for each token of the grammar in Fig. 5; in this coding scheme, PROGRAM is represented by the integer value 1, VAR by the integer value 2, and so on.</p>
<table>
<tr><th>Token</th><th>Code</th><th>Token</th><th>Code</th><th>Token</th><th>Code</th><th>Token</th><th>Code</th></tr>
<tr><td>PROGRAM</td><td>1</td><td>READ</td><td>8</td><td>:=</td><td>15</td><td>id</td><td>22</td></tr>
<tr><td>VAR</td><td>2</td><td>WRITE</td><td>9</td><td>+</td><td>16</td><td>int</td><td>23</td></tr>
<tr><td>BEGIN</td><td>3</td><td>TO</td><td>10</td><td>-</td><td>17</td><td></td><td></td></tr>
<tr><td>END</td><td>4</td><td>DO</td><td>11</td><td>*</td><td>18</td><td></td><td></td></tr>
<tr><td>END.</td><td>5</td><td>;</td><td>12</td><td>DIV</td><td>19</td><td></td><td></td></tr>
<tr><td>INTEGER</td><td>6</td><td>:</td><td>13</td><td>(</td><td>20</td><td></td><td></td></tr>
<tr><td>FOR</td><td>7</td><td>,</td><td>14</td><td>)</td><td>21</td><td></td><td></td></tr>
</table>
<p>Fig. 7 Token Coding Scheme</p>
<p>For a keyword or an operator, the token coding scheme gives sufficient information. In the case of an identifier, it is also necessary to supply the particular identifier name that was scanned. The same is true for integers, floating-point values, character-string constants, etc. 
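</p>
<p>A scanner that returns an integer code together with the scanned text can be sketched in Python as follows. This is a minimal sketch, not the scanner of these notes: the codes follow the style of Fig. 7 with consecutive numbering assumed, the END. token (code 5) is left out for brevity, and the names scan, KEYWORDS, and SYMBOLS are hypothetical.</p>

```python
# Token codes in the style of Fig. 7 (consecutive numbering is an assumption).
KEYWORDS = {"PROGRAM": 1, "VAR": 2, "BEGIN": 3, "END": 4, "INTEGER": 6,
            "FOR": 7, "READ": 8, "WRITE": 9, "TO": 10, "DO": 11, "DIV": 19}
SYMBOLS = {";": 12, ":": 13, ",": 14, ":=": 15, "+": 16, "-": 17,
           "*": 18, "(": 20, ")": 21}
ID, INT = 22, 23


def scan(source):
    """Yield (code, value) pairs; value is None for keywords and operators."""
    pos = 0
    while pos < len(source):
        ch = source[pos]
        if ch.isspace():                      # blanks only delimit tokens
            pos += 1
        elif ch.isalpha():                    # keyword or identifier
            start = pos
            while pos < len(source) and source[pos].isalnum():
                pos += 1
            word = source[start:pos]
            if word.upper() in KEYWORDS:
                yield KEYWORDS[word.upper()], None
            else:
                yield ID, word
        elif ch.isdigit():                    # integer constant
            start = pos
            while pos < len(source) and source[pos].isdigit():
                pos += 1
            yield INT, int(source[start:pos])
        elif source[pos:pos + 2] == ":=":     # two-character operator first
            yield SYMBOLS[":="], None
            pos += 2
        elif ch in SYMBOLS:
            yield SYMBOLS[ch], None
            pos += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at position {pos}")


print(list(scan("SUM := SUM + VALUE")))
```

<p>For the statement SUM := SUM + VALUE the sketch yields the codes 22, 15, 22, 16, 22, with the names SUM and VALUE attached to the identifier tokens.</p>
<p>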
A token specifier can be associated with the token code for such tokens. This specifier gives the identifier name, integer value, etc., that was found by the scanner.</p>
<p>Some scanners enter identifiers directly into a symbol table. The token specifier for an identifier may then be a pointer to the symbol table entry for that identifier.</p>
<p>The functions of a scanner are as follows. The entire program is not scanned at one time. The scanner operates as a procedure that is called by the parser when it needs another token. The scanner is responsible for reading the lines of the source program and possibly for printing the source listing. The scanner ignores comments, except for printing them in the output listing. The scanner must take the characteristics of the language into account. Example: In FORTRAN, columns 1 - 5 hold the statement number, column 6 marks a continuation line, and columns 7 - 72 hold the program statement. In PASCAL, blanks function as delimiters for tokens, statements can be continued freely from line to line, and the end of a statement is indicated by ; (semicolon).</p>
<p>Scanners must also follow the rules for the formation of tokens.</p>
<p>Example: 'READ' should not be recognized as a keyword, because it is within quotes; that is, the contents of a quoted string should not be scanned for tokens. Blanks are significant within a quoted string. Blanks play different roles in different languages.</p>
<p>Example 1: FORTRAN. Statement: DO 10 I = 1, 100. Here DO is a keyword, I is an identifier, and 10 is a statement number. Statement: DO 10 I = 1. Here DO10I is an identifier. Note: blanks are ignored in FORTRAN statements, so the second statement is an assignment statement that gives the value 1 to the variable DO10I. In this case the scanner must look ahead to see whether there is a comma (,) before it can decide on the proper identification of the characters DO. Example 2: In FORTRAN, keywords may also be used as identifiers. 
Words such as IF, THEN, and ELSE might represent either keywords or variable names. Example: IF (THEN .EQ. ELSE) THEN IF = THEN ELSE THEN = IF ENDIF</p>
<p>Modeling Scanners as Finite Automata</p>
<p>Finite automata provide an easy way to visualize the operation of a scanner. Mathematically, a finite automaton consists of a finite set of states and a set of transitions from one state to another. A finite automaton is usually represented graphically, as shown in Fig. 8. A state is represented by a circle. An arrow indicates a transition from one state to another. Each arrow is labeled with the character or set of characters that causes the transition to occur. The starting state has an arrow entering it that is not connected to anything else.</p>
<p>Fig. 8 (labels: State, Final State, Transition)</p>
<p>Example: A finite automaton that recognizes tokens is given in Fig. 9. The corresponding algorithm is given in Fig. 10.</p>
<p>Fig. 9 (transition labels include A-Z and 0-9)</p>
<p>get first Input-character
if Input-character in [ 'A' . . 'Z' ] then
begin
while Input-character in [ 'A' . . 'Z', '0' . . '9' ] do
begin
get next Input-character
end {while}
end {if first in [ 'A' . . 'Z' ] }
else return (token-error)</p>
<p>Fig. 10</p>
<p>SYNTACTIC ANALYSIS</p>
<p>During syntactic analysis, the source statements are recognized as language constructs described by the grammar being used. This process can be regarded as building the parse tree for the statements being translated. Parsing techniques are divided into two general classes: bottom-up and top-down. Top-down methods begin with the rule of the grammar that specifies the goal of the analysis (i.e., the root of the tree), and attempt to construct the tree so that the terminal nodes match the statement being analyzed. 
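</p>
<p>One common top-down technique, recursive descent, can be sketched in Python for the rules for &lt;exp&gt;, &lt;term&gt;, and &lt;factor&gt;. This is only a sketch under stated assumptions: the left-recursive rules are rewritten as loops (a standard transformation, not something these notes prescribe), the tokens are assumed to be already separated into a list of strings, and the function names are hypothetical.</p>

```python
def parse_exp(tokens, pos=0):
    """<exp>: one <term>, then any number of (+ | -) <term>."""
    tree, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1)
        tree = ("exp", tree, op, right)       # left-associative grouping
    return tree, pos


def parse_term(tokens, pos):
    """<term>: one <factor>, then any number of (* | DIV) <factor>."""
    tree, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("*", "DIV"):
        op = tokens[pos]
        right, pos = parse_factor(tokens, pos + 1)
        tree = ("term", tree, op, right)
    return tree, pos


def parse_factor(tokens, pos):
    """<factor> ::= id | int | ( <exp> )"""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        tree, pos = parse_exp(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected )")
        return tree, pos + 1
    if tok.isdigit():
        return ("int", int(tok)), pos + 1
    if tok.isidentifier() and tok != "DIV":
        return ("id", tok), pos + 1
    raise SyntaxError(f"unexpected token {tok!r}")


tree, end = parse_exp("SUMSQ DIV 100 - MEAN * MEAN".split())
print(tree)
```

<p>The parse starts from the goal &lt;exp&gt; and works down to the tokens; for SUMSQ DIV 100 - MEAN * MEAN it groups SUMSQ DIV 100 and MEAN * MEAN as the two operands of the subtraction.</p>
<p>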
Bottom-up methods begin with the terminal nodes of the tree and attempt to combine these into successively higher-level nodes until the root is reached.</p>
<p>OPERATOR PRECEDENCE PARSING</p>
<p>The bottom-up parsing technique considered here is called the operator precedence method. This method is based on examining pairs of consecutive operators in the source program and making decisions about which operation should be performed first. Example:</p>
<p>A + B * C - D (1)</p>
<p>Under the usual rules of arithmetic, multiplication and division have higher precedence than addition and subtraction. Considering the first two operators in (1), + and *, we find that + has lower precedence than *. This is written as + ⋖ * [+ has lower precedence than *]. Similarly, for * and -, we find that * ⋗ - [* has greater precedence than -]. The operator precedence method uses such observations to guide the parsing process.</p>
<p>A + B * C - D, with + ⋖ * and * ⋗ - (2)</p>
<p>(Precedence matrix; its rows and columns are labeled with the tokens PROGRAM, VAR, BEGIN, END, INTEGER, FOR, READ, WRITE, TO, DO, ;, :, the comma, :=, +, -, *, DIV, ), (, id, and int.)</p>
<p>Fig. 12: (c) . . . BEGIN &lt;N2&gt; ; (d) parse tree for READ ( id {VALUE} )</p>
<p>According to the grammar, id may be considered as a &lt;factor&gt; (rule 12), a &lt;prog-name&gt; (rule 2), or an &lt;id-list&gt; (rule 6). In operator precedence parsing, it is not necessary to indicate which non-terminal symbol is being recognized. The id is simply interpreted as some non-terminal &lt;N1&gt;. The new version of the statement is shown in Fig. 12(b). An operator-precedence parser generally uses a stack to save tokens that have been scanned but not yet parsed, so it can re-examine them in this way. Precedence relations hold only between terminal symbols, so &lt;N1&gt; is not involved in this process, and a relationship is determined between ( and ). READ ( &lt;N1&gt; ) corresponds to rule 13 of the grammar. This rule is the only one that could be applied in recognizing this portion of the program. 
The sequence is simply interpreted as some non-terminal &lt;N2&gt;. Fig. 12(c) shows this interpretation. The parse tree is given in Fig. 12(d). Note: (1) The parse tree in Fig. 6 and the one in Fig. 12(d) are the same except for the names of the non-terminal symbols involved. (2) The names of the non-terminals are arbitrarily chosen.</p>
<p>Example: VARIANCE := SUMSQ DIV 100 - MEAN * MEAN</p>
<p>(i) . . . id1 := id2 DIV . . .</p>
<p>(ii) . . . id1 := &lt;N1&gt; DIV int . . .</p>
<p>(iii) . . . id1 := &lt;N1&gt; DIV &lt;N2&gt; . . .</p>
<p>(iv) . . . id1 := &lt;N3&gt; - id3 * . . .</p>
<p>(v) . . . id1 := &lt;N3&gt; - &lt;N4&gt; * id4 ;</p>
<p>(vi) . . . id1 := &lt;N3&gt; - &lt;N4&gt; * &lt;N5&gt; ;</p>
<p>(vii) . . . id1 := &lt;N3&gt; - &lt;N6&gt; ;</p>
<p>(viii) . . . id1 := &lt;N7&gt; ;</p>
<p>(ix) . . . &lt;N8&gt; ;</p>
<p>The resulting parse tree has the leaves id1 {VARIANCE}, :=, id2 {SUMSQ}, DIV, int {100}, -, id3 {MEAN}, *, and id4 {MEAN}.</p>
<p>SHIFT REDUCE PARSING</p>
<p>Operator precedence parsing was developed into shift-reduce parsing. This method makes use of a stack to store t...</p>
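<p>The stack-based operator precedence method described in these notes can be sketched in Python for arithmetic expressions. This is a simplified sketch, not the full method: numeric precedence levels stand in for the precedence matrix, two stacks (operands and operators) replace the single parse stack, only the operators +, -, *, and DIV are handled, the input is assumed to be well formed, and the names PRECEDENCE, relation, and parse are hypothetical.</p>

```python
# Assumed precedence levels; the notes use a full precedence matrix instead.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "DIV": 2}


def relation(left, right):
    """Return '>' if operator `left` is reduced before `right` is shifted."""
    # Equal levels reduce first, which makes the operators left-associative.
    return ">" if PRECEDENCE[left] >= PRECEDENCE[right] else "<"


def parse(tokens):
    """Operator-precedence parse; returns a nested-tuple tree."""
    operands, operators = [], []

    def reduce_top():
        # Combine the top operator with its two operands into one node.
        op = operators.pop()
        right = operands.pop()
        left = operands.pop()
        operands.append((left, op, right))

    for tok in tokens:
        if tok in PRECEDENCE:
            while operators and relation(operators[-1], tok) == ">":
                reduce_top()
            operators.append(tok)     # shift the operator
        else:
            operands.append(tok)      # shift an operand (id or int)
    while operators:                  # end of input: reduce what is left
        reduce_top()
    return operands[0]


print(parse("SUMSQ DIV 100 - MEAN * MEAN".split()))
```

<p>For SUMSQ DIV 100 - MEAN * MEAN the DIV is reduced as soon as the - is seen, the * is reduced at the end of the input, and the subtraction is reduced last, which is the same order as steps (i) to (vii) of the example.</p>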