Compiler Design: Introduction


Introduction

Translator: A translator is software that translates a program in one language into a functionally equivalent program in another language.

Assembler:

An assembler is a program that translates a source program written in assembly language into a functionally equivalent target program in machine language.

Interpreter:

An interpreter is a program that translates a source program in a high-level language into a functionally equivalent target in machine language, executing it as it goes.

An interpreter reads one instruction or line of the source code at a time, converts it into machine code, and executes it. The machine code is then discarded and the next line is read.

The advantage of this is that it is simple, and you can interrupt the program while it is running, change it, and either continue or start again.

It is also easier to find errors. The disadvantage is that every line has to be translated every time it is executed, even if it is executed many times as the program runs. Because of this, interpreters tend to be slow. Furthermore, the source code of an interpreted language cannot run without the interpreter. Examples of interpreted languages are BASIC on older home computers, script languages such as JavaScript, and languages such as Lisp and Forth.

Compiler:

A compiler is a program that translates a source program in a high-level language into a functionally equivalent target program in machine language.

A compiler reads the whole source code and translates it into a complete machine code program, which is output as a new file. Only after translation of the whole program is complete does execution start. This completely separates the source code from the executable file.

Sibarama Panigrahi, Lecturer in CSE

The biggest advantage of this is that the translation is done once only and as a separate process. The program that is run has already been translated into machine code, so it executes much faster.

Compilers produce better-optimized code that generally runs faster, and compiled code is self-sufficient: it can run on its intended platform without the compiler present.

The disadvantage is that you cannot change the program without going back to the original source code, editing that and recompiling (though for a professional software developer this is more of an advantage because it stops source code being copied).

Errors are also harder to find. Examples of compiled languages are Visual Basic, C, C++, C#, FORTRAN, COBOL, Ada, and Pascal.

Types Of Compiler:

Compilers are of two kinds: native compiler and cross compiler.

1. Native compilers are compilers that run on one machine and produce object code for the same machine. For example, SMM is a compiler for the language S that is written in a language that runs on machine M and generates output code that runs on machine M.

2. Cross compilers are compilers that run on one machine and produce object code for another machine.

For example, SNM is a compiler for the language S that is written in a language that runs on machine N and generates output code that runs on machine M.

A compiler can be characterized by three languages:

i. The source language (S)
ii. The target language (T)
iii. The implementation language (I)

The three languages S, I, and T can be quite different; when they differ, the compiler is called a cross-compiler.


Bootstrapping: Bootstrapping is obtaining a compiler for a language L by writing the compiler code in the same language L.

Decompiler: A decompiler is a program that translates machine-level language into a functionally equivalent high-level language.

Source-to-source translator (language translator or language converter): A compiler that translates a high-level language into another, functionally equivalent high-level language is called a source-to-source translator.

How to translate?

High-level languages and machine languages differ in their level of abstraction. At the machine level we deal with memory locations and registers, whereas these resources are never accessed directly in high-level languages. The level of abstraction differs from language to language, and some languages are farther from machine code than others.

Goals of translation:
- Good performance for the generated code
- Good compile-time performance

Good performance for generated code: The metric for the quality of the generated code is the ratio between the size of the compiled machine code and handwritten code for the same program. A better compiler is one that generates smaller code, so for optimizing compilers this ratio is smaller.

Good compile-time performance: Handwritten machine code is more efficient than compiled code in terms of the performance it produces; in other words, a program handwritten in machine code will run faster than compiled code. If a compiler produces code that is 20-30% slower than the handwritten code, it is considered acceptable. In addition, the compiler itself must run fast (compilation time must be proportional to program size).

- Maintainable code
- High level of abstraction
Correctness is also a very important issue.

Correctness: A compiler's most important goal is correctness: all valid programs must compile correctly. How do we check whether a compiler is correct, i.e., whether a compiler for a programming language generates correct machine code for programs in that language? The complexity of writing a correct compiler is a major limitation on the amount of optimization that can be done.

Can compilers be proven to be correct? This is very tedious. However, correctness has an implication on the development cost.


Phases of a Compiler: A compiler is composed of several components called phases, each performing one specific task. The complete compilation process is divided into six phases, which can be regrouped into two parts.

1. Analysis Phase: In this part the source program is broken into constituent pieces and an intermediate representation is created. Analysis is done in three sub-phases:
i. Lexical Analysis
ii. Syntax Analysis
iii. Semantic Analysis

2. Synthesis Phase: Synthesis constructs the desired target program from the intermediate representation. Synthesis is done in three sub-phases:
i. Intermediate Code Generation
ii. Code Optimization
iii. Code Generation

Many modern compilers share a common 'two stage' design.


The "front end" translates the source language or the high level program into an intermediate representation. It includes all analysis phases and intermediate code generation phase. It analyses the source program and produces intermediate code.

The second stage is the "back end", which works with the internal representation to produce code in the output language, which is low-level code. The higher the abstraction a compiler can support, the better it is. The back end includes the code optimization and code generation phases of the compiler. It synthesizes the target program from the intermediate code.

Lexical Analysis (Scanning)

In simple words, lexical analysis is the process of identifying the words in an input string of characters so that they may be handled more easily by a syntax analyzer. These words must be separated by some predefined delimiter, or there may be rules imposed by the language for breaking the sentence into tokens or words, which are then passed on to the next phase, syntax analysis. In programming languages, a character from a different class may also be considered a word separator.

Recognizing words is not completely trivial (for example: is this a sentence?). Therefore, we must know what the word separators are. The language must define rules for breaking a sentence into a sequence of words. Normally, white spaces and punctuation marks are word separators. In programming languages, a character from a different class may also be treated as a word separator.

Lexeme: A sequence of characters that forms a token of a language is known as a lexeme.

Token: A categorization of lexemes is called a token; it is the smallest individual part of a program.

The first phase of a compiler is called lexical analysis or scanning. The lexical analyzer scans the source program character by character, groups the characters into meaningful sequences called lexemes, and checks whether each is a valid token of the language or not.

If the lexeme is not a valid token, an error is generated and handled by the error handler. If the lexeme is a valid token, the lexical analyzer produces as output a token of the form (token-name, attribute-value).

To perform the above operations, the design of the lexical analyzer must:
1. Specify the tokens of the language.
2. Suitably recognize the tokens.
To specify the tokens of the language, the regular-expression concept from automata theory is used, and recognition of the tokens is done by a deterministic finite automaton (DFA).
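These two steps can be sketched in C (an illustrative fragment, not the scanner of any real compiler; the `Token` structure, the token names, and `next_token` are invented for the example). Each branch simulates the DFA for one token class:

```c
#include <ctype.h>
#include <string.h>

/* Illustrative token classification: identifier = letter(letter|digit)*,
   constant = digit+; anything else is returned as a single-character token. */
typedef struct {
    char name[16];   /* token name, e.g. "identifier" */
    char lexeme[32]; /* the matched characters */
} Token;

/* Scan one token starting at src[*pos]; returns 0 at end of input. */
int next_token(const char *src, int *pos, Token *t) {
    while (src[*pos] == ' ' || src[*pos] == '\t')   /* skip separators */
        (*pos)++;
    if (src[*pos] == '\0')
        return 0;
    int n = 0;
    if (isalpha((unsigned char)src[*pos])) {        /* identifier DFA */
        while (isalnum((unsigned char)src[*pos]) && n < 31)
            t->lexeme[n++] = src[(*pos)++];
        strcpy(t->name, "identifier");
    } else if (isdigit((unsigned char)src[*pos])) { /* constant DFA */
        while (isdigit((unsigned char)src[*pos]) && n < 31)
            t->lexeme[n++] = src[(*pos)++];
        strcpy(t->name, "constant");
    } else {                                        /* single-character token */
        t->lexeme[n++] = src[(*pos)++];
        strcpy(t->name, "special");
    }
    t->lexeme[n] = '\0';
    return 1;
}
```

For the input a = 5, successive calls yield (identifier, a), (special, =), and (constant, 5).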


Pattern: A rule describing the form of the lexemes of a token is called a pattern.

For example, letter (letter + digit)* is a pattern symbolizing the set of strings that consist of a letter followed by zero or more letters or digits.

Q. Find the tokens and lexeme of following program.

main()
{
    int a = 5;
    int b[11];
    while (a <= 5)
        b[a] = 3 * a;
}

Solution:

Lexeme    Token
main      Identifier
(         Special Character
)         Special Character
{         Special Character
int       Keyword
a         Identifier
=         Assignment Operator
5         Constant
;         Delimiter
b         Identifier
[ ]       Subscript Operator
while     Keyword
<=        Relational Operator
3         Constant
}         Special Character

Q. What do you mean by porting of a compiler?

Solution: The process of modifying an existing compiler to work on a new machine is often known as porting the compiler. Porting a compiler to a new host requires that the back end of the compiler's source code be rewritten to generate code for the new machine. This is then compiled using the old compiler to produce a working version of the compiler for the new machine.


Transition diagram for relational operators (relop)

[Transition diagram omitted. From the start state the diagram branches on <, =, and >; after < or > one character of lookahead decides the final state. This gives six accepting states, each returning token relop with lexeme <, <=, <> (not equal), =, >, or >=.]

In the case of < or >, we need a lookahead to see whether the next character is =, > (for <>), or something else. We also need a buffer that stores the characters of the lexeme; in lex, the array yytext holds the current lexeme. We can recognize the lexeme by using the transition diagram. Depending upon the number of characters a relational operator uses, we land in a different final state; for example, >= and > end in different states. From the transition diagram it is clear that we can end up recognizing six kinds of relops.
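The lookahead logic of this transition diagram can be sketched as a C function (an illustrative fragment; the function name and calling convention are invented for the example). Note how the diagram's "retract" step corresponds to advancing the position by only one character when the lookahead is not part of the operator:

```c
#include <string.h>

/* Simulate the relop transition diagram on src[*pos] with one character of
   lookahead; copies the matched lexeme into out and returns 1 on success.
   The six recognized relops are <, <=, <>, =, >, >=. */
int match_relop(const char *src, int *pos, char *out) {
    char c = src[*pos];
    if (c == '=') {                       /* final state for '=' */
        strcpy(out, "=");
        (*pos)++;
        return 1;
    }
    if (c == '<' || c == '>') {           /* lookahead needed */
        char next = src[*pos + 1];
        if (next == '=' || (c == '<' && next == '>')) {
            out[0] = c; out[1] = next; out[2] = '\0';
            *pos += 2;                    /* two-character relop */
        } else {
            out[0] = c; out[1] = '\0';    /* lookahead not part of relop: retract */
            *pos += 1;
        }
        return 1;
    }
    return 0;                             /* not a relational operator */
}
```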

Transition diagram for identifier: In order to reach the final state, it must encounter a letter followed by zero or more letters or digits.

#include <stdio.h>
#include <ctype.h>

void main()
{
    char ch;
    ch = getchar();
    if (isalpha(ch))
        ch = getchar();
    else
        error();    /* not a letter: report a lexical error */
    while (isalpha(ch) || isdigit(ch))
        ch = getchar();
}

(Here error() stands for the lexical analyzer's error routine.)


Transition diagram for white spaces: In order to reach the final state, it must encounter a delimiter (tab, white space) followed by zero or more delimiters and then some other symbol.

Transition diagram for unsigned numbers: We can have two kinds of unsigned numbers and hence need two transition diagrams to distinguish them. The first one recognizes real numbers; the second one recognizes integers.

LEX- A Lexical Analyzer Generator:

The function of the lexical analyzer is to scan the source program and produce a stream of tokens as output.

Therefore, the first thing that is required is to identify what the keywords are, what the operators are, and what the delimiters are. These are the tokens of the language.

After identifying the tokens of the language, we must use suitable notation to specify these tokens. This notation, should be compact, precise, and easy to understand. Regular expressions can be used to specify a set of strings, and a set of strings that can be specified by using regular-expression notation is called a "regular set." The tokens of a programming language constitute a regular set.

Hence, this regular set can be specified by using regular-expression notation. Therefore, we write regular expressions for things like operators, keywords, and identifiers. For example,


the regular expressions specifying a subset of the tokens of a typical programming language are as follows:

operators  = + | - | * | / | %
keywords   = if | while | do | then
letter     = a | b | c | d | ... | z | A | B | C | ... | Z
digit      = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
identifier = letter (letter | digit)*

The next step is the construction of a DFA from the regular expression that specifies the tokens of the language. But the DFA is a flow-chart (graphical) representation of the lexical analyzer.

Therefore, after constructing the DFA, the next step is to write a program in suitable programming language that will simulate the DFA. This program acts as a token recognizer or lexical analyzer. Therefore, we find that by using regular expressions for specifying the tokens, designing a lexical analyzer becomes a simple mechanical process that involves transforming regular expressions into finite automata and generating the program for simulating the finite automata.

Therefore, it is possible to automate the procedure of obtaining the lexical analyzer from the regular expressions that specify the tokens, and this is precisely what the tool LEX does.

Q. What is a LEX?

LEX is a program generator designed for lexical processing of character input streams. It accepts a high-level, problem-oriented specification for character-string matching and produces a program in a general-purpose language that recognizes regular expressions. The regular expressions are specified by the user in the source specification given to LEX. The LEX-generated code recognizes these expressions in an input stream and partitions the input stream into strings matching the regular expressions.

Errors Detected by the Lexical Analyzer: The lexical analyzer detects the following types of errors:

- Characters that cannot appear in any token of the source language, such as @ or #.
- Integer constants out of bounds (e.g., outside -32768 to 32767 for a 16-bit signed int in C).
- Identifier names that are too long (e.g., a maximum length of 32 characters).
- Text strings that are too long (e.g., a maximum length of 256 characters).
- Text strings that span more than one line (in C).

Read The Following TOPICS

1. DFA, NFA, Removal of Epsilon Productions, Removal of Unit Productions, Removal of Useless Symbols, Removal of Left Recursion


SYNTAX ANALYSIS

A syntax analyzer or parser is a program that performs syntax analysis.

A parser obtains a string of tokens from the lexical analyzer and verifies whether or not the grouping of tokens is a valid construct of the source language; that is, whether or not it is in accordance with the grammar of the source language.

If the tokens in a string are grouped according to the language's rules of syntax, then the string of tokens generated by the lexical analyzer is accepted as a valid construct of the language and the parser produces a parse tree. Otherwise, an error handler is called.

Hence, two issues are involved when designing the syntax-analysis phase of a compilation process:

1. All valid constructs of a programming language must be specified and by using these specifications, a valid program is formed.

2. A suitable recognizer will be designed to recognize whether a string of tokens generated by the lexical analyzer is a valid construct or not.

Therefore, suitable notation must be used to specify the constructs of a language. The notation for the construct specifications should be compact, precise, and easy to understand.

The syntax-structure specification for the programming language (i.e., the valid constructs of the language) uses context-free grammar (CFG), because for certain classes of grammar, we can automatically construct an efficient parser that determines if a source program is syntactically correct.

Classification of Parsing: Parsing is the process of determining whether a string can be generated by a grammar.

Parsing is of two types:
1. Top-Down Parsing
2. Bottom-Up Parsing

Top-Down Parsing                                      Bottom-Up Parsing

1. An attempt to derive w from the grammar's          1. An attempt to reduce w to the grammar's
   start symbol S by using the grammar of the            start symbol S by using the grammar of the
   language.                                             language.
2. Attempts to find the left-most derivation          2. Attempts to find the right-most derivation,
   for an input string w.                                in reverse, for an input string w.
3. Backtracking is a problem in top-down parsing.     3. No backtracking is required.
4. E.g., LL(1) parsers.                               4. E.g., LR(0), LR(1), LALR parsers.


TOP-DOWN PARSING

Top-down parsing attempts to find the left-most derivation for an input string w, which is equivalent to constructing a parse tree for the input string w starting from the root and creating the nodes of the parse tree in a predefined order.

Q. Why Top down Parsing seeks the left most derivation of input String?

The reason that top-down parsing seeks the left-most derivations for an input string w and not the right-most derivations is that the input string w is scanned by the parser from left to right, one symbol/token at a time, and the left-most derivations generate the leaves of the parse tree in left-to-right order, which matches the input scan order.

Back Tracking:

In the attempt to obtain the left-most derivation of the input string w, a parser may encounter a situation in which a nonterminal A is required to be derived next, and there are multiple A-productions, such as A→α1 | α2 | … | αn.

In such a situation, deciding which A-production to use for the derivation of A is a problem.

Therefore, the parser will select one of the A-productions to derive A.
1. If this derivation finally leads to the derivation of w, then the parser announces the successful completion of parsing.
2. Otherwise, the parser resets the input pointer to where it was when the nonterminal A was derived, and it tries another A-production.
The parser continues this until it either announces the successful completion of the parsing or reports failure after trying all of the alternatives.

Example 1: Consider a top-down parser for the following grammar:

S → aAb
A → cd | c

Let the input string be w = acb. The parser initially creates a tree consisting of a single node, labeled S, and the input pointer points to a, the first symbol of w. The parser then uses the S-production S → aAb to expand the tree.


The parser uses the S-production to expand the parse tree. The left-most leaf, labeled a, matches the first input symbol of w. Hence, the parser will now advance the input pointer to c, the second symbol of string w, and consider the next leaf, labeled A. It will then expand A using the first alternative for A, i.e., A → cd.

The parser uses the first alternative for A to expand the tree, and now has a match for the second input symbol, c.

So, it advances the pointer to b, the third symbol of w, and compares it to the label of the next leaf. Since that leaf is labeled d, which does not match b, the parser reports failure and goes back (backtracks) to A.

The parser also resets the input pointer to the second input symbol, the position it had when it encountered A, and tries the second alternative for A, i.e., A → c.

Now all the leaves match the string w = acb, so this derivation is accepted.


Example 2: Consider the grammar S → aa | aSa. If a top-down backtracking parser for this grammar tries S → aSa before S → aa, show that the parser succeeds on two occurrences of a.

In the case of two occurrences of a, the parser will first expand S using the first alternative, S → aSa.

The first input symbol matches the left-most leaf. Therefore, the parser will advance the pointer to a second a and consider the nonterminal S for expansion in order to obtain the tree shown below.

The parser advances the pointer to the second occurrence of a. The second input symbol also matches. Therefore, the parser will consider the next leaf, labeled S, and expand it.



The parser now finds that there is no match. Therefore, it will backtrack to S. The parser then continues matching and backtracking until it arrives at the required parse tree.


Question: Consider the grammar S → aa | aSa. If a top-down backtracking parser for this grammar tries S → aSa before S → aa, show that the parser succeeds on four occurrences of a (aaaa) but not on six occurrences of a (aaaaaa). Also check whether the parser succeeds on eight occurrences of a (aaaaaaaa).
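One way to experiment with questions like this is to code the limited-backtracking parser directly. The C sketch below (illustrative; the function names are invented) commits to S → aSa first and returns a single match position, which models the behavior traced in Example 2:

```c
#include <string.h>

/* Try to derive one S starting at position i of s, preferring S -> aSa
   over S -> aa as in the example; returns the position just past the
   matched substring, or -1 on failure. Like the hand trace above, the
   parser commits to the first alternative that succeeds locally. */
int parse_S(const char *s, int i) {
    int n = (int)strlen(s);
    if (i < n && s[i] == 'a') {               /* first alternative: S -> aSa */
        int j = parse_S(s, i + 1);
        if (j >= 0 && j < n && s[j] == 'a')
            return j + 1;
    }
    if (i + 1 < n && s[i] == 'a' && s[i + 1] == 'a')  /* second: S -> aa */
        return i + 2;
    return -1;                                /* backtrack: no alternative fits */
}

/* Accept only if S derives the whole input. */
int accepts(const char *s) {
    return parse_S(s, 0) == (int)strlen(s);
}
```

Run on the strings in the question, this sketch accepts aa, aaaa, and aaaaaaaa but rejects aaaaaa, which matches the behavior the exercise asks about.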


FIRST and FOLLOW: Before studying top-down predictive parsing, a student must have knowledge of FIRST and FOLLOW.

FIRST and FOLLOW are two functions associated with a grammar that help us fill in the entries of the predictive parsing table.

FIRST and FOLLOW are computed for the nonterminals of the grammar.

FIRST(): This function gives the set of terminals that begin the strings derived from a production rule.

FOLLOW(): This function gives the set of terminals that can appear immediately to the right of a given symbol in some sentential form.

Algorithm/Rules to compute FIRST(): Consider a grammar G = (V, T, P, S). For every production in P:

1. If a production is of the form V → t(V+T)*, where t is a terminal (i.e., the right-hand side starts with a terminal), then
FIRST(V) = FIRST(t(V+T)*) = {t}

2. If a production is of the form V → ∈ (an epsilon production), then
FIRST(V) = FIRST(∈) = {∈}

3. If a production is of the form V → V1X, where V1 is a nonterminal and X is a string in (V+T)*, then to find FIRST(V), check whether FIRST(V1) contains ∈ or not:
(i) If FIRST(V1) contains ∈, then FIRST(V) = (FIRST(V1) − {∈}) ∪ FIRST(X)
(ii) If FIRST(V1) does not contain ∈, then FIRST(V) = FIRST(V1)

Example 1: Find the FIRST sets of the following grammar:

S → ACB | CbB | Ba
A → da | BC


Note: While calculating FIRST, start from the bottom and look at the left-hand side of each production to find the nonterminal whose FIRST you want to calculate.

Hint: To find FIRST(C), look at the production that has C on its LHS. So here we use C → h | ∈.

B → g | ∈
C → h | ∈

Solution:
FIRST(C) = FIRST(h) ∪ FIRST(∈) = {h} ∪ {∈} = {h, ∈}

FIRST(B) = FIRST(g) ∪ FIRST(∈) = {g} ∪ {∈} = {g, ∈}

FIRST(A) = FIRST(da) ∪ FIRST(BC)
FIRST(BC) = (FIRST(B) − {∈}) ∪ FIRST(C)   [since FIRST(B) contains ∈]
          = ({g, ∈} − {∈}) ∪ {h, ∈}
          = {g, h, ∈}
Therefore FIRST(A) = FIRST(da) ∪ FIRST(BC) = {d} ∪ {g, h, ∈} = {d, g, h, ∈}

FIRST(S) = FIRST(ACB) ∪ FIRST(CbB) ∪ FIRST(Ba)

FIRST(ACB) = (FIRST(A) − {∈}) ∪ FIRST(CB)   [since FIRST(A) contains ∈]
           = {d, g, h} ∪ (FIRST(C) − {∈}) ∪ FIRST(B)   [since FIRST(C) contains ∈]
           = {d, g, h} ∪ {h} ∪ {g, ∈}
           = {d, g, h, ∈}

FIRST(CbB) = (FIRST(C) − {∈}) ∪ FIRST(bB)   [since FIRST(C) contains ∈]
           = {h} ∪ {b} = {h, b}

FIRST(Ba) = (FIRST(B) − {∈}) ∪ FIRST(a)   [since FIRST(B) contains ∈]
          = {g} ∪ {a} = {g, a}

FIRST(S) = {d, g, h, ∈} ∪ {h, b} ∪ {g, a} = {a, b, d, g, h, ∈}

Example 2: Find the FIRST sets of the following grammar:
E → TA


A → +TA | ∈
T → FB
B → *FB | ∈
F → (E) | id

Solution:
FIRST(F) = FIRST((E)) ∪ FIRST(id) = {(} ∪ {id} = {(, id}
FIRST(B) = FIRST(*FB) ∪ FIRST(∈) = {*} ∪ {∈} = {*, ∈}
FIRST(T) = FIRST(FB) = FIRST(F) = {(, id}   [since FIRST(F) does not contain ∈]
FIRST(A) = FIRST(+TA) ∪ FIRST(∈) = {+, ∈}
FIRST(E) = FIRST(TA) = FIRST(T) = {(, id}   [since FIRST(T) does not contain ∈]
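The hand computation above can be cross-checked mechanically. The C sketch below uses an illustrative encoding (nonterminals are single uppercase letters, '#' stands for ∈, and 'i' abbreviates id) and applies the three FIRST rules repeatedly until no set changes:

```c
#include <ctype.h>
#include <string.h>

/* Grammar of Example 2, encoded as "LHS=RHS"; '#' is epsilon, 'i' is id. */
const char *prods[] = {
    "E=TA", "A=+TA", "A=#", "T=FB", "B=*FB", "B=#", "F=(E)", "F=i"
};
enum { NPROD = 8 };

char first_set[26][32];  /* first_set[X - 'A'] holds FIRST(X) as a string */
int changed;             /* set whenever a pass adds a new symbol */

/* Add symbol c to set if it is not already present. */
void add_sym(char *set, char c) {
    if (!strchr(set, c)) {
        size_t n = strlen(set);
        set[n] = c; set[n + 1] = '\0';
        changed = 1;
    }
}

/* Iterate the FIRST rules to a fixed point. */
void compute_first(void) {
    do {
        changed = 0;
        for (int p = 0; p < NPROD; p++) {
            char *set = first_set[prods[p][0] - 'A'];
            const char *rhs = prods[p] + 2;
            int nullable = 1;   /* does the rest of the RHS derive epsilon? */
            for (const char *s = rhs; *s && nullable; s++) {
                if (*s == '#') break;                  /* rule 2: epsilon production */
                if (!isupper((unsigned char)*s)) {     /* rule 1: leading terminal */
                    add_sym(set, *s);
                    nullable = 0;
                } else {                               /* rule 3: leading nonterminal */
                    const char *sub = first_set[*s - 'A'];
                    for (const char *q = sub; *q; q++)
                        if (*q != '#') add_sym(set, *q);
                    if (!strchr(sub, '#')) nullable = 0;
                }
            }
            if (nullable) add_sym(set, '#');           /* whole RHS derives epsilon */
        }
    } while (changed);
}

/* FIRST(X) for nonterminal X, as a string of symbols. */
const char *first_of(char nt) { return first_set[nt - 'A']; }
```

After compute_first(), the sets agree with the hand computation: FIRST(E) = FIRST(T) = FIRST(F) = {(, id}, FIRST(A) = {+, ∈}, FIRST(B) = {*, ∈}.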

Algorithm/Rules to compute FOLLOW(): Consider a grammar G = (V, T, P, S). For every production in P:

1. {$} is included in FOLLOW of the start symbol.

2. If a production is of the form V → (V+T)*XY, where X is a nonterminal and Y represents a string in (V+T)*, then to find FOLLOW(X), check whether FIRST(Y) contains ∈ or not:
(i) If FIRST(Y) contains ∈, then FOLLOW(X) = (FIRST(Y) − {∈}) ∪ FOLLOW(V)
(ii) If FIRST(Y) does not contain ∈, then FOLLOW(X) = FIRST(Y)

3. If a production is of the form V → (V+T)*X, where X is a nonterminal at the right end, then
FOLLOW(X) = FOLLOW(V)

Example: Find the FOLLOW sets of the following grammar:
E → TA


Note: While calculating FOLLOW, start from the top and look at the right-hand side of each production to find the nonterminal whose FOLLOW you want to calculate.

Note (refer to Example 2 of FIRST):
FIRST(E) = {(, id}
FIRST(A) = {+, ∈}
FIRST(T) = {(, id}
FIRST(B) = {*, ∈}
FIRST(F) = {(, id}

A → +TA | ∈
T → FB
B → *FB | ∈
F → (E) | id

Solution:
FOLLOW(E) = FIRST( ) ) ∪ {$} = {), $}   [E is the start symbol, and ) follows E in F → (E)]
FOLLOW(A) = FOLLOW(E) ∪ FOLLOW(A)   [applying Rule 3 to E → TA and A → +TA]
          = {), $}

FOLLOW(T) = (FIRST(A) − {∈}) ∪ FOLLOW(E) ∪ (FIRST(A) − {∈}) ∪ FOLLOW(A)   [applying Rule 2, case (i), to E → TA and A → +TA]
          = ({+, ∈} − {∈}) ∪ {), $} ∪ ({+, ∈} − {∈}) ∪ {), $}
          = {+, ), $}
FOLLOW(B) = FOLLOW(T) ∪ FOLLOW(B) = {+, ), $}
FOLLOW(F) = ((FIRST(B) − {∈}) ∪ FOLLOW(T)) ∪ ((FIRST(B) − {∈}) ∪ FOLLOW(B))
          = {*} ∪ {+, ), $} ∪ {*} ∪ {+, ), $}
          = {+, ), *, $}

Top-Down Predictive Parsers: A predictive parser is an efficient way of implementing recursive-descent parsing, since a stack is maintained in predictive parsing for handling the activation records.

In top-down predictive parsing, the parser is able to predict the right alternative for the expansion of a nonterminal during the parsing process, and hence backtracking is not required.

The parse table is a two-dimensional array M[X, a], where X is a nonterminal and a is a terminal of the grammar.

In the two-dimensional parse table, each row corresponds to a nonterminal and each column corresponds to a terminal.


It is possible to build a non-recursive predictive parser maintaining a stack explicitly, rather than implicitly via recursive calls. A table-driven predictive parser has an input buffer, a stack, a parsing table, and an output stream.

The input buffer contains the string to be parsed, followed by $, a symbol used as a right end marker to indicate the end of the input string.

The stack contains a sequence of grammar symbols with a $ on the bottom, indicating the bottom of the stack.

Initially the stack contains the start symbol of the grammar on top of $. The parsing table is a two-dimensional array M[X, a], where X is a nonterminal and a is a terminal or the symbol $. The key problem during predictive parsing is determining the production to be applied for a nonterminal. The non-recursive parser looks up the production to be applied in the parsing table.

Algorithm for Creating a Predictive Parsing Table:

Step 1: Compute FIRST and FOLLOW for every nonterminal of the grammar.

Step 2: For every production A → α, check whether it is an epsilon production or not.
1. If A → α is not an epsilon production, then look at FIRST(A): for every b in FIRST(A) that is derived from A → α, do
   TABLE[A, b] = A → α
2. If A → α is an epsilon production, i.e., A → ∈, then look at FOLLOW(A): for every b in FOLLOW(A), do
   TABLE[A, b] = A → ∈

Example: For the following grammar, draw the predictive parsing table and check whether id+id*id is accepted by the grammar or not.

E → TA
A → +TA | ∈
T → FB
B → *FB | ∈
F → (E) | id

Solution:
Step 1: Find the FIRST and FOLLOW sets of the grammar. From Example 2 of FIRST and the FOLLOW example, we can refer to the FIRST and FOLLOW sets.


FIRST(E) = {(, id}     FOLLOW(E) = {), $}
FIRST(A) = {+, ∈}      FOLLOW(A) = {), $}
FIRST(T) = {(, id}     FOLLOW(T) = {+, ), $}
FIRST(B) = {*, ∈}      FOLLOW(B) = {+, ), $}
FIRST(F) = {(, id}     FOLLOW(F) = {+, ), *, $}

Step 2: Draw the predictive parse table.

        (         )        id        +         *         $
E     E→TA               E→TA
A               A→∈                A→+TA               A→∈
T     T→FB               T→FB
B               B→∈                B→∈      B→*FB      B→∈
F     F→(E)              F→id

Checking for the acceptance of id+id*id:

Stack       Input string    Production used
E$          id+id*id$
TA$         id+id*id$       E → TA
FBA$        id+id*id$       T → FB
idBA$       id+id*id$       F → id
BA$         +id*id$         id is matched and removed from stack and input
A$          +id*id$         B → ∈
+TA$        +id*id$         A → +TA
TA$         id*id$          + is matched and removed from stack and input
FBA$        id*id$          T → FB
idBA$       id*id$          F → id
BA$         *id$            id is matched and removed from stack and input
*FBA$       *id$            B → *FB
FBA$        id$             * is matched and removed from stack and input
idBA$       id$             F → id
BA$         $               id is matched and removed from stack and input
A$          $               B → ∈
$           $               A → ∈
$           $               Accepted

As the string id+id*id is derived from the start symbol E, the string is accepted by the above grammar.
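The stack-driven procedure traced above can be reproduced by a small table-driven parser. In the C sketch below (illustrative: 'i' abbreviates id, '#' marks ∈, and the table entries are exactly the cells filled in Step 2), the stack and input evolve as in the trace:

```c
#include <ctype.h>
#include <string.h>

/* Parse table from the example, encoded as "Xa=RHS": on nonterminal X
   with lookahead a, expand X by RHS ('#' is epsilon, 'i' abbreviates id). */
const char *table[] = {
    "E(=TA", "Ei=TA",
    "A)=#", "A+=+TA", "A$=#",
    "T(=FB", "Ti=FB",
    "B)=#", "B+=#", "B*=*FB", "B$=#",
    "F(=(E)", "Fi=i"
};
enum { NENTRY = 13 };

/* Look up M[X, a]; returns the RHS string, or NULL if the cell is empty. */
const char *lookup(char X, char a) {
    for (int k = 0; k < NENTRY; k++)
        if (table[k][0] == X && table[k][1] == a)
            return table[k] + 3;
    return 0;
}

/* Table-driven predictive parse of input w (without the trailing $);
   returns 1 if w is accepted. */
int ll1_parse(const char *w) {
    char stack[64];
    int top = 0, i = 0, n = (int)strlen(w);
    stack[top++] = '$';
    stack[top++] = 'E';                     /* start symbol on top of $ */
    while (top > 0) {
        char X = stack[--top];              /* pop */
        char a = (i < n) ? w[i] : '$';      /* current lookahead */
        if (X == '$')
            return a == '$';                /* accept iff input is exhausted */
        if (!isupper((unsigned char)X)) {   /* terminal on stack: must match */
            if (X != a) return 0;
            i++;
        } else {                            /* nonterminal: consult the table */
            const char *rhs = lookup(X, a);
            if (!rhs) return 0;             /* empty cell: syntax error */
            for (int k = (int)strlen(rhs) - 1; k >= 0; k--)
                if (rhs[k] != '#')
                    stack[top++] = rhs[k];  /* push RHS in reverse */
        }
    }
    return 0;
}
```

Calling ll1_parse("i+i*i") replays exactly the trace above (with i standing for id) and accepts.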

The heart of the table-driven predictive parser is the parsing table: the parser looks at the parsing table to decide which alternative is the right choice for the expansion of a nonterminal


during the parsing of the input string. Hence, constructing a table-driven predictive parser can be considered equivalent to constructing the parsing table.

While drawing the parsing table, we have to check whether or not there exist multiple entries for the same cell of the predictive parsing table.

Case 1: If there are no multiple entries in any cell of the predictive parsing table, then the parser is deterministic, and hence backtracking is not required while deriving a string using the table.

Case 2: If there are multiple entries for the same cell of the predictive parsing table, then the parser is still non-deterministic in predicting which production to use, and hence backtracking is required while deriving a string using the table.

LL(1) PARSER

A given grammar is LL(1) if its parsing table contains no multiple entries. If the table contains multiple entries, then the grammar is not LL(1).

In the acronym LL(1), the first L stands for the left-to-right scan of the input, the second L stands for the left-most derivation, and the (1) indicates that the next input symbol is used to decide the next parsing process (i.e., length of the look ahead is "1").

Algorithm For Checking LL(1) Grammar:

For every pair of productions of the form A → α | β, check:

Condition 1: FIRST(α) ∩ FIRST(β) = φ

Condition 2: If FIRST(β) contains ∈ (and FIRST(α) does not contain ∈), then
FIRST(α) ∩ FOLLOW(A) = φ

If the above two conditions are satisfied by a grammar, then it is an LL(1) grammar; otherwise it is not.

Example: Check whether the following grammar is LL(1) or not.


E → TA
A → +TA | ∈
T → FB
B → *FB | ∈
F → (E) | id

Solution: In the above grammar there are three productions of the form A → α | β:
A → +TA | ∈
B → *FB | ∈
F → (E) | id

For the production A → +TA | ∈, here α = +TA and β = ∈.
Condition 1: FIRST(+TA) ∩ FIRST(∈) = {+} ∩ {∈} = φ   [Condition 1 satisfied]
Condition 2: FIRST(β) contains ∈ and FIRST(α) does not, so find
FOLLOW(A) ∩ FIRST(+TA) = {), $} ∩ {+} = φ   [Condition 2 satisfied; see the FOLLOW example]
This production satisfies the LL(1) conditions.

For the production B → *FB | ∈, here α = *FB and β = ∈.
Condition 1: FIRST(*FB) ∩ FIRST(∈) = {*} ∩ {∈} = φ   [Condition 1 satisfied]
Condition 2: FOLLOW(B) ∩ FIRST(*FB) = {+, ), $} ∩ {*} = φ   [Condition 2 satisfied]
This production satisfies the LL(1) conditions.

For the production F → (E) | id, here α = (E) and β = id.
Condition 1: FIRST((E)) ∩ FIRST(id) = {(} ∩ {id} = φ   [Condition 1 satisfied]
Since neither FIRST((E)) nor FIRST(id) contains ∈, Condition 2 need not be checked. Hence this production also satisfies the LL(1) conditions.

As all productions of the form A → α | β satisfy the conditions, the grammar is LL(1).
