Upload: kannansa23
Post on 06-Apr-2018

8/3/2019 Role of Parse1

    ROLE OF THE PARSER

The parser obtains a string of tokens from the lexical analyzer and verifies that the string can be generated by the grammar for the source language.

The parser should report any syntax errors.

It should also recover from commonly occurring errors so that it can continue processing the remainder of its input.

There are three general types of parsers for grammars.

Universal Parser

The Cocke-Younger-Kasami (CYK) algorithm and Earley's algorithm can parse any grammar. These methods are too inefficient to use in production compilers.
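To illustrate why universal methods are costly, here is a CYK recognizer sketched in Python. The grammar encoding and the toy grammar (which generates { aⁿbⁿ : n ≥ 1 }) are illustrative assumptions, not from the text; CYK additionally requires the grammar to be in Chomsky normal form.

```python
# CYK recognition works for ANY grammar once it is in Chomsky normal form,
# but costs O(n^3) time in the input length n, which is why such universal
# parsers are not used in production compilers.
RULES = {  # non-terminal -> list of right-hand sides
    "S": [("A", "B"), ("A", "C")],
    "C": [("S", "B")],
    "A": [("a",)],
    "B": [("b",)],
}

def cyk(word, start="S"):
    n = len(word)
    if n == 0:
        return False
    table = {}  # (i, length) -> set of non-terminals deriving word[i:i+length]
    for i, ch in enumerate(word):  # length-1 spans use the terminal rules
        table[(i, 1)] = {A for A, rhss in RULES.items() if (ch,) in rhss}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            cell = set()
            for split in range(1, length):  # word[i:i+split] ++ word[i+split:i+length]
                left = table[(i, split)]
                right = table[(i + split, length - split)]
                for A, rhss in RULES.items():
                    if any((B, C) in rhss for B in left for C in right):
                        cell.add(A)
            table[(i, length)] = cell
    return start in table[(0, n)]
```

The triple loop over span length, start position, and split point is exactly the cubic cost that rules these methods out for compilers.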

    Top-down Parser

Builds parse trees from the top (root) to the bottom (leaves).

    Fig. Position of parser in compiler model

    Bottom-up Parser

Builds parse trees from the bottom (leaves) to the top (root).

In both the top-down and bottom-up cases, the input to the parser is scanned from left to right, one symbol at a time. The efficient methods work only on subclasses of grammars, such as the LL and LR grammars. Automated tools can construct parsers for the larger class of LR grammars.

The output of the parser is some representation of the parse tree for the stream of tokens produced by the lexical analyzer.

A number of tasks may be conducted during parsing:

    Collecting information about various tokens into the symbol table.

    Performing type checking and other kinds of semantic analysis.

    Generating intermediate code.

The nature of syntactic errors calls for general strategies for error recovery; two of these strategies are called panic-mode and phrase-level recovery.

    Syntax error handling

If a compiler had to process only correct programs, its design and implementation would be simplified.

When programmers write incorrect programs, a good compiler should assist them in identifying and locating errors.

Programming languages demand a high degree of syntactic accuracy, which the compiler must enforce.

Planning the error handling right from the start can both simplify the structure of a compiler and improve its response to errors.

Different levels of errors:

Lexical: misspelling an identifier, keyword, or operator.

Syntactic: an arithmetic expression with unbalanced parentheses.

Semantic: an operator applied to an incompatible operand.

[Figure residue from the compiler model: source program → lexical analyzer → (token / get next token) → parser → parse tree → rest of front end → intermediate representation; the lexical analyzer and parser both consult the symbol table]


Accurately detecting semantic and logical errors at compile time is a very difficult task.

The error handler in a parser has goals that are simple to state:

    It should report the presence of errors clearly and accurately.

    It should recover from each error quickly enough to be able to detect subsequent errors.

    It should not significantly slow down the processing of correct programs.

The LL and LR methods detect an error as soon as possible. They have the viable-prefix property, meaning they detect that an error has occurred as soon as they see a prefix of the input that is not a prefix of any string in the language.

    Error-recovery strategies

A parser may use any of several general strategies to recover from a syntactic error:

    Panic-mode recovery

The parser discards input symbols one at a time until one of a designated set of synchronizing tokens is found. The compiler designer must select the synchronizing tokens appropriate for the source language. Panic-mode correction often skips a considerable amount of input without checking it for additional errors. The synchronizing tokens are usually delimiters, such as semicolon or end, whose role in the source program is clear. This strategy can be used by most parsing methods; however, if multiple errors occur in the same statement, it may skip over some of them.

Advantage: Simplest to implement and guaranteed not to go into an infinite loop.

    Phrase-level recovery

A parser may perform local correction on the remaining input; that is, it may replace a prefix of the remaining input by some string that allows the parser to continue. A typical local correction would be to replace a comma by a semicolon, delete an extraneous semicolon, or insert a missing semicolon. The choice of the local correction is left to the compiler designer, who must choose replacements that do not lead to infinite loops. This type of replacement can correct any input string and has been used in several error-repairing compilers. The method was first used with top-down parsing.

Drawback: It has difficulty coping with situations in which the actual error occurred before the point of detection.

    Error-productions

If the common errors that may be encountered are known, the grammar for the language at hand can be augmented with productions that generate the erroneous constructs. A parser is then constructed from this augmented grammar. If the parser uses an error production, it can generate appropriate error diagnostics to indicate the erroneous construct that has been recognized in the input.

    Global correction

Ideally, a compiler should make as few changes as possible in processing an incorrect input string. There are algorithms for choosing a minimal sequence of changes to obtain a globally least-cost correction. Given an incorrect input string x and grammar G, these algorithms find a parse tree for a related string y, such that the number of insertions, deletions, and changes of tokens required to transform x into y is as small as possible. These methods are too costly to implement in terms of time and space, so they are currently only of theoretical interest. However, the notion of least-cost correction provides a yardstick for evaluating error-recovery techniques, and it has been used for finding optimal replacement strings for phrase-level recovery.

CONTEXT-FREE GRAMMARS

Language constructs that have a recursive structure can be defined by context-free grammars. A context-free grammar consists of terminals, non-terminals, a start symbol, and productions.


Terminals are the basic symbols from which strings are formed. The word token is a synonym for terminal when we talk about grammars for programming languages. For example, in stmt → if expr then stmt else stmt, each of the keywords if, then and else is a terminal.

Non-terminals are syntactic variables that denote sets of strings. The non-terminals define sets of strings that help define the language generated by the grammar, and they impose a hierarchical structure on the language that is useful for both syntax analysis and translation. In the above example, stmt and expr are non-terminals.

In a grammar, one non-terminal is distinguished as the start symbol, and the set of strings it denotes is the language defined by the grammar.

The productions of a grammar specify the manner in which the terminals and non-terminals can be combined to form strings. Each production consists of a non-terminal, followed by an arrow, followed by a string of non-terminals and terminals.

Example:

expr → expr op expr
expr → ( expr )
expr → - expr
expr → id
op → +
op → -
op → *
op → /

In this grammar, the terminal symbols are id + - * / ( ), the non-terminal symbols are expr and op, and expr is the start symbol.

    Notational Conventions

    These symbols are terminals

Lower-case letters early in the alphabet, such as a, b, c.

    Operator symbols such as +, - etc.

    Punctuation symbols such as parentheses, comma, etc.

The digits 0, 1, …, 9.

    Bold face strings such as id and if.

    These symbols are non-terminals

Upper-case letters early in the alphabet, such as A, B, C.

The letter S which, when it appears, is usually the start symbol.

Lower-case italic names such as expr or stmt.

Upper-case letters late in the alphabet, such as X, Y, Z, represent grammar symbols, that is, either non-terminals or terminals.

Lower-case letters late in the alphabet, such as u, v, …, z, represent strings of terminals.

Lower-case Greek letters such as α, β, γ represent strings of grammar symbols. A production can be written as A → α, indicating a single non-terminal A on the left side of the production and a string of grammar symbols α on the right side of the production.

If A → α1, A → α2, …, A → αk are all productions with A on the left, we may write A → α1 | α2 | … | αk, where α1, α2, …, αk are the alternatives for A.

    Unless otherwise stated, the left side of the first production is the start symbol.

For example, in E → E A E | ( E ) | - E | id and A → + | - | * | /, E and A are non-terminals, with E the start symbol; the remaining symbols are terminals.

    Derivations

A derivation step applies a production rule, in which the non-terminal on the left is replaced by the string on the right side of the production. For example, consider E → E + E | E * E | ( E ) | - E | id.


From this grammar, E ⇒ E + E and E ⇒ E * E in a single derivation step.

The relation ⇒* (derives in zero or more steps) is defined by:

1. α ⇒* α for any string α.
2. If α ⇒* β and β ⇒ γ, then α ⇒* γ.

A language that can be generated by a grammar is said to be a context-free language. If two grammars generate the same language, the grammars are said to be equivalent.

Strings in L(G) may contain only terminal symbols of G. A string of terminals w is in L(G) if and only if S ⇒+ w; the string w is called a sentence of G. If S ⇒* α, where α may contain non-terminals, then α is a sentential form of G. A sentence is a sentential form with no non-terminals. For example, the string -( id + id ) is a sentence of the grammar E → E + E | E * E | ( E ) | - E | id because there is the derivation E ⇒ -E ⇒ -( E ) ⇒ -( E + E ) ⇒ -( id + E ) ⇒ -( id + id ). The strings E, -E, -( E ), …, -( id + id ) appearing in this derivation are all sentential forms of this grammar. We write E ⇒* -( id + id ) to indicate that -( id + id ) can be derived from E.

In a leftmost derivation, only the leftmost non-terminal in any sentential form is replaced at each step. If α ⇒ β by a step in which the leftmost non-terminal in α is replaced, we write α ⇒lm β.

The leftmost derivation above is E ⇒lm -E ⇒lm -( E ) ⇒lm -( E + E ) ⇒lm -( id + E ) ⇒lm -( id + id ).

Using the notational conventions, every leftmost step can be written wAγ ⇒lm wδγ, where w consists of terminals only, A → δ is the production applied, and γ is a string of grammar symbols. If α derives β by a leftmost derivation, we write α ⇒*lm β. If S ⇒*lm α, then α is a left-sentential form of the grammar.

Rightmost derivations, in which the rightmost non-terminal is replaced at each step, are also called canonical derivations.

    Parse tree and derivations

A parse tree is a graphical representation of a derivation.

Each interior node of a parse tree is labeled by a non-terminal A, and the children of the node are labeled, from left to right, by the symbols on the right side of the production by which A was replaced in the derivation.

The leaves of the parse tree are labeled by non-terminals or terminals and, read from left to right, they constitute a sentential form, called the yield or frontier of the tree.

Consider a derivation α1 ⇒ α2 ⇒ … ⇒ αn, where α1 is a single non-terminal A. For each sentential form αi in the derivation, we can construct a parse tree whose yield is αi; the process is an induction on i.

Suppose we already have a parse tree whose yield is αi-1 = X1 X2 … Xk. Then αi is derived from αi-1 by replacing Xj, a non-terminal, by β = Y1 Y2 … Yr. That is, at the ith step of the derivation, production Xj → β is applied to αi-1 to derive αi = X1 X2 … Xj-1 β Xj+1 … Xk. This process yields, for example, the parse tree for -( id + id ) implied by the derivation above.

Example: The sentence id + id * id has two distinct leftmost derivations:

E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id

E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id

[Fig. Parse tree for -( id + id ): root E with children - and E; that E expands to ( E ); the inner E expands to E + E, and each of these E's derives id]


Note: the * operator is treated as having higher precedence than +.

    Ambiguity

A grammar that produces more than one parse tree for some sentence is said to be ambiguous.

Equivalently, an ambiguous grammar is one that produces more than one leftmost or more than one rightmost derivation for the same sentence.

    Writing a grammar

A limited amount of syntax analysis is done by the lexical analyzer as it produces the sequence of tokens from the input characters.

The sequences of tokens accepted by a parser form a superset of the programming language.

    Grammars for expressions can be constructed using associativity and precedence information.

    It is useful for rewriting grammars for top-down parsing.

Some programming language constructs cannot be described by any context-free grammar.

Regular expressions vs. Context-Free Grammars

Every construct that can be described by a regular expression can also be described by a grammar.

    For example, the regular expression (a|b)*abb and the grammar

A0 → a A0 | b A0 | a A1
A1 → b A2
A2 → b A3
A3 → ε

describe the same language: the set of strings of a's and b's ending in abb.

An NFA can be converted into a grammar that generates the same language as is recognized by the NFA:

For each state i of the NFA, create a non-terminal symbol Ai.

If state i has a transition to state j on symbol a, add the production Ai → a Aj.

If state i goes to state j on input ε, add the production Ai → Aj.

If i is an accepting state, add the production Ai → ε.

If i is the start state, make Ai the start symbol of the grammar.
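The conversion steps above can be sketched in Python. The NFA encoding (a dict of transitions) and the names used here are illustrative assumptions, not from the text; the NFA given is one for (a|b)*abb.

```python
# Hypothetical encoding of an NFA for (a|b)*abb:
# transitions[(state, symbol)] is the set of successor states.
NFA = {
    "transitions": {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}},
    "start": 0,
    "accepting": {3},
}

def nfa_to_grammar(nfa):
    """One non-terminal A_i per state i, following the construction above."""
    prods = []
    for (i, sym), targets in sorted(nfa["transitions"].items()):
        for j in sorted(targets):
            prods.append((f"A{i}", (sym, f"A{j}")))  # A_i -> sym A_j
    for i in sorted(nfa["accepting"]):
        prods.append((f"A{i}", ()))                  # accepting state: A_i -> epsilon
    return {"start": f"A{nfa['start']}", "productions": prods}

grammar = nfa_to_grammar(NFA)
```

Applied to this NFA, the function reproduces the grammar A0 → a A0 | b A0 | a A1, A1 → b A2, A2 → b A3, A3 → ε given earlier.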

    Every regular set is a context-free language.

    Use of regular expressions to define the lexical syntax of a language :

The lexical rules of a language are quite simple, and to describe them we do not need a notation as powerful as grammars.

Regular expressions provide a more concise and easier-to-understand notation for tokens than grammars.

More efficient lexical analyzers can be constructed automatically from regular expressions than from arbitrary grammars.

Separating the syntactic structure of a language into lexical and non-lexical parts provides a convenient way of modularizing the front end of a compiler into two manageable-sized components.

Regular expressions are useful for describing the structure of lexical constructs such as identifiers, constants, and keywords.

Grammars are useful for describing nested structures such as balanced parentheses, matching begin-ends, and corresponding if-then-elses.

    Note: Nested structures cannot be described by regular expressions.

    Verifying the language generated by a grammar

We must reason that a given set of productions generates a particular language.

This requires showing that every string generated by the grammar G is in the language L, and conversely that every string in L is generated by G. For example, consider the grammar S → ( S ) S | ε, which generates all balanced strings of parentheses.


First, we show that any string derivable from S is balanced, by induction on the number of steps in a derivation. For the basis, the empty string is derivable from S and is balanced.

For the induction, assume all derivations of fewer than n steps produce balanced sentences, and consider a leftmost derivation of exactly n steps.

Conversely, we show by induction on the length of a string that every balanced string is derivable from S. Assume every balanced string of length less than 2n is derivable from S, and consider a balanced string w of length 2n, n ≥ 1. The string w must begin with a left parenthesis.

Let ( x ) be the shortest prefix of w having an equal number of left and right parentheses. Then w can be written as ( x ) y, where both x and y are balanced. Since x and y are of length less than 2n, they are derivable from S by the inductive hypothesis.

Thus, we can find a derivation of the form S ⇒ ( S ) S ⇒* ( x ) S ⇒* ( x ) y, proving that w = ( x ) y is also derivable from S.
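The two directions of this argument can be spot-checked by brute force. This is a minimal sketch, not the proof itself; it assumes the grammar S → ( S ) S | ε and checks small strings only.

```python
from itertools import product

def derivable(depth):
    """Terminal strings derivable from S -> ( S ) S | epsilon in <= depth rounds."""
    if depth == 0:
        return {""}            # only S -> epsilon bottoms out immediately
    shorter = derivable(depth - 1)
    return {""} | {"(" + x + ")" + y for x in shorter for y in shorter}

def balanced(s):
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:          # a prefix with more ) than (
            return False
    return depth == 0

gen = derivable(3)
# Direction 1: everything derivable from S is balanced.
assert all(balanced(s) for s in gen)
# Direction 2: every balanced string (here, up to length 6) is derivable from S.
for n in range(0, 7, 2):
    for candidate in map("".join, product("()", repeat=n)):
        if balanced(candidate):
            assert candidate in gen
```

Both assertions pass, matching the inductive argument on short strings.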

    Eliminating ambiguity

An ambiguous grammar can sometimes be rewritten to eliminate the ambiguity. As an example, consider the dangling-else grammar:

stmt → if expr then stmt
     | if expr then stmt else stmt
     | other

where other stands for any other statement. According to this grammar, the compound conditional statement if E1 then S1 else if E2 then S2 else S3 has the parse tree of Fig. (1).


[Fig. (1) Parse tree for the conditional statement if E1 then S1 else if E2 then S2 else S3: the outer if-then-else has condition E1, then-part S1, and as else-part the nested statement if E2 then S2 else S3]

[Fig. (2) Two parse trees for the ambiguous statement if E1 then if E2 then S1 else S2: in the first, the else matches the inner then, giving if E1 then ( if E2 then S1 else S2 ); in the second, it matches the outer then, giving if E1 then ( if E2 then S1 ) else S2]

    Rule : Match each else with the closest previous unmatched then.

Fig. (1) is preferred, because the disambiguating rule can be incorporated directly into the grammar. The grammar is ambiguous: the string if E1 then if E2 then S1 else S2 has the two parse trees of Fig. (2).

A statement appearing between a then and an else must be matched; that is, it must not end with an unmatched then followed by any statement, for the else would then be forced to match this unmatched then.

A matched statement is either an if-then-else statement containing no unmatched statements, or it is any other kind of unconditional statement.

We can therefore use the grammar

stmt → matched-stmt | unmatched-stmt
matched-stmt → if expr then matched-stmt else matched-stmt | other
unmatched-stmt → if expr then stmt | if expr then matched-stmt else unmatched-stmt

This grammar generates the same set of strings as the dangling-else grammar, but it allows only one parse for the string if E1 then if E2 then S1 else S2: the one that associates each else with the closest previous unmatched then.


A pair of productions of the form A → A α | β is left recursive; it can be replaced by the non-left-recursive productions A → β A′, A′ → α A′ | ε without changing the set of strings derivable from A.

    Example :

The grammar for arithmetic expressions

E → E + T | T
T → T * F | F
F → ( E ) | id

after eliminating the immediate left recursion in the productions for E and T becomes

E → T E′
E′ → + T E′ | ε
T → F T′
T′ → * F T′ | ε
F → ( E ) | id

No matter how many A-productions there are, we can eliminate immediate left recursion from them by the following technique. Group the productions as

A → A α1 | A α2 | … | A αm | β1 | β2 | … | βn

where no βi begins with an A. Then replace the A-productions by

A → β1 A′ | β2 A′ | … | βn A′
A′ → α1 A′ | α2 A′ | … | αm A′ | ε

Example:

S → A a | b
A → A c | S d | ε

Substituting S in the A-productions, these can be rewritten as

S → A a | b
A → A c | A a d | b d | ε

Eliminating the immediate left recursion:

S → A a | b
A → b d A′ | A′
A′ → c A′ | a d A′ | ε

    Algorithm : Eliminating left recursion from a grammar.

Input: Grammar G with no cycles or ε-productions.
Output: An equivalent grammar with no left recursion.
Method:
1. Arrange the non-terminals in some order A1, A2, …, An.
2. for i := 1 to n do begin
       for j := 1 to i-1 do begin
           replace each production of the form Ai → Aj γ by the
           productions Ai → δ1 γ | δ2 γ | … | δk γ, where
           Aj → δ1 | δ2 | … | δk are all the current Aj-productions
       end;
       eliminate the immediate left recursion among the Ai-productions
   end.

This procedure systematically eliminates left recursion from a grammar, provided the grammar has no cycles or ε-productions. Cycles can be systematically eliminated from a grammar, as can ε-productions.
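The immediate-left-recursion step used inside the algorithm can be sketched in Python. The list-of-symbols encoding of productions (with [] standing for ε) is an illustrative assumption.

```python
def eliminate_immediate_left_recursion(nonterminal, productions):
    """Split the A-productions into A -> A alpha (recursive) and A -> beta,
    then rewrite as A -> beta A' and A' -> alpha A' | epsilon."""
    alphas = [rhs[1:] for rhs in productions if rhs and rhs[0] == nonterminal]
    betas  = [rhs     for rhs in productions if not rhs or rhs[0] != nonterminal]
    if not alphas:
        return {nonterminal: productions}          # no immediate left recursion
    prime = nonterminal + "'"
    return {
        nonterminal: [beta + [prime] for beta in betas],
        prime:       [alpha + [prime] for alpha in alphas] + [[]],  # [] is epsilon
    }

# A -> A c | A a d | b d | epsilon   (the example above, after substituting S)
result = eliminate_immediate_left_recursion(
    "A", [["A", "c"], ["A", "a", "d"], ["b", "d"], []])
```

On the A-productions of the example, this yields A → b d A′ | A′ and A′ → c A′ | a d A′ | ε, matching the result derived above.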

    LEFT FACTORING

    It is a grammar transformation that is useful for producing a grammar suitable for predictive parsing.

If A → α β1 | α β2 are two A-productions and the input begins with a non-empty string derived from α, we do not know whether to expand A to α β1 or to α β2.

The left-factored productions are A → α A′, A′ → β1 | β2.

Algorithm: Left factoring a grammar.
Input: Grammar G.
Output: An equivalent left-factored grammar.

Method: For each non-terminal A, find the longest prefix α common to two or more of its alternatives. If α ≠ ε, i.e., there is a non-trivial common prefix, replace all the A-productions A → α β1 | α β2 | … | α βn | γ, where γ represents all alternatives that do not begin with α, by

A → α A′ | γ
A′ → β1 | β2 | … | βn

Here A′ is a new non-terminal. Repeatedly apply this transformation until no two alternatives for a non-terminal have a common prefix.


Example: Consider the dangling-else grammar

S → i E t S | i E t S e S | a
E → b

where i, t and e stand for if, then and else, and E and S for expression and statement. The left-factored form of this grammar is:

S → i E t S S′ | a
S′ → e S | ε
E → b

The left-factored grammar is still ambiguous, and on input e it will not be clear which alternative for S′ should be chosen.
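One round of the left-factoring method can be sketched as follows; the list-of-symbols encoding and the helper name are illustrative assumptions.

```python
def left_factor(nonterminal, alternatives):
    """One round of left factoring; each alternative is a list of symbols."""
    best = []
    for i, alt in enumerate(alternatives):  # longest prefix shared by >= 2 alternatives
        for other in alternatives[i + 1:]:
            k = 0
            while k < len(alt) and k < len(other) and alt[k] == other[k]:
                k += 1
            if k > len(best):
                best = alt[:k]
    if not best:
        return {nonterminal: alternatives}  # no non-trivial common prefix
    prime = nonterminal + "'"
    tails, rest = [], []
    for alt in alternatives:
        if alt[:len(best)] == best:
            tails.append(alt[len(best):])   # the beta_i (possibly epsilon, i.e. [])
        else:
            rest.append(alt)                # the gamma alternatives
    return {nonterminal: [best + [prime]] + rest,  # A -> alpha A' | gamma
            prime: tails}                          # A' -> beta_1 | ... | beta_n

# S -> i E t S | i E t S e S | a   (the dangling-else grammar above)
result = left_factor("S", [["i", "E", "t", "S"],
                           ["i", "E", "t", "S", "e", "S"],
                           ["a"]])
```

On the dangling-else grammar this reproduces S → i E t S S′ | a and S′ → ε | e S.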


[Fig. Steps in the top-down parse of w = cad for grammar S → c A d, A → a b | a: first a tree with A expanded to a b, which fails to match d; after backtracking, A is expanded to a]

Parsing table M for the left-factored dangling-else grammar:

NON-TERMINAL | a     | b     | e                  | i               | t | $
S            | S → a |       |                    | S → i E t S S′  |   |
S′           |       |       | S′ → e S, S′ → ε   |                 |   | S′ → ε
E            |       | E → b |                    |                 |   |

    Top-down Parsing

An efficient non-backtracking form of top-down parser is called a predictive parser. The LL(1) grammars are the class from which predictive parsers can be constructed automatically.

    Recursive-descent parsing

    Top-down parsing can be viewed as an attempt to find a leftmost derivation for an input string.

It can be viewed as an attempt to construct a parse tree for the input, starting from the root and creating the nodes of the parse tree in preorder.

A general form of top-down parsing, called recursive descent, may involve backtracking, that is, making repeated scans of the input. The special case in which no backtracking is required is called predictive parsing.

Backtracking parsers are not very efficient, and backtracking is rarely needed to parse programming language constructs.

For example, consider the grammar S → c A d, A → a b | a and the input string w = cad.

To construct a parse tree for this string top-down, we create a tree consisting of a single node labeled S and expand it. The leftmost leaf, labeled c, matches the first symbol of w.

We advance the input pointer to a, the second symbol of w, and consider the next leaf, labeled A. We expand A using its first alternative, a b; the leaf a matches the second input symbol.

We advance the input pointer to d, the third input symbol, and compare it against the next leaf, labeled b. Since b does not match d, we go back to A and reset the input pointer to position 2, which means that the procedure for A must store the input pointer in a local variable.

We then try the second alternative for A to obtain a new tree. The leaf a matches the second symbol of w and the leaf d matches the third symbol.

Having produced a parse tree for w, the parse completes successfully.

A left-recursive grammar can cause a recursive-descent parser, even one with backtracking, to go into an infinite loop.
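The backtracking parse of w = cad can be sketched with one Python function per non-terminal. This is an ordered-choice simplification, an assumption of the sketch: A tries the longer alternative a b first and falls back to a, which is enough for this grammar but is not full general backtracking.

```python
def parse_S(s, pos):
    """Try S -> c A d; return the index just past the match, or None."""
    if pos < len(s) and s[pos] == "c":
        after_a = parse_A(s, pos + 1)
        if after_a is not None and after_a < len(s) and s[after_a] == "d":
            return after_a + 1
    return None

def parse_A(s, pos):
    """Try A -> a b first; on failure, reset to pos and try A -> a."""
    if pos + 1 < len(s) and s[pos] == "a" and s[pos + 1] == "b":
        return pos + 2          # alternative  a b  matched
    if pos < len(s) and s[pos] == "a":
        return pos + 1          # backtracked: alternative  a
    return None

def accepts(s):
    return parse_S(s, 0) == len(s)
```

Here accepts("cad") succeeds via the second alternative for A, exactly as in the walkthrough above, and accepts("cabd") succeeds via the first.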

    Predictive Parsers

We eliminate left recursion from the grammar and then left-factor the resulting grammar, to obtain a grammar that can be parsed by a recursive-descent parser needing no backtracking, i.e., a predictive parser.

To construct a predictive parser, we must know, given the current input symbol a and the non-terminal A to be expanded, which one of the alternatives of production A → α1 | α2 | … | αn is the unique alternative that derives a string beginning with a.


    Transition diagrams for Predictive Parsers

A transition diagram is a useful plan or flowchart for a predictive parser, just as it is for a lexical analyzer.

The labels of edges are tokens and non-terminals.

    A transition on a token means that transition if that token is the next input symbol.

To construct the transition diagrams of a predictive parser from a grammar, first eliminate left recursion from the grammar, and then left-factor it. Then, for each non-terminal A:

a) Create an initial and a final state.

b) For each production A → X1 X2 … Xn, create a path from the initial to the final state, with edges labeled X1, X2, …, Xn.

More than one transition from a state on the same input signals ambiguity.

A recursive-descent parser can execute such diagrams, using backtracking to try the transitions systematically.

For example, from E → E + T | T, T → T * F | F, F → ( E ) | id we obtain a collection of transition diagrams for the transformed grammar E → T E′, E′ → + T E′ | ε, T → F T′, T′ → * F T′ | ε, F → ( E ) | id. Substituting diagrams into one another, guided by the transformations on the grammar, can simplify the transition diagrams.


[Fig. Transition diagrams for grammar E → T E′, E′ → + T E′ | ε, T → F T′, T′ → * F T′ | ε, F → ( E ) | id]

[Fig. Simplified transition diagrams for E′]

[Fig. Simplified transition diagrams for arithmetic expressions]

    Non-recursive predictive parsing

It is possible to build a non-recursive predictive parser by maintaining a stack explicitly. The key problem in predictive parsing is determining the production to be applied for a non-terminal; the non-recursive parser looks up the production to be applied in a parsing table.

    The table can be constructed directly from grammars.

    A table-driven predictive parser has an input buffer, a stack, a parsing table, and an output stream.

The input buffer contains the string to be parsed, followed by $, a symbol used as a right endmarker to indicate the end of the input string.

The stack contains the start symbol of the grammar on top of $.

The parsing table is a two-dimensional array M[A, a], where A is a non-terminal and a is a terminal or the symbol $. Let X be the symbol on top of the stack and a the current input symbol.


[Fig. Non-recursive predictive parser: an input buffer ending in $, a stack X Y Z $, the predictive parsing program, the parsing table M, and the output stream]

These two symbols determine the action of the parser. There are three possibilities:

1. If X = a = $, the parser halts and announces successful completion of parsing.

2. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol.

3. If X is a non-terminal, the program consults entry M[X, a] of the parsing table M. This entry will be either an X-production of the grammar or an error entry. For example, if M[X, a] = { X → U V W }, the parser replaces X on top of the stack by W V U (with U on top). If M[X, a] = error, the parser calls an error recovery routine.

    Algorithm : Non-recursive predictive parsing.

Input: A string w and a parsing table M for grammar G.
Output: If w is in L(G), a leftmost derivation of w; otherwise, an error indication.
Method: Initially, the parser is in a configuration in which it has $S on the stack, with S, the start symbol of G, on top, and w$ in the input buffer. The following program uses the predictive parsing table M to produce a parse for the input.

set ip to point to the first symbol of w$;
repeat
    let X be the top stack symbol and a the symbol pointed to by ip;
    if X is a terminal or $ then
        if X = a then
            pop X from the stack and advance ip
        else error()
    else /* X is a non-terminal */
        if M[X, a] = X → Y1 Y2 … Yk then begin
            pop X from the stack;
            push Yk, Yk-1, …, Y1 onto the stack, with Y1 on top;
            output the production X → Y1 Y2 … Yk
        end
        else error()
until X = $ /* stack is empty */

Fig. Predictive parsing algorithm

    For example,

For the grammar E → E + T | T, T → T * F | F, F → ( E ) | id, transformed to E → T E′, E′ → + T E′ | ε, T → F T′, T′ → * F T′ | ε, F → ( E ) | id, on input id + id * id $ the predictive parser makes the sequence of moves shown below. The input pointer points to the leftmost symbol of the string in the INPUT column. Since the productions output are those of a leftmost derivation, the parser traces out a leftmost derivation for the input: the input symbols already matched, followed by the grammar symbols on the stack (from the top), give the left-sentential forms of this derivation.


STACK       | INPUT            | OUTPUT
$ E         | id + id * id $   |
$ E′ T      | id + id * id $   | E → T E′
$ E′ T′ F   | id + id * id $   | T → F T′
$ E′ T′ id  | id + id * id $   | F → id
$ E′ T′     | + id * id $      |
$ E′        | + id * id $      | T′ → ε
$ E′ T +    | + id * id $      | E′ → + T E′
$ E′ T      | id * id $        |
$ E′ T′ F   | id * id $        | T → F T′
$ E′ T′ id  | id * id $        | F → id
$ E′ T′     | * id $           |
$ E′ T′ F * | * id $           | T′ → * F T′
$ E′ T′ F   | id $             |
$ E′ T′ id  | id $             | F → id
$ E′ T′     | $                |
$ E′        | $                | T′ → ε
$           | $                | E′ → ε

Fig. Moves of the predictive parser on input id + id * id
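The algorithm and the moves above can be sketched in Python. The table encoding is an illustrative assumption, and E′ and T′ are written Ep and Tp.

```python
# Parsing table M for E -> T E', E' -> + T E' | eps, T -> F T',
# T' -> * F T' | eps, F -> ( E ) | id; [] stands for an epsilon right side.
TABLE = {
    ("E", "id"): ["T", "Ep"],      ("E", "("): ["T", "Ep"],
    ("Ep", "+"): ["+", "T", "Ep"], ("Ep", ")"): [], ("Ep", "$"): [],
    ("T", "id"): ["F", "Tp"],      ("T", "("): ["F", "Tp"],
    ("Tp", "+"): [], ("Tp", "*"): ["*", "F", "Tp"], ("Tp", ")"): [], ("Tp", "$"): [],
    ("F", "id"): ["id"],           ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "Ep", "T", "Tp", "F"}

def predictive_parse(tokens):
    """Return the productions applied (a leftmost derivation), or raise on error."""
    stack = ["$", "E"]                  # start symbol on top of $
    input_ = tokens + ["$"]
    ip, output = 0, []
    while stack[-1] != "$":
        X, a = stack[-1], input_[ip]
        if X not in NONTERMINALS:       # terminal on top of the stack
            if X != a:
                raise SyntaxError(f"expected {X}, got {a}")
            stack.pop(); ip += 1
        else:
            if (X, a) not in TABLE:
                raise SyntaxError(f"no entry M[{X}, {a}]")
            rhs = TABLE[(X, a)]
            stack.pop()
            stack.extend(reversed(rhs)) # push Y_k ... Y_1, with Y_1 on top
            output.append((X, rhs))
    if input_[ip] != "$":
        raise SyntaxError("trailing input")
    return output

moves = predictive_parse(["id", "+", "id", "*", "id"])
```

On id + id * id this emits the eleven productions of the leftmost derivation traced in the figure, beginning with E → T E′ and ending with E′ → ε.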

    FIRST and FOLLOW

The construction of a predictive parser is aided by two functions associated with a grammar G: i) FIRST and ii) FOLLOW.

Sets of tokens yielded by the FOLLOW function can also be used as synchronizing tokens during panic-mode error recovery.

Define FIRST(α), where α is any string of grammar symbols, to be the set of terminals that begin the strings derived from α. If α ⇒* ε, then ε is also in FIRST(α).

Define FOLLOW(A), for a non-terminal A, to be the set of terminals a that can appear immediately to the right of A in some sentential form; that is, the set of terminals a such that there exists a derivation of the form S ⇒* α A a β for some α and β.

Note: During the derivation, there may have been symbols between A and a, but they derived ε and disappeared. If A can be the rightmost symbol in some sentential form, then $ is in FOLLOW(A).

    Rules :

To compute FIRST(X) for all grammar symbols X, apply the following rules until no more terminals or ε can be added to any FIRST set.

i) If X is a terminal, then FIRST(X) is { X }.

ii) If X → ε is a production, then add ε to FIRST(X).

iii) If X is a non-terminal and X → Y1 Y2 … Yk is a production, then place a in FIRST(X) if, for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), …, FIRST(Yi-1); that is, Y1 … Yi-1 ⇒* ε. If ε is in FIRST(Yj) for all j = 1, 2, …, k, then add ε to FIRST(X).

    Rules :

To compute FOLLOW(A) for all non-terminals A, apply the following rules until nothing can be added to any FOLLOW set.

i) Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker.

ii) If there is a production A → α B β, then everything in FIRST(β) except ε is placed in FOLLOW(B).

iii) If there is a production A → α B, or a production A → α B β where FIRST(β) contains ε (i.e., β ⇒* ε), then everything in FOLLOW(A) is in FOLLOW(B).


For the grammar E → T E′, E′ → + T E′ | ε, T → F T′, T′ → * F T′ | ε, F → ( E ) | id:

FIRST(E) = { ( , id }
FIRST(E′) = { + , ε }
FIRST(T) = { ( , id }
FIRST(T′) = { * , ε }
FIRST(F) = { ( , id }

FOLLOW(E) = { $ , ) }
FOLLOW(E′) = { $ , ) }
FOLLOW(T) = { + , $ , ) }
FOLLOW(T′) = { + , $ , ) }
FOLLOW(F) = { * , + , $ , ) }
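The fixed-point rules above can be sketched in Python; the grammar encoding and the marker "eps" are illustrative assumptions. The computed sets match those listed above.

```python
EPS = "eps"
# E -> T E', E' -> + T E' | eps, T -> F T', T' -> * F T' | eps, F -> ( E ) | id
GRAMMAR = {
    "E":  [["T", "Ep"]],
    "Ep": [["+", "T", "Ep"], []],
    "T":  [["F", "Tp"]],
    "Tp": [["*", "F", "Tp"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
START = "E"

def first_of(symbols, FIRST):
    """FIRST of a string of grammar symbols (rule iii applied to a right side)."""
    out = set()
    for X in symbols:
        f = FIRST.get(X, {X})       # a terminal a has FIRST(a) = {a}
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)                    # every symbol could vanish
    return out

def compute_first():
    FIRST = {A: set() for A in GRAMMAR}
    changed = True
    while changed:                  # iterate to a fixed point
        changed = False
        for A, alts in GRAMMAR.items():
            for rhs in alts:
                f = first_of(rhs, FIRST)
                if not f <= FIRST[A]:
                    FIRST[A] |= f; changed = True
    return FIRST

def compute_follow(FIRST):
    FOLLOW = {A: set() for A in GRAMMAR}
    FOLLOW[START].add("$")          # rule i
    changed = True
    while changed:
        changed = False
        for A, alts in GRAMMAR.items():
            for rhs in alts:
                for i, B in enumerate(rhs):
                    if B not in GRAMMAR:
                        continue
                    tail = first_of(rhs[i + 1:], FIRST)   # rule ii
                    add = (tail - {EPS}) | (FOLLOW[A] if EPS in tail else set())
                    if not add <= FOLLOW[B]:              # rule iii
                        FOLLOW[B] |= add; changed = True
    return FOLLOW

FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
```

Running it reproduces, for example, FIRST(E′) = { +, ε } and FOLLOW(F) = { *, +, $, ) }.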

    CONSTRUCTION OF PREDICTIVE PARSING TABLES

The idea behind the construction is that production A → α is chosen on input symbol a if:

i) a is in FIRST(α), or

ii) α = ε or α ⇒* ε, and the current input symbol is in FOLLOW(A), or the $ on the input has been reached and $ is in FOLLOW(A).

Algorithm: Construction of a predictive parsing table.
Input: Grammar G.
Output: Parsing table M.
Method:
1. For each production A → α of the grammar, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
4. Make each undefined entry of M be error.
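Steps 2 and 3 of the method can be sketched in Python, with FIRST of each right side and the FOLLOW sets hard-coded from the grammar above; the encoding is an illustrative assumption.

```python
EPS = "eps"
PRODS = [  # (A, right side, FIRST(right side)) for the arithmetic grammar
    ("E",  ["T", "Ep"],      {"(", "id"}),
    ("Ep", ["+", "T", "Ep"], {"+"}),
    ("Ep", [],               {EPS}),
    ("T",  ["F", "Tp"],      {"(", "id"}),
    ("Tp", ["*", "F", "Tp"], {"*"}),
    ("Tp", [],               {EPS}),
    ("F",  ["(", "E", ")"],  {"("}),
    ("F",  ["id"],           {"id"}),
]
FOLLOW = {"E": {")", "$"}, "Ep": {")", "$"},
          "T": {"+", ")", "$"}, "Tp": {"+", ")", "$"},
          "F": {"*", "+", ")", "$"}}

def build_table():
    M = {}
    for A, rhs, first in PRODS:
        for a in first - {EPS}:          # step 2: terminals in FIRST(alpha)
            M.setdefault((A, a), []).append(rhs)
        if EPS in first:                 # step 3: epsilon, so use FOLLOW(A)
            for b in FOLLOW[A]:
                M.setdefault((A, b), []).append(rhs)
    return M

M = build_table()
```

No entry ends up multiply defined, confirming this grammar is LL(1); undefined entries play the role of error entries.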

For the above example:

NON-TERMINAL | +            | *            | (           | )       | id        | $
E            |              |              | E → T E′    |         | E → T E′  |
E′           | E′ → + T E′  |              |             | E′ → ε  |           | E′ → ε
T            |              |              | T → F T′    |         | T → F T′  |
T′           | T′ → ε       | T′ → * F T′  |             | T′ → ε  |           | T′ → ε
F            |              |              | F → ( E )   |         | F → id    |

LL(1) Grammars

    A grammar whose parsing table has no multiply-defined entries is said to be LL(1).

The first L in LL(1) stands for scanning the input from left to right, the second L stands for producing a leftmost derivation, and the 1 stands for using one input symbol of lookahead at each step to make parsing action decisions.

    Properties

No ambiguous or left-recursive grammar can be LL(1).

A grammar G is LL(1) if and only if, whenever A → α | β are two distinct productions of G:

i) For no terminal a do both α and β derive strings beginning with a.

ii) At most one of α and β can derive the empty string.

iii) If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A).

    Disadvantages


There are no universal rules by which multiply-defined entries can be made single-valued without affecting the language recognized by the parser.

    Error recovery in Predictive parsing

An error is detected during predictive parsing when the terminal on top of the stack does not match the next input symbol, or when non-terminal A is on top of the stack, a is the next input symbol, and the parsing table entry M[A, a] is empty.

Panic-mode error recovery is based on skipping symbols on the input until a token in a selected set of synchronizing tokens appears. Its effectiveness depends on the choice of the synchronizing set.

    The following heuristics help the parser recover from errors:

    As a starting point, place all symbols in FOLLOW(A) into the synchronizing set for nonterminal A. If we skip tokens until an element of FOLLOW(A) is seen and pop A from the stack, it is likely that parsing can continue.

    It is not enough to use FOLLOW(A) as the synchronizing set for A.

    If we add symbols in FIRST(A) to the synchronizing set for nonterminal A, then it may be possible to resume parsing according to A if a symbol in FIRST(A) appears in the input.

    If a nonterminal can generate the empty string, then the production deriving ε can be used as a default. This reduces the number of nonterminals that have to be considered during error recovery.

    If a terminal on top of the stack cannot be matched, a simple idea is to pop the terminal, issue a message saying that the terminal was inserted, and continue parsing.
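As a sketch of these heuristics, here is a minimal table-driven predictive parser with panic-mode recovery. The tiny grammar (S → ( S ) | id) and the message strings are hypothetical, chosen only to keep the example short; the synch entries come from FOLLOW(S) = { ), $ }:

```python
NONTERMS = {"S"}

# Entries are a production body, "synch", or absent (an error entry).
TABLE = {
    ("S", "("):  ["(", "S", ")"],
    ("S", "id"): ["id"],
    ("S", ")"):  "synch",              # from FOLLOW(S)
    ("S", "$"):  "synch",
}

def parse(tokens):
    tokens = tokens + ["$"]
    stack = ["$", "S"]
    i, errors = 0, []
    while stack[-1] != "$":
        top, a = stack[-1], tokens[i]
        if top not in NONTERMS:        # terminal on top of the stack
            if top == a:
                stack.pop(); i += 1    # match
            else:                      # mismatch: pretend it was inserted
                errors.append(f"inserted {top}")
                stack.pop()
        else:
            entry = TABLE.get((top, a))
            if entry is None:          # empty entry: skip the input token
                errors.append(f"skipped {a}"); i += 1
            elif entry == "synch":     # synchronizing token: pop and resume
                errors.append(f"popped {top}"); stack.pop()
            else:                      # expand by the production
                stack.pop()
                stack.extend(reversed(entry))
    if tokens[i] != "$":
        errors.append("trailing input")
    return errors

print(parse(["(", ")"]))               # recovers by popping S and continues
```

On input ( ), the entry M[S, )] is synch, so S is popped, the ) on the stack then matches, and parsing finishes with one recorded error instead of aborting.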

    Phrase-level recovery

    Phrase-level recovery is implemented by filling in the blank entries in the predictive parsing table with pointers to error routines.

    Example :

    Using FIRST and FOLLOW symbols as synchronizing tokens works well when expressions are parsed according to the grammar E → E + T | T, T → T * F | F, F → ( E ) | id. After eliminating left recursion, this grammar becomes E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id.

    Construct the parsing table for this grammar, with synch indicating synchronizing tokens obtained from the FOLLOW set of the nonterminal.

    NON-TERMINAL      INPUT SYMBOL
                  id          +            *            (          )         $

    E             E → TE'                               E → TE'    synch     synch
    E'                        E' → +TE'                            E' → ε    E' → ε
    T             T → FT'     synch                     T → FT'    synch     synch
    T'                        T' → ε      T' → *FT'                T' → ε    T' → ε
    F             F → id      synch       synch         F → (E)    synch     synch

    Fig. Synchronizing tokens added to parsing table

    STACK           INPUT              OUTPUT

    $ E             ) id * + id $      Error, skip )
    $ E             id * + id $        id is in FIRST(E)
    $ E' T          id * + id $
    $ E' T' F       id * + id $
    $ E' T' id      id * + id $
    $ E' T'         * + id $
    $ E' T' F *     * + id $
    $ E' T' F       + id $             Error, M[F, +] = synch, F has been popped
    $ E' T'         + id $
    $ E'            + id $
    $ E' T +        + id $
    $ E' T          id $
    $ E' T' F       id $
    $ E' T' id      id $
    $ E' T'         $
    $ E'            $
    $               $

    Bottom-up parsing

    Bottom-up syntax analysis is known as shift-reduce parsing, and it is easy to implement.

    The general method of shift-reduce parsing is called LR parsing.

    LR parsing is used in a number of automatic parser generators.

    The goal is to construct a parse tree for an input string beginning at the leaves and working up towards the root.

    At each reduction step, a particular substring matching the right side of a production is replaced by the symbol on the left of that production; if the substring is chosen correctly at each step, a rightmost derivation is traced out in reverse.

    Example :

    Consider the grammar,

    S → aABe
    A → Abc | b
    B → d

    The sentence abbcde can be reduced to S:

    abbcde
    aAbcde    (A → b)
    aAde      (A → Abc)
    aABe      (B → d)
    S         (S → aABe)
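The reductions above can be replayed mechanically. In this sketch, the handle is simply taken to be the leftmost occurrence of a production body, which happens to be correct for this sentence; in general, choosing the handle is exactly the problem a shift-reduce parser must solve (reducing the second b of abbcde, for instance, would lead to a dead end):

```python
def reduce_once(form, body, lhs):
    """Replace one occurrence of a production body by its left-hand side."""
    i = form.index(body)               # leftmost occurrence: the handle here
    return form[:i] + lhs + form[i + len(body):]

# Grammar: S -> aABe, A -> Abc | b, B -> d
form = "abbcde"
for body, lhs in [("b", "A"), ("Abc", "A"), ("d", "B"), ("aABe", "S")]:
    form = reduce_once(form, body, lhs)

assert form == "S"                     # reduced to the start symbol
```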

    This sequence of four reductions traces out the following rightmost derivation in reverse:

    S ⇒rm aABe ⇒rm aAde ⇒rm aAbcde ⇒rm abbcde

    HANDLES

    A handle of a string is a substring that matches the right side of a production, and whose reduction to the nonterminal on the left side of the production represents one step along the reverse of a rightmost derivation.

    Formally, a handle of a right-sentential form γ is a production A → β and a position of γ where the string β may be found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ.

    If a grammar is unambiguous, then every right-sentential form of the grammar has exactly one handle.

    In the parse tree, the handle represents the leftmost complete subtree consisting of a node and all its children.

    Consider the grammar E → E + E | E * E | ( E ) | id and the rightmost derivation

    E ⇒rm E + E
      ⇒rm E + E * E
      ⇒rm E + E * id3
      ⇒rm E + id2 * id3
      ⇒rm id1 + id2 * id3

    (or)

    E ⇒rm E * E
      ⇒rm E * id3
      ⇒rm E + E * id3
      ⇒rm E + id2 * id3
      ⇒rm id1 + id2 * id3

    Note: The string appearing to the right of a handle contains only terminal symbols.

  • 8/3/2019 Role of Parse1

    18/20

    Consider a rightmost derivation S = γ0 ⇒rm γ1 ⇒rm γ2 ⇒rm ... ⇒rm γn-1 ⇒rm γn = w. To construct this derivation in reverse order, locate the handle βn in γn and replace βn by the left side of

    the production An → βn to obtain the (n-1)st right-sentential form γn-1. Repeat this process: locate the handle βn-1 in γn-1 and reduce it to obtain the right-

    sentential form γn-2. If, by continuing this process, we produce a right-sentential form consisting only of the start symbol S,

    then we halt and announce successful completion of parsing.

    The reverse of the sequence of productions used in the reductions is a rightmost derivation for the input string.

    For example, for the grammar E → E + E | E * E | ( E ) | id:

    RIGHT-SENTENTIAL FORM    HANDLE     REDUCING PRODUCTION

    id1 + id2 * id3          id1        E → id
    E + id2 * id3            id2        E → id
    E + E * id3              id3        E → id
    E + E * E                E * E      E → E * E
    E + E                    E + E      E → E + E
    E

    Fig. Reductions made by a shift-reduce parser

    STACK IMPLEMENTATION OF SHIFT-REDUCE PARSING

    Two problems must be solved to parse by handle pruning: i) to locate the substring to be reduced in a right-sentential form, and ii) to determine which production to choose when more than one production has that substring on its right side.

    A convenient way to implement a shift-reduce parser is to use a stack to hold grammar symbols and an input buffer to hold the string w to be parsed.

    Initially, the stack is empty and the string w is on the input:

    STACK    INPUT
    $        w $

    where w is the string to be parsed and $ marks the bottom of the stack and the right end of the input.

    The parser operates by shifting zero or more input symbols onto the stack until a handle β is on top of the stack.

    The parser then reduces β to the left side of the appropriate production. The parser repeats this cycle until it has detected an error or until the stack contains the start symbol and the input is empty:

    STACK    INPUT
    $ S      $

    After entering this configuration, the parser halts and announces successful completion of parsing.

    Example:

    A shift-reduce parser parses the input string id1 + id2 * id3 according to the grammar E → E + E | E * E | ( E ) | id as follows:

    STACK             INPUT               ACTION

    $                 id1 + id2 * id3 $   shift
    $ id1             + id2 * id3 $       reduce by E → id
    $ E               + id2 * id3 $       shift
    $ E +             id2 * id3 $         shift
    $ E + id2         * id3 $             reduce by E → id
    $ E + E           * id3 $             shift
    $ E + E *         id3 $               shift
    $ E + E * id3     $                   reduce by E → id
    $ E + E * E       $                   reduce by E → E * E
    $ E + E           $                   reduce by E → E + E
    $ E               $                   accept

    The primary operations of the parser are shift and reduce.

    A shift-reduce parser can make four possible actions: a) shift, b) reduce, c) accept, and d) error.

    a) In a shift action, the next input symbol is shifted onto the top of the stack.
    b) In a reduce action, the parser knows the right end of the handle is at the top of the stack. It must locate the left end of the handle within the stack and decide with what nonterminal to replace the handle.
    c) In an accept action, the parser announces successful completion of parsing.
    d) In an error action, the parser discovers that a syntax error has occurred and calls an error recovery routine.

    An important fact about the use of a stack in shift-reduce parsing:

    The handle will always eventually appear on top of the stack, never inside.

    S ⇒*rm αAz ⇒rm αβByz ⇒rm αβγyz,   where A → βBy and B → γ
    S ⇒*rm αBxAz ⇒rm αBxyz ⇒rm αγxyz,  where A → y and B → γ

    In reverse, the shift-reduce parser handles these two cases as follows:


    1) STACK     INPUT     ACTION

    $ αβγ       y z $
    $ αβB       y z $     (reduce by B → γ)
    $ αβBy      z $       (shift)
    $ αA        z $       (reduce by A → βBy)
    $ αAz       $         (shift)
    $ S         $         (reduce by S → αAz)

    2) STACK     INPUT     ACTION

    $ αγ        x y z $
    $ αB        x y z $   (reduce by B → γ)
    $ αBx       y z $     (shift)
    $ αBxy      z $       (shift)
    $ αBxA      z $       (reduce by A → y)
    $ αBxAz     $         (shift)
    $ S         $         (reduce by S → αBxAz)

    After making a reduction, the parser had to shift zero or more symbols to get the next handle onto the stack. It never had to go into the stack to find the handle.
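The stack mechanics above can be sketched in code. The shift/reduce decisions are supplied explicitly as a script (standing in for the decisions an LR parser would read from its table), so the example simply replays the configuration sequence for id1 + id2 * id3 and checks that each handle is indeed on top of the stack, never inside it:

```python
def run(tokens, script):
    """Replay shift/reduce actions; script entries are ('shift',) or
    ('reduce', lhs, body)."""
    stack, inp = ["$"], tokens + ["$"]
    for action in script:
        if action[0] == "shift":
            stack.append(inp.pop(0))       # move next input symbol to stack
        else:
            _, lhs, body = action
            # the handle always appears on top of the stack, never inside
            assert stack[-len(body):] == body
            del stack[-len(body):]
            stack.append(lhs)
    return stack, inp

# Grammar: E -> E + E | E * E | ( E ) | id
script = [
    ("shift",), ("reduce", "E", ["id"]),
    ("shift",), ("shift",), ("reduce", "E", ["id"]),
    ("shift",), ("shift",), ("reduce", "E", ["id"]),
    ("reduce", "E", ["E", "*", "E"]),
    ("reduce", "E", ["E", "+", "E"]),
]
stack, inp = run(["id", "+", "id", "*", "id"], script)
assert stack == ["$", "E"] and inp == ["$"]   # accept configuration
```

Every assertion on the handle's position succeeds, illustrating the claim that the parser never needs to reach inside the stack.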

    Viable prefixes

    The set of prefixes of right-sentential forms that can appear on the stack of a shift-reduce parser is called the set of viable prefixes.

    Equivalently, a viable prefix is a prefix of a right-sentential form that does not continue past the right end of the rightmost handle of that sentential form.

    Conflicts during shift-reduce parsing

    There are context-free grammars for which shift-reduce parsing cannot be used.

    Every shift-reduce parser for such a grammar can reach a configuration in which the parser, knowing the entire stack contents and the next input symbol, cannot decide whether to shift or to reduce (a shift/reduce conflict), or cannot decide which of several reductions to make (a reduce/reduce conflict).

    These grammars are not in the LR(k) class of grammars; the k in LR(k) refers to the number of symbols of

    lookahead on the input. Grammars used in compiling usually fall in the LR(1) class, with one symbol of lookahead.

    Non-LR-ness also occurs when the stack contents and the next input symbol are not sufficient to determine which production should be used in a reduction.