
Dixita Kagathara Page 1

2014 | Sem - VII | Syntax Analysis

170701 – Compiler Design

1) How do the parser and scanner communicate? Explain the communication between them with a block diagram.

The lexical analyzer is the first phase of a compiler. Its main task is to read the input characters and produce as output a sequence of tokens that the parser uses for syntax analysis.

(Block diagram: the parser sends a "get next token" request to the lexical analyzer, which reads the source program and returns the next token; both phases typically consult a shared symbol table.)

Upon receiving a "get next token" command from the parser, the lexical analyzer reads the input characters until it can identify the next token, which it returns to the parser.
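The request/response protocol above can be sketched in a few lines. This is a minimal illustration, not from the notes: the token names, patterns, and the `tokenize` helper are all assumed for the example; the parser simply pulls tokens on demand from the generator.

```python
import re

# Illustrative token classes (not from the notes); SKIP entries are discarded.
TOKEN_SPEC = [("ID", r"[A-Za-z_]\w*"), ("NUM", r"\d+"),
              ("OP", r"[+\-*/=]"), ("SKIP", r"\s+")]

def tokenize(source):
    """Lexical analyzer: reads characters until the next token is identified."""
    pos = 0
    while pos < len(source):
        for name, pattern in TOKEN_SPEC:
            m = re.match(pattern, source[pos:])
            if m:
                pos += m.end()
                if name != "SKIP":
                    yield (name, m.group())
                break
        else:
            raise SyntaxError(f"bad character {source[pos]!r}")

scanner = tokenize("x = a + 42")
# The parser issues "get next token" requests one at a time:
print(next(scanner))  # ('ID', 'x')
print(next(scanner))  # ('OP', '=')
```

Each `next()` call plays the role of the parser's "get next token" command; the scanner keeps its position between calls.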

2) Explain parse tree, syntax tree and DAG.

Parse Tree                                      | Syntax Tree
Interior nodes are non-terminals;               | Interior nodes are "operators";
leaves are terminals.                           | leaves are operands.
Rarely constructed as a data structure.         | Usually the tree built when representing
                                                | a program in tree form.
Represents the concrete syntax of a program.    | Represents the abstract syntax of a
                                                | program (the semantics).


Difference between DAG and syntax tree:

A DAG (Directed Acyclic Graph) gives the same information as a syntax tree but in a more compact way, because common sub-expressions are identified and shared. The syntax tree and DAG for the assignment statement x = -a*b + -a*b are given below.
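The sharing of common sub-expressions can be sketched with "value numbering": identical (operator, children) combinations map to one node. This is an illustrative sketch (the helper `make_node` and the node tables are assumed, not from the notes):

```python
# DAG construction by value numbering: identical (op, children) pairs
# are stored once, so the two occurrences of -a*b in x = -a*b + -a*b
# share a single subtree.
nodes = {}          # (op, children) -> node id
node_list = []      # node id -> (op, children)

def make_node(op, *children):
    key = (op, children)
    if key not in nodes:          # reuse an existing node when possible
        nodes[key] = len(node_list)
        node_list.append(key)
    return nodes[key]

a = make_node("a")
b = make_node("b")
neg_a = make_node("uminus", a)
t1 = make_node("*", neg_a, b)    # first  -a*b
t2 = make_node("*", neg_a, b)    # second -a*b: same node id, shared
root = make_node("+", t1, t2)
print(t1 == t2)        # True: the common sub-expression is shared
print(len(node_list))  # 5 nodes, versus 9 interior/leaf nodes in the syntax tree
```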

3) Explain types of derivation and ambiguity.

Types of derivation:
1. Leftmost derivation: a derivation of a string w in a grammar G is a leftmost derivation if at every step the leftmost nonterminal is replaced.
2. Rightmost derivation: a derivation of a string w in a grammar G is a rightmost derivation if at every step the rightmost nonterminal is replaced.

Consider the grammar with the productions S → S+S | S-S | S*S | S/S | (S) | a.

Leftmost derivation of the string a*a-a:
S ⇒ S-S ⇒ S*S-S ⇒ a*S-S ⇒ a*a-S ⇒ a*a-a

Rightmost derivation of the string a-a/a:
S ⇒ S-S ⇒ S-S/S ⇒ S-S/a ⇒ S-a/a ⇒ a-a/a

Syntax tree for x = -a*b + -a*b:

          assign
         /      \
        x        +
               /   \
              *     *
             / \   / \
       uminus  b uminus b
          |        |
          a        a

DAG for x = -a*b + -a*b (both operands of + point to the same shared * node):

          assign
         /      \
        x        +
                 |
                 *
                / \
           uminus  b
              |
              a


Equivalent leftmost derivation tree (a*a-a):    Equivalent rightmost derivation tree (a-a/a):

       S                                            S
     / | \                                        / | \
    S  -  S                                      S  -  S
   /|\    |                                      |    /|\
  S * S   a                                      a   S / S
  |   |                                              |   |
  a   a                                              a   a

An ambiguous grammar:

A grammar G is ambiguous if there is at least one string in L(G) having two or more distinct derivation trees (or, equivalently, two or more distinct leftmost derivations or rightmost derivations).

1) Prove that the grammar S → S+S | S-S | S*S | S/S | (S) | a is ambiguous. (IMP)

String: a+a+a

First leftmost derivation:      Second leftmost derivation:
S ⇒ S+S                         S ⇒ S+S
  ⇒ a+S                           ⇒ S+S+S
  ⇒ a+S+S                         ⇒ a+S+S
  ⇒ a+a+S                         ⇒ a+a+S
  ⇒ a+a+a                         ⇒ a+a+a

The corresponding parse trees:

       S                        S
     / | \                    / | \
    S  +  S                  S  +  S
    |    /|\                /|\    |
    a   S + S              S + S   a
        |   |              |   |
        a   a              a   a

We have two leftmost derivations for the string a+a+a; hence the grammar is ambiguous.

2) Prove that S → a | Sa | bSS | SSb | SbS is ambiguous.

String: baaab

First leftmost derivation:      Second leftmost derivation:
S ⇒ bSS                         S ⇒ SSb
  ⇒ baS                           ⇒ bSSSb
  ⇒ baSSb                         ⇒ baSSb
  ⇒ baaSb                         ⇒ baaSb
  ⇒ baaab                         ⇒ baaab

We have two leftmost derivations for the string baaab; hence the grammar is ambiguous.

4) Left recursion algorithm

A grammar is left-recursive if it has a nonterminal A such that there is a derivation A ⇒ Aα for some string α.

Algorithm:
1. Assign an ordering A1, …, An to the nonterminals of the grammar.
2. for i := 1 to n do begin
       for j := 1 to i−1 do begin
           replace each production of the form Ai → Aj ɣ
           by the productions Ai → δ1ɣ | δ2ɣ | … | δkɣ,
           where Aj → δ1 | δ2 | … | δk are all the current Aj-productions;
       end
       eliminate the immediate left recursion among the Ai-productions;
   end
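The inner step, eliminating immediate left recursion for one nonterminal, can be sketched as follows (the function name and the list-of-symbol-lists representation are assumed for illustration):

```python
# Eliminating immediate left recursion for one nonterminal:
#   A  -> Aα1 | ... | Aαm | β1 | ... | βn
# becomes
#   A  -> β1 A' | ... | βn A'
#   A' -> α1 A' | ... | αm A' | ε
def eliminate_immediate(nt, productions):
    recursive = [p[1:] for p in productions if p and p[0] == nt]   # the α's
    others = [p for p in productions if not p or p[0] != nt]       # the β's
    if not recursive:
        return {nt: productions}    # nothing to do
    new_nt = nt + "'"
    return {
        nt: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],  # [] is ε
    }

# E -> E + T | T  becomes  E -> T E',  E' -> + T E' | ε
print(eliminate_immediate("E", [["E", "+", "T"], ["T"]]))
```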

5) Left factoring algorithm

Input: Grammar G.
Output: An equivalent left-factored grammar.
Method: For each nonterminal A, find the longest prefix α common to two or more of its alternatives. If α ≠ ε, i.e., there is a non-trivial common prefix, replace all the A-productions
A → αβ1 | αβ2 | … | αβn | ɣ, where ɣ represents all alternatives that do not begin with α, by
A → αA' | ɣ
A' → β1 | β2 | … | βn
Here A' is a new nonterminal. Repeatedly apply this transformation until no two alternatives for a nonterminal have a common prefix.

Ex:
A → xByA | xByAzA | a
B → b

Left factored, the grammar becomes
A → xByAA' | a
A' → zA | Є
B → b
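One left-factoring step on the example above can be sketched like this (the helper names and the list-of-symbol-lists representation are illustrative assumptions):

```python
# One left-factoring step: for alternatives sharing the longest common
# prefix α, rewrite  A -> αβ1 | αβ2 | γ  as  A -> αA' | γ ,  A' -> β1 | β2.
def common_prefix(alts):
    prefix = []
    for symbols in zip(*alts):
        if len(set(symbols)) == 1:
            prefix.append(symbols[0])
        else:
            break
    return prefix

def left_factor(nt, alternatives):
    best = []
    for i in range(len(alternatives)):          # longest prefix over any pair
        for j in range(i + 1, len(alternatives)):
            p = common_prefix([alternatives[i], alternatives[j]])
            if len(p) > len(best):
                best = p
    if not best:
        return {nt: alternatives}
    n = len(best)
    new_nt = nt + "'"
    factored = [a[n:] for a in alternatives if a[:n] == best]  # the β's ([] is ε)
    rest = [a for a in alternatives if a[:n] != best]          # the γ's
    return {nt: [best + [new_nt]] + rest, new_nt: factored}

# A -> xByA | xByAzA | a  becomes  A -> xByA A' | a ,  A' -> ε | zA
g = left_factor("A", [["x","B","y","A"], ["x","B","y","A","z","A"], ["a"]])
print(g)
```

The result matches the worked example in the notes: A → xByAA' | a with A' → zA | ε.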

6) Explain parsing and types of parsing

Parsing, or syntactic analysis, is the process of analyzing a string of symbols according to the rules of a formal grammar.


Parsing is a technique that takes an input string and produces as output either a parse tree, if the string is a valid sentence of the grammar, or an error message indicating that the string is not a valid sentence of the given grammar.

There are mainly two types of parsing:
1. Top-down parsing: a top-down parser for a given grammar G tries to derive the input string through a sequence of derivations starting from the start symbol.
2. Bottom-up parsing: in bottom-up parsing, the source string is reduced to the start symbol of the grammar. The bottom-up parsing method is also called shift-reduce parsing.

Top down parsing

7) Recursive descent parsing

"A top-down parser that executes a set of recursive procedures to process the input without backtracking is called a recursive-descent parser, and the parsing is called recursive descent parsing."

Ex:
S → E
E → VE'
E' → +VE' | Є
V → id

A recursive descent method for the above grammar is given below (advance() consumes the current input symbol):

S()
{
    E();
}

E()
{
    V();
    E'();
}

E'()
{
    if (next symbol == '+')
    {
        advance();
        V();
        E'();
    }
}

V()
{
    if (next symbol == 'id')
    {
        advance();
        return;
    }
    else
    {
        print("error");
    }
}

8) Non-recursive predictive parser LL(1)

In LL(1), the first L means the input is scanned from left to right; the second L means it uses a leftmost derivation for the input string; and the number 1 means it uses only one input symbol (lookahead) to predict the parsing process.

(Block diagram: the input buffer feeds tokens to the LL(1) parser, which consults the parsing table, maintains a stack, and produces the output.)

The data structures used by LL(1) are the input buffer, the stack and the parsing table.

The LL(1) parser uses the input buffer to store the input tokens. The stack is used to hold the left sentential form; the symbols on the R.H.S. of a rule are pushed onto the stack in reverse order.

The use of a stack makes this algorithm non-recursive. The table is basically a two-dimensional array: it has a row for each non-terminal and a column for each terminal, and can be represented as M[A, a] where A is a non-terminal and a is the current input symbol.

Steps to construct an LL(1) parser:


1. Remove left recursion or perform left factoring on grammar if required.

2. Computation of FIRST and FOLLOW function.

3. Construct the predictive parsing table.

4. Parse the input string with the help of predictive parsing table.

Rules to find the FIRST function:
1. For a terminal symbol a, FIRST(a) = {a}.
2. If there is a rule X → ε, then ε ∈ FIRST(X).
3. For a rule A → Y1Y2…Yk where Y1 is a non-terminal:
   1. If Y1 does not derive ε, then FIRST(A) = FIRST(Y1).
   2. If Y1 derives ε, then FIRST(A) = (FIRST(Y1) − {ε}) ∪ FIRST(Y2…Yk); if every Yi derives ε, add ε to FIRST(A).

Rules to find the FOLLOW function:
1. For the start symbol S, place $ in FOLLOW(S).
2. If there is a production A → αBβ and β is a terminal, then add β to FOLLOW(B).
3. If there is a production A → αBβ and β is a non-terminal, then:
   1. If β does not derive ε, add FIRST(β) to FOLLOW(B).
   2. If β derives ε (or β is empty), add (FIRST(β) − {ε}) ∪ FOLLOW(A) to FOLLOW(B).
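The rules above can be turned into a fixed-point computation: keep applying them until neither set changes. The sketch below (helper names and the string-based grammar encoding are assumptions for illustration) uses the grammar of example 1 below, after left recursion is removed:

```python
# Iterative FIRST/FOLLOW computation ("" marks ε, "$" the end marker).
GRAMMAR = {  # E -> T E', E' -> + T E' | ε, T -> F T', T' -> * F T' | ε, F -> ( E ) | id
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
NT = set(GRAMMAR)

def first_of(seq, first):
    """FIRST of a symbol sequence, given the current FIRST sets."""
    out = set()
    for sym in seq:
        if sym not in NT:
            out.add(sym)
            return out
        out |= first[sym] - {""}
        if "" not in first[sym]:
            return out
    out.add("")   # every symbol in seq can derive ε
    return out

def compute_sets(start):
    first = {a: set() for a in NT}
    follow = {a: set() for a in NT}
    follow[start].add("$")        # rule 1 for FOLLOW
    changed = True
    while changed:
        changed = False
        for a, alts in GRAMMAR.items():
            for alt in alts:
                f = first_of(alt, first)              # FIRST rules
                if not f <= first[a]:
                    first[a] |= f; changed = True
                for i, b in enumerate(alt):           # FOLLOW rules: A -> α B β
                    if b in NT:
                        beta = first_of(alt[i + 1:], first)
                        add = (beta - {""}) | (follow[a] if "" in beta else set())
                        if not add <= follow[b]:
                            follow[b] |= add; changed = True
    return first, follow

first, follow = compute_sets("E")
print(sorted(first["E"]))   # ['(', 'id']
print(sorted(follow["T"]))  # ['$', ')', '+']
```

The computed sets agree with the FIRST/FOLLOW table in example 1 below.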

9) Examples of LL(1) parsing

1) E → E+T | T
   T → T*F | F
   F → (E) | id        String: id+id*id

After removing left recursion the grammar becomes:
E → TE'      E' → +TE' | ϵ
T → FT'      T' → *FT' | ϵ
F → (E) | id

        FIRST       FOLLOW
E       {(, id}     {$, )}
E'      {+, ϵ}      {$, )}
T       {(, id}     {+, $, )}
T'      {*, ϵ}      {+, $, )}
F       {(, id}     {*, +, $, )}

Predictive parsing table:

        id        +          *          (         )       $
E       E→TE'                           E→TE'
E'                E'→+TE'                         E'→ϵ    E'→ϵ
T       T→FT'                           T→FT'
T'                T'→ϵ      T'→*FT'               T'→ϵ    T'→ϵ
F       F→id                            F→(E)


2) S → (L) | a
   L → SL'
   L' → ,SL' | ϵ       String: (a,(a,a))

        FIRST       FOLLOW
S       {(, a}      {,, ), $}
L       {(, a}      {)}
L'      {,, ϵ}      {)}

Predictive parsing table:

        (         )        a        ,          $
S       S→(L)              S→a
L       L→SL'              L→SL'
L'                L'→ϵ              L'→,SL'

3) S → abSa | aaAb | b
   A → baAb | b

After left factoring the grammar becomes:
S → aS' | b      S' → bSa | aAb
A → bA'          A' → aAb | ϵ

        FIRST       FOLLOW
S       {a, b}      {$, a}
S'      {a, b}      {$, a}
A       {b}         {b}
A'      {a, ϵ}      {b}

Predictive parsing table:

        a          b          $
S       S→aS'      S→b
S'      S'→aAb     S'→bSa
A                  A→bA'
A'      A'→aAb     A'→ϵ

Bottom up parsing

10) Explain handle and handle pruning

Handle: a "handle" of a string:
– is a substring of the string,
– matches the right side of a production, and
– its reduction to the left side of that production is one step along the reverse of a rightmost derivation.

Handle pruning: the process of discovering a handle and reducing it to the appropriate left-hand-side non-terminal is known as handle pruning.


11) Shift reduce parsing

A shift-reduce parser attempts to construct the parse tree from the leaves to the root; thus it works on the principle of bottom-up parsing.

A shift-reduce parser requires the following data structures:
1) Input buffer  2) Stack

The parser performs the following basic operations:
Shift: shift the next input symbol onto the top of the stack.
Reduce: identify a handle on top of the stack and replace it by the corresponding LHS non-terminal.
Accept: announce success once the string is reduced to the start symbol and the input token stream is empty.
Error: signal a parse error if no handle is found.

Ex: consider the grammar E → E-E | E*E | id; perform shift-reduce parsing of the string id-id*id.

Stack        Input buffer    Action
$            id-id*id$       shift
$id          -id*id$         reduce E→id
$E           -id*id$         shift
$E-          id*id$          shift
$E-id        *id$            reduce E→id
$E-E         *id$            shift
$E-E*        id$             shift
$E-E*id      $               reduce E→id
$E-E*E       $               reduce E→E*E
$E-E         $               reduce E→E-E
$E           $               accept
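The trace above can be reproduced with a small shift-reduce loop. This is a sketch, not from the notes: the shift/reduce decisions are made with operator precedences (* binding tighter than -), which is one way to resolve the choices the worked example makes implicitly.

```python
# Shift-reduce loop for E -> E-E | E*E | id, logging each step.
PREC = {"-": 1, "*": 2, "$": 0}   # assumed precedences: * binds tighter than -

def shift_reduce(tokens):
    stack, rest, log = ["$"], tokens + ["$"], []
    while True:
        if stack[-1] == "id":                     # handle: id
            log.append(("reduce E->id", list(stack)))
            stack[-1] = "E"
        elif (len(stack) >= 4 and stack[-3] == "E" and stack[-1] == "E"
              and PREC[stack[-2]] >= PREC[rest[0]]):   # handle: E op E
            op = stack[-2]
            log.append((f"reduce E->E{op}E", list(stack)))
            stack[-3:] = ["E"]
        elif rest[0] != "$":                      # otherwise shift
            log.append(("shift", list(stack)))
            stack.append(rest.pop(0))
        else:
            break
    return stack == ["$", "E"], log               # accept iff only $E remains

ok, log = shift_reduce(["id", "-", "id", "*", "id"])
print(ok)  # True
for action, st in log:
    print("".join(st).ljust(10), action)
```

The printed actions follow the same order as the table above: three E→id reductions, then E→E*E, then E→E-E.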

12) Operator precedence parsing

Operator precedence grammar: a grammar in which no production has ϵ on its right-hand side and no right-hand side contains two adjacent non-terminals.

Operator precedence parsing is based on bottom-up parsing techniques and uses a precedence table to determine the next action. The table is easy to construct and is typically hand-coded. This method is ideal for applications that require a parser for expressions and where embedding full compiler technology is not justified.

Disadvantages:
1. It cannot handle the unary minus (the lexical analyzer should handle the unary minus).
2. It applies only to a small class of grammars.
3. It is difficult to decide which language is recognized by the grammar.


Advantages:
1. Simple.
2. Powerful enough for expressions in programming languages.

Leading: the leading of a non-terminal is the set of first terminals (operators) that can appear in a string derived from that non-terminal.

Trailing: the trailing of a non-terminal is the set of last terminals (operators) that can appear in a string derived from that non-terminal.

Example:
E → E+T | T
T → T*F | F
F → id

Step 1: Find the leading and trailing of each NT.

Leading                 Trailing
(E) = {+, *, id}        (E) = {+, *, id}
(T) = {*, id}           (T) = {*, id}
(F) = {id}              (F) = {id}

Step 2: Establish relations

1. a <· b
   Op . NT        Op <· Leading(NT)
   +T             + <· {*, id}
   *F             * <· {id}

2. a ·> b
   NT . Op        Trailing(NT) ·> Op
   E+             {+, *, id} ·> +
   T*             {*, id} ·> *

3. $ <· {+, *, id}
4. {+, *, id} ·> $

Step 3: Creation of the table

        +       *       id      $
+       ·>      <·      <·      ·>
*       ·>      ·>      <·      ·>
id      ·>      ·>              ·>
$       <·      <·      <·

We follow these steps to parse the given string:
1. Scan the input string until the first ·> is encountered.
2. Scan backward until <· is encountered.
3. The handle is the string between <· and ·>.

Page 11: 2014 | Sem - VII | Syntax Analysis 170701 Compiler · PDF fileThe lexical analyzer is the first phase of a compiler ... (lookahead) to predict the parsing process. The simple block

Dixita Kagathara Page 11

Syntax Analysis 170701 – Compiler Design

$ <· id ·> + <· id ·> * <· id ·> $     handle id is found between <· and ·>; reduce by E→id
E + <· id ·> * <· id ·> $              handle id is found between <· and ·>; reduce by E→id
E + E * <· id ·> $                     handle id is found between <· and ·>; reduce by E→id
E + E * E                              remove all non-terminals
+ *                                    insert $ at both ends
$ + * $                                place relations between the operators
$ <· + <· * ·> $                       the * operator is surrounded by <· and ·>; * becomes the handle, so we reduce E*E
$ <· + ·> $                            + becomes the handle; reduce E+E
$ $                                    parsing done

Making operator precedence relations

Operator precedence parsers usually do not store the precedence table with the relations; rather, they are implemented in a special way: they use precedence functions that map terminal symbols to integers, so the precedence relations between symbols are implemented by numerical comparison.

Algorithm for constructing precedence functions:
1. Create functions fa and ga for each grammar terminal a and for the end-of-string symbol.
2. Partition the symbols into groups so that fa and gb are in the same group if a =· b (there can be symbols in the same group even if they are not connected by this relation).
3. Create a directed graph whose nodes are the groups; then for each pair of symbols a and b: place an edge from the group of gb to the group of fa if a <· b; otherwise, if a ·> b, place an edge from the group of fa to that of gb.
4. If the constructed graph has a cycle, then no precedence functions exist. When there are no cycles, let f(a) and g(b) be the lengths of the longest paths starting from the groups of fa and gb, respectively.

Example: consider the following table

        id      +       *       $
id              ·>      ·>      ·>
+       <·      ·>      <·      ·>
*       <·      ·>      ·>      ·>
$       <·      <·      <·


Applying the algorithm gives a graph from which we extract the following precedence functions:

        id      +       *       $
f       4       2       4       0
g       5       1       3       0
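The graph construction and longest-path computation can be sketched directly from the algorithm (the relation-table encoding and helper names are assumed; with no =· pairs here, every symbol is its own group):

```python
# Precedence functions as longest path lengths in the f/g graph.
REL = {  # REL[a][b] is the relation between a (row) and b (column)
    "id": {"+": ">", "*": ">", "$": ">"},
    "+":  {"id": "<", "+": ">", "*": "<", "$": ">"},
    "*":  {"id": "<", "+": ">", "*": ">", "$": ">"},
    "$":  {"id": "<", "+": "<", "*": "<"},
}
symbols = list(REL)
edges = {(side, s): [] for side in ("f", "g") for s in symbols}
for a, row in REL.items():
    for b, rel in row.items():
        if rel == "<":                     # a <· b : edge g_b -> f_a
            edges[("g", b)].append(("f", a))
        elif rel == ">":                   # a ·> b : edge f_a -> g_b
            edges[("f", a)].append(("g", b))

def longest(node, seen=()):
    """Length of the longest path from node; a cycle means no functions exist."""
    if node in seen:
        raise ValueError("cycle: no precedence functions exist")
    return max((1 + longest(n, seen + (node,)) for n in edges[node]), default=0)

f = {s: longest(("f", s)) for s in symbols}
g = {s: longest(("g", s)) for s in symbols}
print(f)  # {'id': 4, '+': 2, '*': 4, '$': 0}
print(g)  # {'id': 5, '+': 1, '*': 3, '$': 0}
```

The computed values reproduce the f and g rows of the table above.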

13) Explain the following terms.
1. Augmented grammar: if grammar G has start symbol S, then the augmented grammar is the new grammar G' with a new start symbol S' and the added production S' → S.
2. Kernel items: the item S' → .S together with all items whose dots are not at the leftmost end of the RHS of the rule.
3. Non-kernel items: all items whose dots are at the leftmost end of the RHS of the rule (other than S' → .S).
4. Viable prefix: a prefix of a right-sentential form that can appear on the stack during the shift-reduce actions of the parser.

14) Explain simple LR parser. OR Explain the working of a simple LR parser with the help of an example.

SLR means simple LR. A grammar for which an SLR parser can be constructed is said to be an SLR grammar.

SLR is a type of LR parser with small parse tables and a relatively simple parser-generator algorithm. It is quite efficient at finding the single correct bottom-up parse in a single left-to-right scan over the input string, without guesswork or backtracking.

For this type we first find the LR(0) items. An LR(0) item of a grammar G is a production of G with a dot at some position of the right side.

Production: A → XY
Items:      A → .XY
            A → X.Y
            A → XY.

The first item indicates that we hope to see a string derivable from XY next on the input. The second item indicates that we have just seen a string derivable from X and hope to see a string derivable from Y next on the input.

The parsing table has two parts (action, goto). The action part has four kinds of entries:
– shift S, where S is a state,
– reduce by a grammar production,
– accept, and
– error.

The example is as follows:
E → E+T | T
T → TF | F
F → F* | a | b

Step 1: augment the grammar with E' → E.

Step 2: Closure
I0 : E' → .E
     E → .E+T
     E → .T
     T → .TF
     T → .F
     F → .F*
     F → .a
     F → .b

Step 3: goto actions
I1 : goto(I0, E) : E' → E.
                   E → E.+T
I2 : goto(I0, T) : E → T.
                   T → T.F
                   F → .F*
                   F → .a
                   F → .b
I3 : goto(I0, F) : T → F.
                   F → F.*
I4 : goto(I0, a) : F → a.


I5 : goto(I0, b) : F → b.
I6 : goto(I1, +) : E → E+.T
                   T → .TF
                   T → .F
                   F → .F*
                   F → .a
                   F → .b
I7 : goto(I2, F) : T → TF.
                   F → F.*
I8 : goto(I3, *) : F → F*.
I9 : goto(I6, T) : E → E+T.
                   T → T.F
                   F → .F*
                   F → .a
                   F → .b

FOLLOW sets:
FOLLOW(E) = {+, $}
FOLLOW(T) = {+, a, b, $}
FOLLOW(F) = {+, *, a, b, $}

SLR parsing table (productions numbered 1: E→E+T, 2: E→T, 3: T→TF, 4: T→F, 5: F→F*, 6: F→a, 7: F→b):

                Action                          Goto
state   +       *       a       b       $       E       T       F
0                       S4      S5              1       2       3
1       S6                              accept
2       R2              S4      S5      R2                      7
3       R4      S8      R4      R4      R4
4       R6      R6      R6      R6      R6
5       R7      R7      R7      R7      R7
6                       S4      S5                      9       3
7       R3      S8      R3      R3      R3
8       R5      R5      R5      R5      R5
9       R1              S4      S5      R1                      7
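The item-set construction behind this table (closure and goto) can be sketched as follows; the tuple encoding of items and the function names are assumptions for illustration:

```python
# LR(0) closure and goto for the example grammar.
# An item is (head, rhs tuple, dot position).
GRAMMAR = [("E'", ("E",)), ("E", ("E", "+", "T")), ("E", ("T",)),
           ("T", ("T", "F")), ("T", ("F",)),
           ("F", ("F", "*")), ("F", ("a",)), ("F", ("b",))]
NONTERMS = {"E'", "E", "T", "F"}

def closure(items):
    items = set(items)
    while True:
        new = set()
        for head, rhs, dot in items:
            if dot < len(rhs) and rhs[dot] in NONTERMS:
                # dot before a nonterminal B: add B -> .γ for every B-production
                new |= {(h, r, 0) for h, r in GRAMMAR if h == rhs[dot]}
        if new <= items:
            return frozenset(items)
        items |= new

def goto(items, symbol):
    """Advance the dot over `symbol` in every item where it applies."""
    return closure({(h, r, d + 1) for h, r, d in items
                    if d < len(r) and r[d] == symbol})

I0 = closure({("E'", ("E",), 0)})
print(len(I0))  # 8 items, matching I0 above
I1 = goto(I0, "E")
print(sorted(I1))  # [('E', ('E', '+', 'T'), 1), ("E'", ('E',), 1)]
```

Repeating `goto` over all symbols until no new sets appear yields the states I0–I9 used in the table.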

Exercise:
1. S → A a A b
   S → B b B a
   A → ∈
   B → ∈
2. S → xAy | xBy | xAz
   A → aS | b
   B → b
3. S → 0 S 0 | 1 S 1
   S → 1 0

15) CLR (canonical LR)

Example: S → CC
         C → aC | d

Step 1: augment the grammar with S' → S.

Step 2: Closure
I0 : S' → .S, $
     S → .CC, $
     C → .aC, a|d
     C → .d, a|d

Step 3: goto actions
I1 : goto(I0, S) : S' → S., $
I2 : goto(I0, C) : S → C.C, $
                   C → .aC, $
                   C → .d, $
I3 : goto(I0, a) : C → a.C, a|d
                   C → .aC, a|d
                   C → .d, a|d
I4 : goto(I0, d) : C → d., a|d
I5 : goto(I2, C) : S → CC., $
I6 : goto(I2, a) : C → a.C, $
                   C → .aC, $
                   C → .d, $
I7 : goto(I2, d) : C → d., $
I8 : goto(I3, C) : C → aC., a|d
I9 : goto(I6, C) : C → aC., $

Parsing table (productions numbered 1: S→CC, 2: C→aC, 3: C→d):

                Action          Goto
state   a       d       $       S       C
0       S3      S4              1       2
1                       accept
2       S6      S7                      5
3       S3      S4                      8
4       R3      R3
5                       R1
6       S6      S7                      9
7                       R3
8       R2      R2
9                       R2

Exercise:
1. S → Aa | aAc | Bc | bBa
   A → d
   B → d
2. S → Ba | bBc | dc | bda
   B → d
3. Find the LR(1) items for the following grammar.
   S → Aa | aAc | Bc | bBa
   A → d
   B → d

16) LALR parsing

S → CC
C → aC | d

Steps 1–3 produce the same LR(1) item sets I0–I9 as in the CLR example above. For the LALR table we merge the states that have the same core (the same items, ignoring lookaheads): 3 with 6, 4 with 7, and 8 with 9, taking the union of the lookaheads.

I36 : C → a.C, a|d|$
      C → .aC, a|d|$
      C → .d, a|d|$
I47 : C → d., a|d|$
I89 : C → aC., a|d|$

Parsing table (productions numbered 1: S→CC, 2: C→aC, 3: C→d):

                Action          Goto
state   a       d       $       S       C
0       S36     S47             1       2
1                       accept
2       S36     S47                     5
36      S36     S47                     89
47      R3      R3      R3
5                       R1
89      R2      R2      R2