GRAMMARS & PARSING

Upload: samson-parker

Post on 18-Jan-2018



TRANSCRIPT


GRAMMARS & PARSING


Parser Construction

Most of the work involved in constructing a parser is carried out automatically by a program, referred to as a compiler-compiler program, such as Yacc.

To create a parser for a computer language, one constructs a description of the language, which is then employed as input to the compiler-compiler.

This description of the language is in the form of a grammar for the language.


Grammar 1

1.  SENTENCE -> NOUNPHRASE VERB NOUNPHRASE
2.  NOUNPHRASE -> the ADJECTIVE NOUN
3.  NOUNPHRASE -> the NOUN
4.  VERB -> pushed
5.  VERB -> helped
6.  ADJECTIVE -> pretty
7.  ADJECTIVE -> poor
8.  NOUN -> man
9.  NOUN -> boy
10. NOUN -> cat


Grammar 1 is an example of a context-free grammar (the only kind we will deal with). The grammar consists of 10 productions; e.g., production 3 is

NOUNPHRASE -> the NOUN

Here “NOUNPHRASE” is referred to as the left-hand side (lhs) and “the NOUN” is referred to as the right-hand side (rhs).


The set of all lhs’s constitutes the set of nonterminals of the grammar.

In this case they are: {SENTENCE, NOUNPHRASE, VERB, ADJECTIVE, NOUN}

All the other symbols occurring in the grammar (i.e. in some rhs, but never as any lhs) are the terminals of the grammar.

In this case they are: {the, pushed, helped, pretty, poor, …}
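The two definitions above can be checked mechanically. The following is a minimal Python sketch, not part of the original slides; the names GRAMMAR_1 and classify_symbols are made up for illustration. It stores each production as an (lhs, rhs) pair and computes the nonterminals as the set of all lhs's and the terminals as every other symbol that occurs in some rhs.

```python
# Grammar 1 as (lhs, rhs) pairs, listed in the order the productions are numbered.
# GRAMMAR_1 and classify_symbols are illustrative names, not from the slides.
GRAMMAR_1 = [
    ("SENTENCE", ["NOUNPHRASE", "VERB", "NOUNPHRASE"]),
    ("NOUNPHRASE", ["the", "ADJECTIVE", "NOUN"]),
    ("NOUNPHRASE", ["the", "NOUN"]),
    ("VERB", ["pushed"]),
    ("VERB", ["helped"]),
    ("ADJECTIVE", ["pretty"]),
    ("ADJECTIVE", ["poor"]),
    ("NOUN", ["man"]),
    ("NOUN", ["boy"]),
    ("NOUN", ["cat"]),
]


def classify_symbols(productions):
    """Return (nonterminals, terminals) for a list of (lhs, rhs) productions."""
    # Every symbol that appears as some lhs is a nonterminal.
    nonterminals = {lhs for lhs, _ in productions}
    # Every rhs symbol that is never an lhs is a terminal.
    terminals = {sym for _, rhs in productions for sym in rhs} - nonterminals
    return nonterminals, terminals


nonterminals, terminals = classify_symbols(GRAMMAR_1)
print(sorted(nonterminals))
print(sorted(terminals))
```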


The lhs of the first production is called the goal symbol, in this case SENTENCE.

A derivation of a string in the grammar is a list of strings starting with the goal symbol, in which each string except the first is obtained from the preceding one by replacing one of its nonterminal symbols with the rhs of a production for that symbol.


A string which has a derivation is said to be derivable.

Derivable strings that consist entirely of terminal symbols are called sentences of the grammar. E.g., "the man helped the poor boy" is a sentence of Grammar 1.

The set of all sentences of a grammar is called the language defined by the grammar.
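Because no production of Grammar 1 is recursive, its language is finite and can be listed exhaustively. The sketch below is an illustration under that assumption (it would not terminate for a recursive grammar); RULES, expansions and LANGUAGE are hypothetical names. It expands every nonterminal in all possible ways and collects the resulting sentences.

```python
from itertools import product

# Grammar 1, grouped by left-hand side. RULES is an illustrative name.
RULES = {
    "SENTENCE": [["NOUNPHRASE", "VERB", "NOUNPHRASE"]],
    "NOUNPHRASE": [["the", "ADJECTIVE", "NOUN"], ["the", "NOUN"]],
    "VERB": [["pushed"], ["helped"]],
    "ADJECTIVE": [["pretty"], ["poor"]],
    "NOUN": [["man"], ["boy"], ["cat"]],
}


def expansions(symbol):
    """Return every terminal string (a tuple of words) derivable from symbol.

    Only safe for non-recursive grammars such as Grammar 1.
    """
    if symbol not in RULES:
        # A terminal derives only itself.
        return [(symbol,)]
    results = []
    for rhs in RULES[symbol]:
        # Combine every possible expansion of each symbol on the rhs.
        for parts in product(*(expansions(s) for s in rhs)):
            results.append(tuple(word for part in parts for word in part))
    return results


# The language defined by the grammar: all sentences derivable from the goal symbol.
LANGUAGE = {" ".join(words) for words in expansions("SENTENCE")}
print(len(LANGUAGE))
print("the man helped the poor boy" in LANGUAGE)
```

Membership in LANGUAGE is exactly the question a parser answers, although a real parser does so without enumerating the language.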


Grammar 1 (Cont.1)

Derivation of the sentence: "the man helped the poor boy"

1.      SENTENCE                            (goal symbol)
2. ==>  NOUNPHRASE VERB NOUNPHRASE          (by Rule 1)
3. ==>  the NOUN VERB NOUNPHRASE            (Rule 3)
4. ==>  the man VERB NOUNPHRASE             (Rule 8)
5. ==>  the man helped NOUNPHRASE           (Rule 5)
6. ==>  the man helped the ADJECTIVE NOUN   (Rule 2)
7. ==>  the man helped the poor NOUN        (Rule 7)
8. ==>  the man helped the poor boy         (Rule 9)

(This derivation shows that "the man helped the poor boy" is a sentence in the language defined by the grammar.)
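A derivation like this one can be replayed as a sequence of substitutions. The following Python sketch is illustrative only; PRODUCTIONS and derive are hypothetical names. It applies a list of rule numbers in order, each time replacing the leftmost occurrence of the rule's lhs, which is the order used in the derivation above.

```python
# Grammar 1 keyed by production number: number -> (lhs, rhs).
# PRODUCTIONS and derive are illustrative names, not from the slides.
PRODUCTIONS = {
    1: ("SENTENCE", ["NOUNPHRASE", "VERB", "NOUNPHRASE"]),
    2: ("NOUNPHRASE", ["the", "ADJECTIVE", "NOUN"]),
    3: ("NOUNPHRASE", ["the", "NOUN"]),
    4: ("VERB", ["pushed"]),
    5: ("VERB", ["helped"]),
    6: ("ADJECTIVE", ["pretty"]),
    7: ("ADJECTIVE", ["poor"]),
    8: ("NOUN", ["man"]),
    9: ("NOUN", ["boy"]),
    10: ("NOUN", ["cat"]),
}


def derive(goal, rule_numbers):
    """Apply each rule to the leftmost occurrence of its lhs.

    Returns the list of strings in the derivation, starting with the goal symbol.
    Raises ValueError if a rule's lhs does not occur in the current string.
    """
    current = [goal]
    steps = [" ".join(current)]
    for number in rule_numbers:
        lhs, rhs = PRODUCTIONS[number]
        position = current.index(lhs)  # ValueError if the rule cannot be applied
        current = current[:position] + rhs + current[position + 1:]
        steps.append(" ".join(current))
    return steps


steps = derive("SENTENCE", [1, 3, 8, 5, 2, 7, 9])
for line_number, string in enumerate(steps, start=1):
    print(line_number, string)
```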


Grammar 1 (Cont.2)

This derivation may also be represented diagrammatically by a syntax tree:
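The tree diagram itself is not reproduced in this transcript. As a rough stand-in, the Python sketch below writes out the tree implied by the derivation of "the man helped the poor boy" as nested (label, children) pairs and prints it as an indented outline. The representation and the names TREE, render and leaves are assumptions made for illustration.

```python
# A tree node is (label, children); a leaf is a plain string (a terminal).
# This tree mirrors the derivation of "the man helped the poor boy".
TREE = ("SENTENCE", [
    ("NOUNPHRASE", ["the", ("NOUN", ["man"])]),
    ("VERB", ["helped"]),
    ("NOUNPHRASE", ["the", ("ADJECTIVE", ["poor"]), ("NOUN", ["boy"])]),
])


def render(node, depth=0):
    """Return the tree as a list of lines, indented two spaces per level."""
    indent = "  " * depth
    if isinstance(node, str):
        return [indent + node]
    label, children = node
    lines = [indent + label]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines


def leaves(node):
    """Return the terminals at the leaves, read from left to right."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1] for word in leaves(child)]


print("\n".join(render(TREE)))
print(" ".join(leaves(TREE)))
```

Reading the leaves from left to right gives back the sentence, and each interior node together with its children corresponds to one application of a production.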

       


Typical format of a grammar for a programming language

PROGRAM -> PROGRAM STATEMENT
PROGRAM -> STATEMENT
STATEMENT -> ASSIGNMENT-STATEMENT
STATEMENT -> IF-STATEMENT
STATEMENT -> DO-STATEMENT
...
ASSIGNMENT-STATEMENT -> ...
...
IF-STATEMENT -> ...
...
DO-STATEMENT -> ...
...


Grammar 2

A simple grammar for arithmetic statements

1.    E -> E + T

2.    E -> T

3.    T -> T * a

4.    T -> a


Grammar 2 (Cont.1)

Derivation of:  a + a * a

1.        E              Goal symbol
2. ==>    E + T          Rule 1
3. ==>    E + T * a      Rule 3
4. ==>    E + a * a      Rule 4
5. ==>    T + a * a      Rule 2
6. ==>    a + a * a      Rule 4


Grammar 2 (Cont.2)

Derivation of: a + a * a written in reverse:

1.   a + a * a      Given sentential form
2.   T + a * a      Rule 4 in reverse
3.   E + a * a      Rule 2 in reverse
4.   E + T * a      Rule 4 in reverse
5.   E + T          Rule 3 in reverse
6.   E              Rule 1 in reverse
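The derivation of a + a * a and its reversal can be reproduced with a short Python sketch. It is illustrative only; GRAMMAR_2, NONTERMINALS and rightmost_derivation are hypothetical names. At every step the rule is applied to the rightmost nonterminal of the current string, and the list of strings is then printed in reverse to give the bottom-up order.

```python
# Grammar 2 keyed by production number: number -> (lhs, rhs).
GRAMMAR_2 = {
    1: ("E", ["E", "+", "T"]),
    2: ("E", ["T"]),
    3: ("T", ["T", "*", "a"]),
    4: ("T", ["a"]),
}
NONTERMINALS = {"E", "T"}


def rightmost_derivation(goal, rule_numbers):
    """Apply each rule to the rightmost nonterminal; return all strings derived."""
    current = [goal]
    forms = [" ".join(current)]
    for number in rule_numbers:
        lhs, rhs = GRAMMAR_2[number]
        # Index of the rightmost nonterminal in the current string.
        position = max(i for i, sym in enumerate(current) if sym in NONTERMINALS)
        if current[position] != lhs:
            raise ValueError(
                "rule %d does not rewrite the rightmost nonterminal %s"
                % (number, current[position])
            )
        current = current[:position] + rhs + current[position + 1:]
        forms.append(" ".join(current))
    return forms


forms = rightmost_derivation("E", [1, 3, 4, 2, 4])
print("top-down: ", forms)
print("bottom-up:", list(reversed(forms)))
```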


Syntax Analysis -1

• One of the functions of syntax analysis (parsing) is to verify whether the source program is syntactically correct.

• The parser obtains a string of tokens from the scanner and verifies that the string can be generated by the grammar for the source language. (If not, the source program is not syntactically correct.)


Syntax Analysis -2

• There are many approaches to parsing, including both top-down and bottom-up approaches.

• The LR parsing method discussed uses a bottom-up approach (generating the derivation in reverse).

• It constructs a rightmost derivation, in which at any step a production can only be applied to the rightmost nonterminal in the string involved.


LR Parsers -1

• An LR Parser uses an input buffer of the remaining source code to be read, a symbol stack, a state no. stack, and an action table to determine what action to take when the next symbol is read from the input buffer.

• The parser reads tokens (ai) of the programming language grammar from the input buffer one at a time.

• Using the combination of the state no. on top of the state stack and the current input symbol, ai, in the input buffer, the parser consults the action table to determine whether it should perform a transition or reduce action (as defined on the next slide).


LR Parsers -2

• When the action determined by the action table entry is "make a transition to state s", the parser pushes the current input symbol ai onto the symbol stack and the new state no. s onto the state stack.

• s is now the new top of the state no. stack, and

• ai is the top of the symbol stack, and ai+1 is the new next input symbol. The remaining input to be processed is: ai+1 . . . an -| (where the symbol -| represents the end of the source file).


LR Parsers -3

If the action determined by the action table entry is "reduce using production A -> b", where the right-hand side b consists of r symbols, the parser performs the following sequence of actions:

• Pops r symbols from the symbol stack and r states from the state stack.

• Consults the action table, using the nonterminal symbol A (the left-hand side of the reduce production) and the state no. now on top of the state no. stack, to decide which state sr to make a transition to as a result of the reduction.


LR Parsers -4

• The nonterminal symbol A is pushed onto the symbol stack (replacing the symbols of the right-hand side of the production that were previously on the symbol stack), and the new state sr after the reduction is pushed onto the state stack (replacing the states that were removed from the state stack).

• The remaining input remains unchanged (in contrast to transition actions).

• Some of the reduce actions cause associated code to be generated.


Example

The following diagram graphically represents an action table generated for the grammar:

E -> E + T | T
T -> T * a | a

It is called a parsing machine.


Using the machine to parse a + a * a

Step   Symbol stack    State stack      Remaining input
1      (empty)         0                a + a * a -|
2      a               0 4              + a * a -|
3      T               0 3              + a * a -|
4      E               0 1              + a * a -|
5      E +             0 1 2            a * a -|
6      E + a           0 1 2 4          * a -|
7      E + T           0 1 2 6          * a -|
8      E + T *         0 1 2 6 5        a -|
9      E + T * a       0 1 2 6 5 7      -|
10     E + T           0 1 2 6          -|
11     E               0 1              -|
12     ACCEPT                           -|
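The transition and reduce actions described on the LR Parsers slides can be sketched as a small table-driven loop in Python. The parsing machine diagram is not reproduced in this transcript, so the ACTION and GOTO tables below are an assumption: they were reconstructed by hand so that the state numbers agree with the parse of a + a * a above. The slides speak of a single action table; here the entries for nonterminals are kept in a separate GOTO table purely for readability, and "shift" stands for the slides' transition action. All names are illustrative.

```python
END = "-|"  # end-of-input marker, as in the slides

# Production number -> (lhs, number of symbols r on the rhs).
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1)}

# ASSUMPTION: reconstructed by hand to match the state numbers in the trace;
# the original diagram is not available here.
ACTION = {
    (0, "a"): ("shift", 4),
    (1, "+"): ("shift", 2), (1, END): ("accept", None),
    (2, "a"): ("shift", 4),
    (3, "*"): ("shift", 5), (3, "+"): ("reduce", 2), (3, END): ("reduce", 2),
    (4, "+"): ("reduce", 4), (4, "*"): ("reduce", 4), (4, END): ("reduce", 4),
    (5, "a"): ("shift", 7),
    (6, "*"): ("shift", 5), (6, "+"): ("reduce", 1), (6, END): ("reduce", 1),
    (7, "+"): ("reduce", 3), (7, "*"): ("reduce", 3), (7, END): ("reduce", 3),
}
# State reached after a reduction to a nonterminal: (state on top, lhs) -> state.
GOTO = {(0, "E"): 1, (0, "T"): 3, (2, "T"): 6}


def parse(tokens):
    """Parse a token list; return snapshots (symbols, states, remaining input).

    Raises SyntaxError if the tokens cannot be generated by the grammar.
    """
    symbols, states = [], [0]
    remaining = list(tokens) + [END]
    trace = []
    while True:
        trace.append((" ".join(symbols), list(states), " ".join(remaining)))
        kind, arg = ACTION.get((states[-1], remaining[0]), ("error", None))
        if kind == "shift":
            # Transition: push the input symbol and the new state.
            symbols.append(remaining.pop(0))
            states.append(arg)
        elif kind == "reduce":
            lhs, r = PRODUCTIONS[arg]
            # Pop r symbols and r states (r >= 1 for every production here).
            del symbols[-r:]
            del states[-r:]
            # Push the lhs and the state chosen from the exposed state and lhs.
            symbols.append(lhs)
            states.append(GOTO[(states[-1], lhs)])
        elif kind == "accept":
            return trace
        else:
            raise SyntaxError(
                "unexpected %r in state %d" % (remaining[0], states[-1])
            )


for step, (syms, sts, rest) in enumerate(parse("a + a * a".split()), start=1):
    print(step, syms or "(empty)", sts, rest)
```

The loop records one snapshot per step before acting on it, so the accepting configuration (symbol stack E, state stack 0 1) is the last snapshot returned.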


Exercise

Supply a rightmost top-down derivation for  a + a * a.


Exercise (Cont.1)

Now look at the derivation given in the parse example earlier. For each step, first cross out all the state numbers and also the end marker, then rewrite that step of the parse in this form, e.g.,

            Stack status:          Remaining input
Step 7      0 E 1 + 2 T 6          * a -|

is rewritten as:

            E + T                  * a

and concatenated together becomes:

            E + T * a


Exercise (Cont.2)

Do this to every step of the derivation, then cross out any duplicates of a given step.

Now compare the result with the top-down derivation you obtained above.  How are these two sets of results related to each other?

Note that the parse provides a "bottom-up" derivation, which contains the same steps as the top-down derivation but in the reverse order.