cpsc 388 – compiler design and construction parsers – context free grammars

35
CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

Upload: annis-bradford

Post on 11-Jan-2016

218 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

CPSC 388 – Compiler Design and Construction

Parsers – Context Free Grammars

Page 2: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

Announcements HW3 due via Sakai

Solution to HW3

Homework: HW4 assigned today, due next Friday, via Sakai

Reading: 4.1-4.4

Progress Report Grades due Sept 28th

Page 3: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

Compilers

Lexical Analyzer(Scanner)

Syntax Analyzer(Parser)

Symantic Analyzer

Intermediate CodeGenerator

Optimizer

Code Generator

SourceCode

TokenStream

AbstractSyntaxTree

Page 4: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

Scanner

Source Code:

position = initial + rate * 60 ;

Corresponding Tokens:

IDENT(position)ASSIGNIDENT(position)PLUSIDENT(rate)TIMESINT-LIT(60)SEMI-COLON

Page 5: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

Example ParseSource Code:

position = initial + rate * 60 ;

Abstract-Syntax Tree: =

position +

initial *

rate 60

•Interior nodes are operators. •A node's children are operands. •Each subtree forms "logical unit"

e.g., the subtree with * at its root shows thatbecause multiplication has higher precedencethan addition, this operation must be performedas a unit (not initial+rate).

Page 6: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

Limitations of RE and FSA

Regular Expressions and Finite State Automata cannot express all languages

For Example the language that consists of all balanced parenthesis: () and ((())) and (((((())))))

Parsers can recognize more types of languages than RE or FSA

Page 7: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

Parsers Input: Sequence of Tokens

Output: a representation of program Often AST, but could be other things

Also find syntax errors

CFGs used to define Parser(Context Free Grammar)

Page 8: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

Context Free Grammars

if ( expr ) stmt else stmtstmt →

terminals non-terminalsRule or Production

Page 9: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

Context Free Grammar

CFGs consist of:

Σ – Set of Terminals (use tokens from scanner)

N – Set of Non-Terminals (variables)

P – Set of Productions (also called rules)

S – the Start Non-Terminal (one on left of first rule if not specified)

Page 10: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

Productions

if ( expr ) stmt else stmtstmt →

Single non-terminal

Sequence of zero or more terminals and non-terminals

Page 11: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

Example with Boolean Expressions

“true” and “false” are boolean expressions

If exp1 and exp2 are boolean expressions, then so are: exp1 || exp2

exp1 && exp2

! exp1

( exp1 )

Page 12: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

Corresponding CFG

bexp → TRUEbexp → FALSEbexp → bexp OR bexpbexp → bexp AND bexpbexp → NOT bexpbexp → LPAREN bexp RPAREN

UPPERCASE represent tokens (thus terminals)lowercase represent non-terminals

Page 13: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

CFG for Assignments

Here is CFG for simple assignment statements(Can only assign boolean expressions to identifiers)

stmt → ID ASSIGN bexp SEMICOLON

Page 14: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

CFG for simple IF statements

Combine these CFGs and add 2 more rules for simple IF statements:1. stmt → IF LPAREN bexp RPAREN stmt2. stmt → IF LPAREN bexp RPAREN stmt ELSE stmt3. stmt → ID ASSIGN bexp SEMICOLON4. bexp → TRUE5. bexp → FALSE6. bexp → bexp OR bexp7. bexp → bexp AND bexp8. bexp → NOT bexp9. bexp → LPAREN bexp RPAREN

Page 15: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

You Try It

Write a context-free grammar for the language of very simple while loops (in which the loop body only contains one statement) by adding a new production with nonterminal stmt on the left-hand side.

Page 16: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

CFG Languages The language defined by a context-free

grammar is the set of strings (sequences of terminals) that can be derived from the start nonterminal.

Think of productions as rewriting rulesSet cur_seq = starting non-terminalWhile (non-terminal, X, exists in cur_seq):

Select production with X on left of “→”Replace X with right portion of selected

production Try it with given CFG

Page 17: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

What Strings are in Language

1. stmt → IF LPAREN bexp RPAREN stmt2. stmt → IF LPAREN bexp RPAREN stmt ELSE stmt3. stmt → ID ASSIGN bexp SEMICOLON4. bexp → TRUE5. bexp → FALSE6. bexp → bexp OR bexp7. bexp → bexp AND bexp8. bexp → NOT bexp9. bexp → LPAREN bexp RPAREN

Set cur_seq = starting non-terminalWhile (non-terminal, X, exists in cur_seq):

Select production with X on left of “→”Replace X with right portion of selected production

Page 18: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

Try It Again

exp → exp PLUS termexp → exp MINUS termexp → termterm → term TIMES factorterm → term DIVIDE factorterm → factorfactor → LPAREN exp RPARENfactor → ID

What is the language?

Page 19: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

Leftmost and Rightmost Derivations

A derivation is a leftmost derivation if it is always the leftmost nonterminal that is chosen to be replaced.

It is a rightmost derivation if it is always the rightmost one.

Page 20: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

Derivation Notation

E => a E =>* a E =>+ a

Page 21: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

Parse Trees

Start with the start nonterminal.

Repeat: choose a leaf nonterminal X

choose a production X --> alpha

the symbols in alpha become the children of X in the tree

until there are no more leaf nonterminals left.

The derived string is formed by reading the leaf nodes from left to right.

Page 22: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

Ambiguous Grammars Consider the grammar

exp → exp PLUS expexp → exp MINUS expexp → exp TIMES expexp → exp DIVIDE expexp → INT_LIT

Construct Parse tree for 3-4/2 Are there more than one parse trees? If there is more than one parse tree for a string then

the grammar is ambiguous Ambiguity causes problems with parsing (what is the

correct structure)?

Page 23: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

Precedence in Grammars

To write a grammar whose parse trees express precedence correctly, use a different nonterminal for each precedence level. Start by writing a rule for the operator(s) with the lowest precedence ("-" in our case), then write a rule for the operator(s) with the next lowest precedence, etc:

Page 24: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

Precedence in Grammars

exp → exp MINUS exp | term

term → term DIVIDE term | factor

factor → INT_LIT | LPAREN exp RPAREN

Now try constructing multiple parse trees for 3-4/2

Grammar is still ambiguous. Look at associativity. Construct 2 parses tree for 5-3-2.

Page 25: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

Recursion on CFGs A grammar is recursive in nonterminal X

if: X derives a sequence of symbols that includes an X.

A grammar is left recursive in X if: X derives a sequence of symbols that starts with an X.

A grammar is right recursive in X if: X derives a sequence of symbols that ends with an X.

For left associativity, use left recursion. For right associativity, use right

recursion.

Page 26: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

Ambiguity Removed in CFG

exp → exp MINUS term | term

term → term DIVIDE factor | factor

factor → INT_LIT | LPAREN exp RPAREN

One level for each order of operation Left recursion for left assiciativity Now try constructing 2 parse trees for

5-3-2.

Page 27: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

You Try it Construct a grammar for arithmetic

expressions with addition, multiplication, exponentiation (right assoc.), subtraction, division, and unary negative.exp → exp PLUS exp |

exp MINUS exp |exp TIMES exp |exp DIVIDE exp |exp POW exp |MINUS exp |LPAREN exp RPAREN |INT_LIT

Page 28: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

Solutionexp → exp PLUS term |

exp MINUS term |term

term → term TIMES factor |term DIVIDE factor |factor

factor → exponent POW factor |exponent

exponent → MINUS exponent |final

final → INT_LIT |LPAREN exp RPAREN

Page 29: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

You Try It Write a grammar for the language of boolean

expressions, with two possible operands: true false, and three possible operators: and or not. Add nonterminals so that or has lowest precedence, then and, then not. Finally, change the grammar to reflect the fact that both and and or are left associative.bexp → TRUEbexp → FALSEbexp → bexp OR bexpbexp → bexp AND bexpbexp → NOT bexpbexp → LPAREN bexp RPAREN

Page 30: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

List Grammars Several types of lists can be created using

CFGs One or more x's (without any separator or

terminator):1. xList → X | xList xList2. xList → X | xList X3. xList → X | X xList

One or more x's, separated by commas:1. xList → X | xList COMMA xList2. xList → X | xList COMMA X3. xList → X | X COMMA xList

Page 31: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

List Grammars One or more x's, each x terminated by a

semi-colon: You Try It1. xList → X SEMICOLON | xList xList2. xList → X SEMICOLON | xList X SEMICOLON3. xList → X SEMICOLON | X SEMICOLON xList

Zero or more x's (without any separator or terminator):

1. xList → ε | X | xList xList2. xList → ε | X | xList X3. xList → ε | X | X xList

Page 32: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

List Grammars Zero or more x's, each terminated by a

semi-colon:1. xList → ε | X SEMICOLON | xList xList

2. xList → ε | X SEMICOLON | xList X SEMICOLON

3. xList → ε | X SEMICOLON | X SEMICOLON xList Zero or more x's, separated by commas:

You Try It

1. xList → ε | nonEmptyXListnonEmptyXList → X | X COMMA nonEmptyXList

Page 33: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

CFGs for Whole Languages

To write a grammar for a whole programming language, break down the problem into pieces. For example, think about a Java program: a program consists of one or more classes:

program → classlist

classlist → class | class classlist

Page 34: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

CFGs for Whole Languages

A class is the word "class", optionally preceded by the word "public", followed by an identifier, followed by an open curly brace, followed by the class body, followed by a closing curly brace:

class → PUBLIC CLASS ID LCURLY classbody RCURLY |

CLASS ID LCURLY classbody RCURLY

Page 35: CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars

CFGs for Whole Languages

A class body is a list of zero or more field and/or method definitions: classbody → ε | deflist

deflist → def | def deflist

And So On…