Post on 31-Dec-2015

Grammar Variation in Compiler Design

Carl Wu

Three topics

• Syntax Grammar vs. AST

• Component(?)-based grammar

• Aspect-oriented grammar

Grammar vs. AST (I)

How to automatically generate a tree from a grammar?

Grammar vs. AST (I)

Stmt ::= Block
       | “if” Exp “then” Stmt
       | IdUse “:=” Exp

Grammar vs. AST (I)

Stmt ::= Block | “if” Exp “then” Stmt | IdUse “:=” Exp

JastAdd Specification (Tree)

abstract Stmt;
BlockStmt : Stmt ::= Block;
IfStmt : Stmt ::= Exp Stmt;
AssignStmt : Stmt ::= IdUse Exp;

Grammar vs. AST (I)

Restricted CFG Definition

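A ::= B C D        √ => aggregation (B, C, and D become children of A)

A ::= B | C | D    √ => inheritance (B, C, and D become subclasses of A)

A ::= B C | D      × (mixes a sequence with alternatives)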
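The three cases above can be checked mechanically. The following is a minimal sketch, assuming a production is represented as a list of alternatives, each a list of symbol names. The class `RcfgRule` and its `classify` method are hypothetical names introduced for illustration; they are not part of JastAdd or any other tool. The sketch also assumes that a production with a single alternative counts as aggregation, which the slides do not state.

```java
import java.util.List;

/** Hypothetical helper for illustration; not part of JastAdd or any real tool. */
final class RcfgRule {
    enum Kind { AGGREGATION, INHERITANCE, REJECTED }

    /**
     * Classifies one production. Each inner list is one alternative,
     * and each alternative is a sequence of symbol names.
     */
    static Kind classify(List<List<String>> alternatives) {
        if (alternatives.isEmpty()) {
            return Kind.REJECTED;
        }
        if (alternatives.size() == 1) {
            // A ::= B C D : a single sequence, so the symbols become children of A.
            // Assumption: a one-symbol sequence is also treated as aggregation.
            return alternatives.get(0).isEmpty() ? Kind.REJECTED : Kind.AGGREGATION;
        }
        // A ::= B | C | D : every alternative must be exactly one symbol.
        for (List<String> alternative : alternatives) {
            if (alternative.size() != 1) {
                return Kind.REJECTED; // e.g. A ::= B C | D
            }
        }
        return Kind.INHERITANCE;
    }

    public static void main(String[] args) {
        System.out.println(classify(List.of(List.of("B", "C", "D"))));
        System.out.println(classify(List.of(List.of("B"), List.of("C"), List.of("D"))));
        System.out.println(classify(List.of(List.of("B", "C"), List.of("D"))));
    }
}
```

A rejected production such as A ::= B C | D can be brought into the restricted form by naming the sequence, for example A ::= E | D with E ::= B C.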

Grammar vs. AST (I)

RCFG Specification

Stmt :: Block | IfStmt | AssignStmt

IfStmt :: “if” Exp “then” Stmt

AssignStmt :: IdUse “:=” Exp

[Diagram: Stmt is the parent of Block, IfStmt, and AssignStmt; IfStmt contains Exp and Stmt; AssignStmt contains IdUse and Exp.]

Grammar vs. AST (II)

Parse tree vs. IR tree

Grammar vs. AST (II)

• In an IDE, there are multiple visitors for the same source code (more than 12!).

• Different requirements for the tree structure:
– Syntax vs. semantics
– Immutable vs. transformable (optimization)
– Parse tree vs. IR tree

Grammar vs. AST (II)

• Generate two tree structures from the same grammar!

• One immutable, strongly typed, concrete parse tree – Read only!

• One transformable, untyped, abstract IR tree – Read and write!

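Grammar vs. AST (II)

IfStmt :: “if” Exp “then” Stmt

class ASTNode {
    // IR tree view: untyped, mutable list of children
    protected ASTNode[] children;
}

class IfStmt extends ASTNode {
    // Parse tree view: typed, immutable fields
    // (assumes Exp and Stmt extend ASTNode, and Token is the scanner's token class)
    protected final Token token_if;
    protected final Exp exp;
    protected final Token token_then;
    protected final Stmt stmt;

    IfStmt(Token token_if, Exp exp, Token token_then, Stmt stmt) {
        // parse tree construction
        this.token_if = token_if;
        this.exp = exp;
        this.token_then = token_then;
        this.stmt = stmt;
        // IR tree construction: the tokens are dropped, only Exp and Stmt remain
        children = new ASTNode[] { exp, stmt };
    }
}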
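To make the read-only versus read-and-write distinction concrete, here is a small self-contained sketch of the same idea. All names (`DualTreeDemo`, `Node`, `ExpNode`, `StmtNode`, `IfNode`) are hypothetical stand-ins chosen for this example, and tokens are left out to keep it short. It shows that rewriting the IR view leaves the typed parse tree view untouched.

```java
final class DualTreeDemo {
    /** IR view: untyped children that a transformation may overwrite. */
    static class Node {
        protected Node[] children = new Node[0];

        Node child(int i) { return children[i]; }

        void setChild(int i, Node replacement) { children[i] = replacement; }
    }

    static class ExpNode extends Node {
        final String text;

        ExpNode(String text) { this.text = text; }
    }

    static class StmtNode extends Node { }

    static class IfNode extends StmtNode {
        // Parse tree view: typed, final fields with getters only.
        private final ExpNode exp;
        private final StmtNode stmt;

        IfNode(ExpNode exp, StmtNode stmt) {
            this.exp = exp;
            this.stmt = stmt;
            // IR view starts out with the same nodes.
            children = new Node[] { exp, stmt };
        }

        ExpNode getExp() { return exp; }

        StmtNode getStmt() { return stmt; }
    }

    public static void main(String[] args) {
        IfNode ifNode = new IfNode(new ExpNode("1 < 2"), new StmtNode());
        // An optimizer rewrites the condition in the IR view only.
        ifNode.setChild(0, new ExpNode("true"));
        System.out.println(ifNode.getExp().text);               // prints "1 < 2"
        System.out.println(((ExpNode) ifNode.child(0)).text);   // prints "true"
    }
}
```

The cost of this design is that the two views can drift apart after a transformation, so any tool reading the typed view sees the original source structure, not the optimized one.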

Component(?)-based grammar

Component vs. module

• What is the difference between a component and a module?

• What is a modularized grammar?

• What is an ideal component-based grammar?

Component vs. module

[Figure: two diagrams, labeled “Modularized grammar” and “Component-based grammar”. One shows several grammar modules feeding a single grammar and a single parser; the other shows several grammar components feeding parsers.]

Benefits

• Benefits of a modularized grammar:
– Easy to read, write, and change
– Eliminates naming conflicts

• Additional benefits of a component-based grammar:
– Each component can be designed, developed, and tested individually.
– A change to one component does not require recompiling all the other components.
– Different types of grammars or parsing algorithms can be used for different components, e.g., one component can be LL and another LALR.

Difficulty in designing component-based grammar

• No clear guards between two components.
– Switch control to a new parser, or stay in the same one?
– Suitable for embedded languages, e.g., JScript in HTML
– Not suitable for an integral language, e.g., Java

• Too much coupling between two components.
– A component is not just reused as a whole; its internal productions and symbols may also be reused.
– Not applicable to LR parsers: once the table is built, the internal productions cannot be reused (there is no way to jump into a table).
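The embedded-language case, where guards are clear, can be sketched as follows. This is only an illustration under strong assumptions: the `Component` interface, the `HOST` and `SCRIPT` components, and the use of `<script>` and `</script>` as guards are all hypothetical, and both "parsers" merely record the text they were handed instead of building trees.

```java
import java.util.ArrayList;
import java.util.List;

final class GuardedComponents {
    /** Hypothetical parser component: consumes input from pos and returns the new position. */
    interface Component {
        int parse(String input, int pos, List<String> out);
    }

    static final String OPEN = "<script>";
    static final String CLOSE = "</script>";

    /** Stand-in for the embedded-language parser; it stops at the closing guard. */
    static final Component SCRIPT = (input, pos, out) -> {
        int end = input.indexOf(CLOSE, pos);
        if (end < 0) {
            end = input.length();
        }
        out.add("script:" + input.substring(pos, end));
        return end;
    };

    /** Stand-in for the host parser; it hands control to SCRIPT at the opening guard. */
    static final Component HOST = (input, pos, out) -> {
        while (pos < input.length()) {
            int guard = input.indexOf(OPEN, pos);
            int end = guard < 0 ? input.length() : guard;
            if (end > pos) {
                out.add("host:" + input.substring(pos, end));
            }
            if (guard < 0) {
                return input.length();
            }
            // Switch control to the embedded component, then resume after its guard.
            pos = SCRIPT.parse(input, guard + OPEN.length(), out);
            if (input.startsWith(CLOSE, pos)) {
                pos += CLOSE.length();
            }
        }
        return pos;
    };

    static List<String> parseDocument(String input) {
        List<String> out = new ArrayList<>();
        HOST.parse(input, 0, out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parseDocument("<p>hi</p><script>x = 1;</script><p>bye</p>"));
    }
}
```

The hand-over works here only because the guards are unambiguous tokens. Inside an integral language such as Java there is no comparable token that says where the expression component ends and the statement component resumes.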

Ideal vs. reality

[Figure: two diagrams over the same component names (JavaClass, Interface, Object_type, Statement, Expression, Type, Binary_expr, Unary_expr, Primary, Array); the connections between them did not survive extraction.]

Suggestions?

Aspect-oriented grammar

Aspect-oriented grammar

• Join-point: grammar patterns that crosscut multiple productions

• Punctuation, identifiers, modifiers…

Example

• “;” appears 25 times in one of the Java grammars

• “.” appears 74 times in one of the Cobol grammars

• Every one of them should be carefully placed!

<Sentence> ::= <Accept Stm> '.'
             | <Add Stm> '.'
             | <Add Stm Ex> <End-Add Opt> '.'
             | <Call Stm> '.'
             | <Call Stm Ex> <End-Call Opt> '.'
             | <Close Stm> '.'
             | <Compute Stm> '.'
             | <Compute Stm Ex> <End-Compute Opt> '.'
             | <Display Stm> '.'
             | <Divide Stm> '.'
             | <Divide Stm Ex> <End-Divide Opt> '.'
             | <Evaluate Stm> <End-Evaluate Opt> '.'
             | <If Stm> <End-If Opt> '.'
             | <Move Stm> '.'
             | <Move Stm Ex> <End-Move Opt> '.'
             | <Multiply Stm> '.'
             | <Multiply Stm Ex> <End-Multiply Opt> '.'
             | <Open Stm> '.'
             | <Perform Stm> '.'
             | <Perform Stm Ex> <End-Perform Opt> '.'
             | <Read Stm> '.'
             | <Read Stm Ex> <End-Read Opt> '.'
             | <Release Stm> '.'
             | <Rewrite Stm> '.'
             | <Rewrite Stm Ex> <End-Rewrite Opt> '.'
             | <Set Stm> '.'
             | <Start Stm> '.'
             | <Start Stm Ex> <End-Start Opt> '.'
             | <String Stm> '.'
             | <String Stm Ex> <End-String Opt> '.'
             | <Subtract Stm> '.'
             | <Subtract Stm Ex> <End-Substract Opt> '.'
             | <Write Stm> '.'
             | <Write Stm Ex> <End-Write Opt> '.'
             | <Unstring Stm> '.'
             | <Unstring Stm Ex> <End-Unstring Opt> '.'
             | <Misc Stm> '.'

pointcut PreDot(): <Sentence>;

after PreDot(): '.';

Another example

pointcut Content(): … …

before Content(): “(”;

after Content(): “)”;

Guarantee they match!

Grammar weaving

[Diagram: Base Grammar + Grammar Aspect → Result grammar → Parser]
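One possible reading of weaving can be sketched in a few lines. The slides do not define the semantics of `pointcut`, `before`, and `after`, so this sketch assumes the simplest one: the pointcut names a single nonterminal, and the advice terminals are added around every alternative of that nonterminal. The class `GrammarWeaver`, the method `weave`, and the map-based grammar representation are hypothetical and chosen only for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical weaver for illustration; the advice semantics are an assumption. */
final class GrammarWeaver {
    /**
     * Returns a new grammar in which every alternative of the nonterminal named
     * by pointcut is wrapped with the given terminals. Pass null to skip the
     * before or after advice. The base grammar is left unmodified.
     */
    static Map<String, List<List<String>>> weave(
            Map<String, List<List<String>>> base,
            String pointcut, String before, String after) {
        Map<String, List<List<String>>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<List<String>>> rule : base.entrySet()) {
            boolean matches = rule.getKey().equals(pointcut);
            List<List<String>> alternatives = new ArrayList<>();
            for (List<String> alternative : rule.getValue()) {
                List<String> woven = new ArrayList<>();
                if (matches && before != null) {
                    woven.add(before);
                }
                woven.addAll(alternative);
                if (matches && after != null) {
                    woven.add(after);
                }
                alternatives.add(woven);
            }
            result.put(rule.getKey(), alternatives);
        }
        return result;
    }

    public static void main(String[] args) {
        // Base grammar: two <Sentence> alternatives with the dots left out.
        Map<String, List<List<String>>> base = new LinkedHashMap<>();
        base.put("<Sentence>", List.of(
                List.of("<Accept Stm>"),
                List.of("<Add Stm Ex>", "<End-Add Opt>")));
        // Aspect: pointcut PreDot(): <Sentence>; after PreDot(): '.'
        System.out.println(weave(base, "<Sentence>", null, "'.'"));
    }
}
```

Passing both a `before` and an `after` terminal in one call covers the parenthesis example, since the opening and closing symbols are then added together and cannot be mismatched.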

What do you think?