
Automating Grammar Comparison Ravichandhran Madhavan, EPFL Joint work with Mikael Mayër, EPFL Sumit Gulwani, MSR Viktor Kuncak, EPFL


Posted on 18-Jan-2016




Overview

Input: two context-free grammars, e.g.

unit -> pkg? imp* type*
field -> method | vard | …
arglist -> exp (, exp)*
…

and

cu -> pd? impd* classd*
pd -> annot* package qname;
impd -> import static? qname (. '*')? ;
…

Verifier for Grammar Equivalence

Output: either "Proven Equivalent" or counter-examples, e.g.

final protected interface Id { } ;
public synchronized enum Id { , }
...


Applications

Programming Languages

Compare grammars used by parsers

Unravel incompatibilities / bugs

Catch errors in rewriting

CS Education

Automate grading of grammar exercises

Aid in tutoring grammars

http://grammar.epfl.ch


Comparing PL Grammars

Does it accept all syntactically correct programs?

Does it accept only syntactically correct programs?

JavaScript grammar


Reality Check

Analyzed grammars of 5 programming languages

Grammars disagree on > 40% of words sampled at random

Oracle JLS vs. Antlr Java:

private private public public class ID implements char { }

interface ID { ; short ID = ~ + CharLiteral /= Null -- % IntLiteral == this <<= FloatLiteral ; }

Overly Permissive


Incompatibilities

Mozilla JS Grammar vs. ECMAScript standard

eval("var k = function() { ++ /a/ - this }");

Outcomes: Reference Error, Parse Error, No Error


Grammar Rewrite Tasks

Grammars have to be modified for many reasons:

• Remove ambiguity

• Remove left recursion

• Eliminate shift / reduce conflicts (LR)

• Enable recursive descent parsing (LL)


Our Contribution

Finds hundreds of counter-examples between real-world grammars

Detects incremental modifications

3x more effective and 12x faster than the available state of the art (CFG Analyzer)

Fast and deep counter-example detection

[Chart: queries disproved and speedup (0% to 100%), our tool vs. CFG Analyzer]


Grammar Tutoring


Grammars from English Descriptions

Construct an LL(1) grammar for function types over Int

Int  Int => Int

Int , Int => Int

Reference:
type → Int rest
rest → => type | , Int args | ε
args → , Int args | => type

Solution 1:
type → args => type | Int
args → Int , args | Int

Correct but not LL(1)

Solution 2:
S → Int U
U → => Int U | , Int U | ε

Incorrect but LL(1)
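The LL(1) half of this exercise can be checked mechanically. The Python sketch below (an illustration, not the tutoring system's code) computes FIRST and FOLLOW sets and tests whether, for each nonterminal, the predict sets of its alternatives are pairwise disjoint. The grammar encoding (a dict from nonterminal to tuples of symbols, `()` for ε, `"$"` as the end marker) and all function names are assumptions. It checks only the LL(1) property, not whether a solution is equivalent to the reference.

```python
# Grammars from the exercise: nonterminal -> tuple of alternatives,
# each alternative a tuple of symbols; () is ε. Other symbols are terminals.
REFERENCE = {
    "type": (("Int", "rest"),),
    "rest": (("=>", "type"), (",", "Int", "args"), ()),
    "args": ((",", "Int", "args"), ("=>", "type")),
}
SOLUTION1 = {
    "type": (("args", "=>", "type"), ("Int",)),
    "args": (("Int", ",", "args"), ("Int",)),
}
SOLUTION2 = {
    "S": (("Int", "U"),),
    "U": (("=>", "Int", "U"), (",", "Int", "U"), ()),
}


def seq_first(seq, g, first, nullable):
    """FIRST set of a symbol sequence, plus whether it can derive ε."""
    out = set()
    for sym in seq:
        if sym not in g:  # terminal
            out.add(sym)
            return out, False
        out |= first[sym]
        if sym not in nullable:
            return out, False
    return out, True


def first_sets(g):
    """Fixpoint computation of FIRST sets and of the nullable nonterminals."""
    first = {nt: set() for nt in g}
    nullable = set()
    changed = True
    while changed:
        changed = False
        for nt, alts in g.items():
            for alt in alts:
                f, null = seq_first(alt, g, first, nullable)
                if not f <= first[nt]:
                    first[nt] |= f
                    changed = True
                if null and nt not in nullable:
                    nullable.add(nt)
                    changed = True
    return first, nullable


def follow_sets(g, start, first, nullable):
    """Fixpoint computation of FOLLOW sets; "$" marks the end of input."""
    follow = {nt: set() for nt in g}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for nt, alts in g.items():
            for alt in alts:
                for k, sym in enumerate(alt):
                    if sym not in g:
                        continue
                    f, null = seq_first(alt[k + 1:], g, first, nullable)
                    new = (f | follow[nt]) if null else f
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


def is_ll1(g, start):
    """True if the predict sets of each nonterminal's alternatives are disjoint."""
    first, nullable = first_sets(g)
    follow = follow_sets(g, start, first, nullable)
    for nt, alts in g.items():
        seen = set()
        for alt in alts:
            f, null = seq_first(alt, g, first, nullable)
            predict = (f | follow[nt]) if null else f
            if predict & seen:
                return False
            seen |= predict
    return True


print(is_ll1(REFERENCE, "type"), is_ll1(SOLUTION1, "type"), is_ll1(SOLUTION2, "S"))
```

This reports the reference and Solution 2 as LL(1) and Solution 1 as not LL(1) (both alternatives of `type` start with Int). Solution 2's defect, that it accepts Int , Int while the reference does not, is invisible to this check and needs an equivalence test instead.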


Second Contribution

Tutoring system for context-free grammars

Can find counter-examples as well as prove equivalence

Decided 95.3% of 1395 student queries

Free and open source: http://grammar.epfl.ch

[Chart: students' solutions (0% to 100%): disproved, proved, unknown]


Disproving Equivalence


Counter-Example Discovery

unit -> pkg? imp* type*
field -> method | vard | …
arglist -> exp (, exp)*
…

cu -> pd? impd* classd*
pd -> annot* package qname;
impd -> import static? qname (. '*')? ;
…

Enumerate / sample words from one grammar, then parse them with the other grammar using an Antlr, CYK, or LL(1) parser.
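A minimal, brute-force version of this pipeline is sketched below. Unlike the tool, which samples parse trees and parses with Antlr, CYK, or LL(1) parsers, this sketch exhaustively expands sentential forms up to a length bound and parses only with CYK, so both grammars must be in Chomsky normal form. The two toy grammars (G1 for aⁿbⁿ, and G2, a hypothetical buggy variant with an extra S → S S rule) and all function names are made up for illustration.

```python
# Toy grammars in Chomsky normal form: nonterminal -> list of right-hand sides.
G1 = {  # a^n b^n, n >= 1
    "S": [("A", "B"), ("A", "X")],
    "X": [("S", "B")],
    "A": [("a",)],
    "B": [("b",)],
}
G2 = {  # hypothetical buggy variant: the extra S -> S S also admits "abab"
    "S": [("A", "B"), ("A", "X"), ("S", "S")],
    "X": [("S", "B")],
    "A": [("a",)],
    "B": [("b",)],
}


def words_up_to(g, start, max_len):
    """All words of length <= max_len derivable from start.

    Assumes no ε-productions, so sentential forms never shrink and any form
    longer than max_len can be pruned."""
    found, seen, frontier = set(), set(), [(start,)]
    while frontier:
        form = frontier.pop()
        if form in seen or len(form) > max_len:
            continue
        seen.add(form)
        # Expand the leftmost nonterminal, if there is one.
        idx = next((k for k, sym in enumerate(form) if sym in g), None)
        if idx is None:
            found.add("".join(form))
            continue
        for rhs in g[form[idx]]:
            frontier.append(form[:idx] + rhs + form[idx + 1:])
    return found


def cyk(g, start, word):
    """CYK membership test; g must be in Chomsky normal form."""
    n = len(word)
    if n == 0:
        return False
    # table[i][l - 1] = nonterminals deriving the substring word[i:i + l]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][0] = {nt for nt, rhss in g.items() if (ch,) in rhss}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for cut in range(1, length):
                left = table[i][cut - 1]
                right = table[i + cut][length - cut - 1]
                for nt, rhss in g.items():
                    if any(len(r) == 2 and r[0] in left and r[1] in right for r in rhss):
                        table[i][length - 1].add(nt)
    return start in table[0][n - 1]


def counter_examples(g_from, g_to, start, max_len):
    """Words of length <= max_len generated by g_from but rejected by g_to."""
    return sorted(w for w in words_up_to(g_from, start, max_len)
                  if not cyk(g_to, start, w))


print(counter_examples(G2, G1, "S", 4))  # ['abab']
print(counter_examples(G1, G2, "S", 6))  # []
```

Swapping the arguments checks the two inclusion directions separately: up to length 6 every word of G1 is accepted by G2, while G2 yields counter-examples such as abab. Exhaustive expansion grows exponentially with the bound, which is why a practical tool samples instead.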


Enumeration of Words & Parse Trees

We consider enumeration of parse trees.

We define the function Enum(S, i) that returns the i-th parse tree rooted at S, for indices i = 0, 1, 2, 3, 4, 5, …


Random Lookup - Illustration

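Grammar:
T → P Q | b P
P → a | Q P
Q → b | P Q

Looking up the parse tree with index 100 rooted at T:

Choose step 1: (T, 100) → (P Q, 50)
Unpair step: (P Q, 50) → (P, 4) and (Q, 5)
Choose step 2: (P, 4) → (Q P, 3) and (Q, 5) → (P Q, 4)
Unpair step: (Q P, 3) → (Q, 2) and (P, 0); (P Q, 4) → (P, 1) and (Q, 1)
(P, 0) → a

The figure then shows the partially constructed words (bb, …bb, …a b, …b).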
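The index arithmetic in this illustration can be reproduced with a small sketch. It assumes two details the slide does not spell out: the choose step hands out indices to the alternatives round-robin, with an alternative that has only finitely many parse trees dropping out once exhausted, and the unpair step uses the Cantor pairing function when both parts have infinitely many trees. Under these assumptions it yields exactly the pairs listed above, but the tool's actual scheme may differ. It also assumes every nonterminal is productive; all names are hypothetical.

```python
import math

# Grammar from the illustration; symbols that are not keys are terminals.
GRAMMAR = {
    "T": [["P", "Q"], ["b", "P"]],
    "P": [["a"], ["Q", "P"]],
    "Q": [["b"], ["P", "Q"]],
}


def count(g, sym, stack=()):
    """Number of parse trees rooted at sym, or math.inf if unbounded.

    Assumes every nonterminal is productive, so reaching a nonterminal that is
    already being counted (a cycle) means infinitely many trees."""
    if sym not in g:
        return 1
    if sym in stack:
        return math.inf
    return sum(seq_count(g, alt, stack + (sym,)) for alt in g[sym])


def seq_count(g, seq, stack=()):
    """Number of parse-tree tuples for a sequence of symbols."""
    n = 1
    for sym in seq:
        n *= count(g, sym, stack)
    return n


def cantor_unpair(z):
    """Inverse of the Cantor pairing (x, y) -> (x + y)(x + y + 1)/2 + y."""
    w = (math.isqrt(8 * z + 1) - 1) // 2
    y = z - w * (w + 1) // 2
    return w - y, y


def choose(g, nt, i):
    """Choose step: map index i of nt to (alternative number, index within it)."""
    sizes = [seq_count(g, alt) for alt in g[nt]]
    r = 0
    while True:
        active = [k for k, size in enumerate(sizes) if size > r]
        if not active:
            raise IndexError(f"{nt} has too few parse trees")
        if i < len(active):
            return active[i], r
        i -= len(active)
        r += 1


def split(g, seq, i):
    """Unpair step: split index i of a symbol sequence into one index per symbol."""
    if len(seq) <= 1:
        return [i] if seq else []
    first, rest = count(g, seq[0]), seq_count(g, seq[1:])
    if first == math.inf and rest == math.inf:
        a, b = cantor_unpair(i)
    elif rest == math.inf:  # first part finite
        a, b = i % first, i // first
    else:  # rest finite
        a, b = divmod(i, rest)
    return [a] + split(g, seq[1:], b)


def enum(g, sym, i):
    """Word of the i-th parse tree rooted at sym."""
    if sym not in g:
        if i != 0:
            raise IndexError("a terminal has exactly one parse tree")
        return sym
    k, j = choose(g, sym, i)
    alt = g[sym][k]
    return "".join(enum(g, s, idx) for s, idx in zip(alt, split(g, alt, j)))


print(choose(GRAMMAR, "T", 100))       # (0, 50): alternative P Q, index 50
print(split(GRAMMAR, ["P", "Q"], 50))  # [4, 5]
print(enum(GRAMMAR, "T", 100))         # bababaab
```

The attraction of this style of enumeration is random access: the 100th tree is computed directly from its index, without generating trees 0 to 99 first.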


Advanced Features

Parametrized by word length: enumerates only parse trees of words of a specified length

Uniform random sampling of parse trees: supports sampling uniformly from the parse trees of words of a given length

Fair enumeration: fair usage of productions and non-terminals
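The first two features can be sketched by counting parse trees per word length with memoised recursion; a uniformly random tree of length n is then simply the tree at a uniformly random index below that count. This is a standard counting-based technique offered as an illustration, not the tool's implementation. It assumes no ε-productions and no cycles of unit productions (so every count is finite), uses a made-up ambiguous grammar E → E + E | x, and does not cover fair enumeration.

```python
import random
from functools import lru_cache

# Hypothetical ambiguous grammar: E -> E + E | x. Tuples keep it hashable.
GRAMMAR = {"E": (("E", "+", "E"), ("x",))}


@lru_cache(maxsize=None)
def count(sym, n):
    """Number of parse trees rooted at sym whose word has length n."""
    if sym not in GRAMMAR:  # terminal: one tree, of length 1
        return 1 if n == 1 else 0
    return sum(seq_count(alt, n) for alt in GRAMMAR[sym])


@lru_cache(maxsize=None)
def seq_count(seq, n):
    """Number of ways the symbol sequence seq derives a word of length n."""
    if not seq:
        return 1 if n == 0 else 0
    # Every symbol covers at least one character (no ε-productions).
    return sum(count(seq[0], k) * seq_count(seq[1:], n - k)
               for k in range(1, n - len(seq) + 2))


def tree_at(sym, n, i):
    """The i-th (0-based) parse tree rooted at sym with word length n."""
    if sym not in GRAMMAR:
        return sym
    for alt in GRAMMAR[sym]:
        c = seq_count(alt, n)
        if i < c:
            return (sym, seq_at(alt, n, i))
        i -= c
    raise IndexError("index out of range")


def seq_at(seq, n, i):
    """The i-th list of subtrees for seq deriving a word of length n."""
    if not seq:
        return []
    for k in range(1, n - len(seq) + 2):
        rest = seq_count(seq[1:], n - k)
        block = count(seq[0], k) * rest
        if i < block:
            head, tail = divmod(i, rest)
            return [tree_at(seq[0], k, head)] + seq_at(seq[1:], n - k, tail)
        i -= block
    raise IndexError("index out of range")


def sample(sym, n, rng=random):
    """Uniformly random parse tree of length n, or None if there is none."""
    total = count(sym, n)
    return tree_at(sym, n, rng.randrange(total)) if total else None


def word(tree):
    """Concatenate the leaves of a parse tree."""
    return tree if isinstance(tree, str) else "".join(word(c) for c in tree[1])


print([count("E", n) for n in (1, 3, 5, 7)])  # [1, 1, 2, 5]
print(word(sample("E", 7)))                   # x+x+x+x
```

For this grammar the counts are the Catalan numbers, and each index below count("E", n) maps to a distinct parse tree, so drawing the index uniformly samples trees (not words) uniformly.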


Proving Equivalence


Theoretical Algorithms for Equivalence

Deterministic grammars: G. Sénizergues

Sub-deterministic grammars: Valiant; Harrison & Havel

LL(k) grammars: Olshansky & Pnueli; Rosenkrantz & Stearns

LL(1) grammars: Korenjak & Hopcroft


Our Algorithm

Combines Korenjak & Hopcroft, Harrison & Havel, and Olshansky & Pnueli

More general: applies to arbitrary, ambiguous grammars

Complete for LL(2) grammars

Practical: applied to grading pedagogical exercises


Algorithm Illustration

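Two grammars for {aⁿbⁿ | n ≥ 1}:

S → a T
T → a T U | b
U → b

P → a R
R → a b b | a R b | b

The second grammar is ambiguous: abb has two parse trees from R (R → a b b, and R → a R b with R → b).

Goal: To find a derivation (proof) of S ≡ P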
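Before attempting a proof, a bounded test can confirm the claims on this slide. The sketch below is a brute-force check, not the proof algorithm: it counts the parse trees of every word over {a, b} up to length 8 for both grammars, assuming no ε-productions and no unit cycles (true here). Agreement up to a bound is only evidence of equivalence, which is why the proof rules on the following slides are needed. Names are hypothetical.

```python
from functools import lru_cache
from itertools import product

# Both grammars from the slide in one dict (their nonterminal names are distinct).
G = {
    "S": (("a", "T"),),
    "T": (("a", "T", "U"), ("b",)),
    "U": (("b",),),
    "P": (("a", "R"),),
    "R": (("a", "b", "b"), ("a", "R", "b"), ("b",)),
}


@lru_cache(maxsize=None)
def trees(sym, w):
    """Number of parse trees deriving w from sym (no ε-rules, no unit cycles)."""
    if sym not in G:
        return 1 if w == sym else 0
    return sum(seq_trees(alt, w) for alt in G[sym])


@lru_cache(maxsize=None)
def seq_trees(seq, w):
    """Number of ways the symbol sequence seq derives w."""
    if not seq:
        return 1 if w == "" else 0
    # The first symbol covers w[:k]; each remaining symbol needs >= 1 character.
    return sum(trees(seq[0], w[:k]) * seq_trees(seq[1:], w[k:])
               for k in range(1, len(w) - len(seq) + 2))


def words(max_len, alphabet="ab"):
    """All nonempty words over alphabet up to max_len, shortest first."""
    for n in range(1, max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


disagreements = [w for w in words(8) if (trees("S", w) > 0) != (trees("P", w) > 0)]
ambiguous = [w for w in words(8) if trees("P", w) > 1]
print(disagreements)  # []
print(ambiguous)      # ['aabb', 'aaabbb', 'aaaabbbb']
```

Every aⁿbⁿ with n ≥ 2 has two parse trees from P, because the inner abb can be produced either directly or via R → a R b.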

Proving Equivalence

The derivation for S ≡ P is built up with the following rules; in the completed derivation every leaf is discharged as True.

Branch Rule

Equivalence to Inclusion

Enumeration Rule

Split Rule (annotated in the figure as T U ⇒ b and R b ⇒ b)

Inductive Reasoning


Empirical Evaluation


Benchmarks

Compared two grammars per benchmark

Average size: 213 non-terminals & 420 productions

• Java 7
• JavaScript
• C11
• Pascal
• VHDL


Injecting Fine-grained Errors

Type 1: Removing a production at random

Type 2: Removing a nonterminal of a production at random

Type 3: Disabling a production in a specific context at random
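The three kinds of mutation can be sketched as functions over a grammar encoded as a dict from nonterminals to lists of alternatives. This is a hypothetical illustration, not the evaluation's code: the example grammar is made up, Type 1 only removes a production from a nonterminal that has more than one (so no nonterminal becomes undefined), and Type 3 is implemented under one possible reading, namely replacing a single occurrence of a nonterminal A by a fresh copy A' that lacks one of A's productions, so that production is disabled only in that context.

```python
import copy
import random

# Hypothetical example grammar: nonterminal -> list of alternatives (lists of symbols).
EXPR = {
    "E": [["T", "+", "E"], ["T"]],
    "T": [["F", "*", "T"], ["F"]],
    "F": [["(", "E", ")"], ["id"]],
}


def remove_production(g, rng):
    """Type 1: delete one randomly chosen production."""
    g = copy.deepcopy(g)
    # Only nonterminals with several alternatives, so none becomes undefined.
    candidates = [(nt, k) for nt, alts in g.items() if len(alts) > 1
                  for k in range(len(alts))]
    nt, k = rng.choice(candidates)
    del g[nt][k]
    return g


def remove_nonterminal(g, rng):
    """Type 2: delete one nonterminal occurrence from a random production."""
    g = copy.deepcopy(g)
    candidates = [(nt, k, j) for nt, alts in g.items()
                  for k, alt in enumerate(alts)
                  for j, sym in enumerate(alt) if sym in g]
    nt, k, j = rng.choice(candidates)
    del g[nt][k][j]
    return g


def disable_in_context(g, rng):
    """Type 3 (one interpretation): replace one occurrence of a nonterminal A
    by a fresh copy A' of A that lacks one randomly chosen production."""
    g = copy.deepcopy(g)
    candidates = [(nt, k, j) for nt, alts in g.items()
                  for k, alt in enumerate(alts)
                  for j, sym in enumerate(alt) if sym in g and len(g[sym]) > 1]
    nt, k, j = rng.choice(candidates)
    target = g[nt][k][j]
    fresh = target + "'"
    while fresh in g:  # keep the new name unique
        fresh += "'"
    alts = copy.deepcopy(g[target])
    del alts[rng.randrange(len(alts))]
    g[fresh] = alts
    g[nt][k][j] = fresh
    return g


rng = random.Random(0)
for mutate in (remove_production, remove_nonterminal, disable_in_context):
    print(mutate.__name__, mutate(EXPR, rng))
```

Each mutator works on a deep copy, so the original grammar stays available as the reference against which the mutant is compared.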


Detecting Fine-grained Errors

Mean time per error found: our tool 28 s, CFG Analyzer 343 s

Mean counter-example length: our tool 35, CFG Analyzer 12.2

[Chart: errors discovered (0 to 100) for Type 1, Type 2, and Type 3 errors, our tool vs. CFG Analyzer]


Tutoring System Evaluations

Evaluated the system on a class of 50 students

The tutoring system has 5 types of exercises & 60 problems

Queries: 1395
Refuted: 1042 (74.6%)
Proved: 289 (20.7%)
Unknown: 64 (4.6%)
Time / query: 107 ms


Equivalence Prover Evaluations

Invocations: 353
Proved: 289 (81.9%)
Time / query: 410 ms
LL(2) queries: 63 (18%)
Ambiguous queries: 101 (28.6%)


Conclusion

Tool support for testing compiler grammars

Discovering counter-examples between large programming language grammars

Automating tutoring and evaluation of grammars

Prove or disprove students’ solutions and provide feedback

More useful in the context of MOOCs

http://grammar.epfl.ch