Talk at Virginia Bioinformatics Institute, December 5, 2013

Extensible domain-specific programming for the sciences. Eric Van Wyk, University of Minnesota. VBI, December 5, 2013. Slides available at http://www.cs.umn.edu/~evw


DESCRIPTION

Extensible domain-specific programming for the sciences

The notion of scientists as programmers raises the question of what sort of programming language would be a good fit. The common answer seems to be both none of them and all of them. Many scientific applications combine general-purpose and domain-specific languages: R for statistical elements, MATLAB for matrix-based computations, Perl-based regular expressions for string matching, C or FORTRAN for high-performance parallel computations, and scripting languages such as Python to glue them all together. This clumsy situation demonstrates the need for different domain-specific language features. Our hypothesis is that programming could be made easier and less error-prone, and could result in higher-quality code, if languages could be easily extended, by the programmer, with the domain-specific features that a programmer or scientist needs for the task at hand. This talk demonstrates the meta-language processing tools that support this composition of programmer-selected language features, with several extensions chosen from the previously mentioned list of features.

TRANSCRIPT


Extensible domain-specific programming for the sciences

Eric Van Wyk
University of Minnesota

VBI, December 5, 2013

Slides available at http://www.cs.umn.edu/~evw

Current trends / topics in PL

Formal verification

- CompCert - http://compcert.inria.fr
- Astree - http://www.astree.ens.fr
- Hoare logic (1960's)

    {P} code {Q}

- Proof assistants: Coq, Abella, Isabelle; use required in some PL publishing venues

Current trends / topics in PL

Parallel programming - multiple cores everywhere

- "no more free lunch"
- need
  - new abstractions, e.g. Cilk, MapReduce, FP
  - new semantics, e.g. deterministic parallel Java

Current trends / topics in PL

Expressive and safe static typing

- extending richer static types, e.g.

    append ( [a], [a] ) -> [a]

- to dependent types

    append ( [a|n], [a|m] ) -> [a|n+m]

- turns array out-of-bounds and null-pointer bugs into static type errors

Extensible languages

Allow programmers to select the features to be used in their programming languages:

- new syntax / notations
- new semantic analyses / error-checking

Why would anyone want to do that?

Programming language features

General purpose features
- assignment statements, loops, if-then-else statements
- functions (perhaps higher-order) and procedures
- I/O facilities
- modules
- data: integers, strings, arrays, records

Domain-specific features
- matrix operations (MATLAB)
- regular expression matching (Perl, Python)
- statistics functions (R)
- computational geometry operations (LN)
- parallel computing (SISAL, X10, NESL, etc.)

Many similarities, needless differences.
Working with multiple (domain-specific) languages is a headache.

Extensible languages

Allow programmers to select the features to be used in their programming languages:

- new syntax / notations
- new semantic analyses / error-checking

Pick a general purpose host language (e.g. ANSI C) and extend it with domain-specific features.

    myProgram.xc  ⇒  myProgram.c

Regular expressions

    #include <stdio.h>
    #include <regex.h>

    int main (int argc, char *argv[])
    {
      char *text = readFileContents("X.data");

      // eukaryotic messenger RNA sequences
      regex foo = /^ATG[ATGC]{3,10}A{5,10}$/;

      if ( text =~ foo )
        printf ("Matches\n");
      else
        printf ("Doesn't match\n");
    }
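For comparison, the same check can be written against an ordinary regex library. This is only a sketch in Python: it assumes the slide's pattern reads ^ATG[ATGC]{3,10}A{5,10}$ (the quantifier braces are reconstructed from the transcript), and the function name and sample sequences are made up for illustration rather than taken from X.data.

```python
import re

# Assumed reconstruction of the slide's pattern: the start codon ATG,
# 3 to 10 further bases, then a tail of 5 to 10 A's.
MRNA_PATTERN = re.compile(r"^ATG[ATGC]{3,10}A{5,10}$")


def matches_mrna(text: str) -> bool:
    """Return True when the string matches the pattern from start to end."""
    return MRNA_PATTERN.match(text) is not None


# Hypothetical sample inputs, not data from the talk.
for seq in ["ATGCCGTAAAAAA", "ATGCC", "GGGATGCCGTAAAAAA"]:
    print(seq, "Matches" if matches_mrna(seq) else "Doesn't match")
```

What the extension adds over a library call like this is that the pattern is part of the language syntax, so a malformed regular expression can be reported when the program is translated rather than when it runs.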

Mining Climate Data - Ocean Eddies

- Spinning pools of water
- Transport heat, salt, and nutrients
- Learning about their behavior is difficult

A time slice for a point in the ocean

    main (int argc, char **argv)
    {
      Matrix float <3> data
        = readMatrix("ssh.data");

      Matrix float <3> scores
        = matrixMap(scoreTS, data, [2]);

      writeMatrix("temporalScores.data",
                  scores);
    }

    Matrix float <1> scoreTS (Matrix float <1> ts)
    {
      int i = 0, beginning, n = dimSize(ts, 0);

      Matrix float <1> scores
        = init(Matrix float <1>, dimSize(ts, 0));

      while (ts[i] < ts[i+1]) i = i+1;

      Matrix float [0] trough;

      while (i < n-1) {
        (trough, beginning, i)
          = getTrough(ts, i);

        scores[beginning : i]
          = computeArea(trough);
      }
      return scores;
    }

    Matrix float <1> computeArea
        (Matrix float <1> areaOfInterest)
    {
      float y1 = areaOfInterest[0];
      float y2 = areaOfInterest[end];
      int x1 = 0;
      int x2 = dimSize(areaOfInterest, 0) - 1;

      float m = (y1-y2) / ((float)(x1-x2));
      float b = y1 - m*x1;

      Matrix float <1> line = (x1 : x2)*m + b;

      float area
        = with ( x1 <= i < x2 )
            fold (+, 0.0, line - areaOfInterest);

      return
        with ( 0 <= i < dimSize(line, 0) )
          genarray ([ dimSize(line, 0) ], area);
    }

    (Matrix float <1>, int, int) getTrough
        (Matrix float <1> ts, int i)
    {
      int beginning = i;
      int n = dimSize(ts, 0);

      while (i+1 < n && ts[i] >= ts[i+1])
        i = i+1;

      while (i+1 < n && ts[i] < ts[i+1])
        i = i+1;

      return (ts[beginning : i], beginning, i);
    }

Matrix extensions

- several features from MATLAB
- with, fold, and genarray from Single Assignment C
- all translated down to expected C code
- straightforward parallel implementations of matrixMap, with, fold, and genarray
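The trough-scoring code from the preceding slides can be followed more easily in a plain-list form. The sketch below is a Python analogue, not the translation the tools produce. It assumes that the range in scores[beginning : i] is inclusive at both ends (MATLAB style), and it adds a bounds check to the first loop of scoreTS that the slide does not show.

```python
def get_trough(ts, i):
    """Walk downhill, then uphill, from index i; return (slice, begin, end)."""
    beginning = i
    n = len(ts)
    while i + 1 < n and ts[i] >= ts[i + 1]:
        i += 1
    while i + 1 < n and ts[i] < ts[i + 1]:
        i += 1
    # Assumption: the slide's ts[beginning : i] includes both end points.
    return ts[beginning:i + 1], beginning, i


def compute_area(area_of_interest):
    """Area between the trough and the straight line joining its end points."""
    y1, y2 = area_of_interest[0], area_of_interest[-1]
    x1, x2 = 0, len(area_of_interest) - 1
    m = (y1 - y2) / float(x1 - x2)
    b = y1 - m * x1
    # Same index range as the with-loop on the slide: x1 <= x < x2.
    return sum((x * m + b) - area_of_interest[x] for x in range(x1, x2))


def score_ts(ts):
    """Give every point of a time series the area of the trough it lies in."""
    n = len(ts)
    scores = [0.0] * n
    i = 0
    # Skip the initial rising segment (bounds check added for safety).
    while i + 1 < n and ts[i] < ts[i + 1]:
        i += 1
    while i < n - 1:
        trough, beginning, i = get_trough(ts, i)
        scores[beginning:i + 1] = [compute_area(trough)] * len(trough)
    return scores


print(score_ts([3, 1, 3]))
```

Each call to get_trough advances i by at least one position while i < n - 1, so every trough has at least two points and the slope computation never divides by zero.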

Dimension analysis

pound-seconds ≠ newton-seconds

    #include <stdio.h>

    int main (int argc, char *argv[])
    {
      int meter x = 34;
      int meter y = 56;
      int meter^2 area = x * y;

      printf ("%d\n", x + y);      // OK
      printf ("%d\n", x + z);      // Error
    }


    #include <stdio.h>

    int main (int argc, char *argv[])
    {
      int x = 34;
      int y = 56;
      int area = x * y;

      printf ("%d\n", x + y);      // OK
    }

Extensions of this form find errors but otherwise are "erased" during translation.
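The rule being checked is simple: addition requires equal dimensions, and multiplication adds exponents. A rough Python sketch of that rule follows. Note the difference from the extension: the extension checks statically and leaves no trace in the generated C, while this sketch checks at run time. The Quantity class and the meters and seconds helpers are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    value: int
    dims: tuple  # sorted (unit, exponent) pairs, e.g. (("meter", 2),)

    def __add__(self, other):
        # Addition is only defined for operands with identical dimensions.
        if self.dims != other.dims:
            raise TypeError(f"cannot add {self.dims} and {other.dims}")
        return Quantity(self.value + other.value, self.dims)

    def __mul__(self, other):
        # Multiplication adds the exponents unit by unit.
        exps = dict(self.dims)
        for unit, exp in other.dims:
            exps[unit] = exps.get(unit, 0) + exp
        dims = tuple(sorted((u, e) for u, e in exps.items() if e != 0))
        return Quantity(self.value * other.value, dims)


def meters(v):
    return Quantity(v, (("meter", 1),))


def seconds(v):
    return Quantity(v, (("second", 1),))


x, y = meters(34), meters(56)
area = x * y                      # dimension meter^2
print(area.dims, (x + y).value)   # adding two lengths is fine
```

Adding x to seconds(1) raises a TypeError here; in the extended C of the slides the corresponding line is rejected before any C code is generated.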

Extension composition

- Programmers can select the extensions that they want
- May want to use multiple extensions in the same program
- Distinguish between
  1. extension user
     - has no knowledge of language design or implementations
  2. extension developer
     - must know about language design and implementation
- Tools build a custom .xc ⇒ .c translator for them
- How can that be done?

Building translators from composable extensible languages

Two primary challenges:

1. composable syntax - enables building a scanner / parser
   - context-aware scanning [GPCE'07]
   - modular determinism analysis [PLDI'09]
   - Copper
2. composable semantics - analysis and translations
   - attribute grammars with forwarding, collections, and higher-order attributes
   - set union of specification components
     - sets of productions, non-terminals, attributes
     - sets of attribute defining equations, on a production
     - sets of equations contributing values to a single attribute
   - modular well-definedness analysis [SLE'12a]
   - modular termination analysis [SLE'12b, Krishnan-PhD]
   - Silver

Generating parsers and scanners from grammars and regular expressions

    nonterminals: Stmt, Expr
    terminals:    Id    [a-zA-Z][a-zA-Z0-9]*
                  Num   [0-9]+
                  Eq    '='
                  Semi  ';'
                  Plus  '+'
                  Mult  '*'

    Stmt ::= Stmt Semi Stmt
    Stmt ::= Id Eq Expr

    Expr ::= Expr Plus Expr
    Expr ::= Expr Mult Expr
    Expr ::= Id
    Expr ::= Num

Parse tree:

    Stmt
      Stmt
        Id(x)
        Eq
        Expr
          Expr
            Id(y)
          Plus
          Expr
            Expr
              Num(3)
            Mult
            Expr
              Id(z)
      Semi
      Stmt
        Id(a)
        Eq
        Expr
          Id(b)

Token stream:

    Id(x) Eq Id(y) Plus Num(3) Mult Id(z) Semi Id(a) Eq Id(b)

Input text:

    "x = y + 3 * z ; a = b"
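The step from the input text to the token stream can be sketched directly from the terminal declarations. The Python below is a hand-written stand-in for a generated scanner, not Copper output. It assumes that Semi is ';' and Mult is '*' (both characters are missing from the transcript), and the scan function is a hypothetical name.

```python
import re

# Terminal declarations, with ';' and '*' as assumed reconstructions.
TERMINALS = [
    ("Id",   r"[a-zA-Z][a-zA-Z0-9]*"),
    ("Num",  r"[0-9]+"),
    ("Eq",   r"="),
    ("Semi", r";"),
    ("Plus", r"\+"),
    ("Mult", r"\*"),
]
SCANNER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TERMINALS))


def scan(text):
    """Turn the input text into a list of token names, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = SCANNER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        name = m.lastgroup
        # Only Id and Num carry their lexeme, as in the token stream shown.
        tokens.append(f"{name}({m.group()})" if name in ("Id", "Num") else name)
        pos = m.end()
    return tokens


print(" ".join(scan("x = y + 3 * z ; a = b")))
```

This scanner knows nothing about the parser's state: every terminal is a candidate at every position. That is the traditional arrangement the later slides on context-aware scanning move away from.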

Attribute Grammars

- add semantics (meaning) to context free grammars
- nodes (non-terminals) have attributes
  - that is, semantic values
- Expr may be attributed with
  - type - the type of the expression
  - errors - list of error messages
  - env - mapping variable names to their types
- Stmt may be attributed with errors and env

[Figure: the parse tree for "x = y + 3 * y ; x = z", decorated with attribute values]

    Stmt
      Stmt
        Id(x)
        Eq
        Expr
          Expr
            Id(y)
          Plus
          Expr
            Expr
              Num(3)
            Mult
            Expr
              Id(y)
      Semi
      Stmt
        Id(x)
        Eq
        Expr
          Id(z)

Attribute values shown on the tree:

- env = [x ↦ int, y ↦ int, z ↦ string] is passed down to the Stmt and Expr nodes
- the expressions in the first statement have type = int and errors = [ ], so that statement has errors = [ ]
- the expression Id(z) in the second statement has type = string
- the second statement and the root Stmt have errors = [ERROR]

Attribute grammar specifications

Equations associated with productions define attribute values.

    abstract production addition
    e::Expr ::= l::Expr '+' r::Expr
    {
      e.errors = l.errors ++ r.errors ++
                 ... check that l and r are integers ... ;
      e.type = int ;
      l.env = e.env ;  r.env = e.env ;
    }
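The equations on the addition production can be read as a recursive evaluation: env flows down the tree, while type and errors flow up. The following Python sketch mimics that reading for a tiny expression language. It is not how Silver evaluates attributes; the node classes, the decorate function and the error messages are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Id:
    name: str


@dataclass
class Num:
    text: str


@dataclass
class Add:
    left: object
    right: object


def decorate(e, env):
    """Return (type, errors) for expression e, given the inherited env."""
    if isinstance(e, Num):
        return "int", []
    if isinstance(e, Id):
        if e.name not in env:
            return "error", [f"undeclared variable {e.name}"]
        return env[e.name], []
    if isinstance(e, Add):
        # l.env = e.env; r.env = e.env
        l_type, l_errors = decorate(e.left, env)
        r_type, r_errors = decorate(e.right, env)
        # e.errors = l.errors ++ r.errors ++ check that l and r are integers
        check = [] if l_type == r_type == "int" else ["operands of + must be int"]
        # e.type = int
        return "int", l_errors + r_errors + check
    raise TypeError(f"unknown node {e!r}")


env = {"x": "int", "y": "int", "z": "string"}
print(decorate(Add(Id("y"), Num("3")), env))
print(decorate(Add(Id("y"), Id("z")), env))
```

A single recursive function like this fixes the set of node kinds and attributes in one place. The point of the attribute grammar formulation is that productions, attributes and equations can be declared in separate modules and combined by set union.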

Modern attribute grammars

- higher-order attributes
- reference attributes
- collection attributes
- forwarding
- module systems
- separate compilation
- etc.

for-loop as an extension

    abstract production for
    s::Stmt ::= i::Name lower::Expr upper::Expr body::Stmt
    {
      s.errors = lower.errors ++ upper.errors ++ body.errors ++
                 ... check that i is an integer ... ;

      forwards to
        -- i = lower ; while ( i <= upper ) { body ; i = i+1 ; }
        seq ( assignment ( varRef ( i ), lower ),
              while ( lte ( varRef ( i ), upper ),
                      block ( seq ( body,
                                    assignment ( varRef ( i ),
                                                 add ( varRef ( i ), intLit ( "1" ) ) ) ) ) ) ) ;
    }
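Forwarding means the extension construct supplies an equivalent tree built from host constructs, and any attribute it does not define itself is taken from that tree. A small Python sketch of the idea, using nested tuples as a hypothetical tree encoding and an interpreter standing in for the "translation" attribute (the block wrapper from the slide is omitted for brevity):

```python
def for_loop(i, lower, upper, body):
    """Extension construct: a 'for' node the host language knows nothing about."""
    return ("for", i, lower, upper, body)


def forward(stmt):
    """Build the host tree: i = lower; while (i <= upper) { body; i = i + 1; }"""
    _, i, lower, upper, body = stmt
    step = ("assign", i, ("add", ("var", i), ("int", 1)))
    return ("seq", ("assign", i, lower),
                   ("while", ("lte", ("var", i), upper), ("seq", body, step)))


def eval_expr(e, env):
    tag = e[0]
    if tag == "int":
        return e[1]
    if tag == "var":
        return env[e[1]]
    if tag == "add":
        return eval_expr(e[1], env) + eval_expr(e[2], env)
    if tag == "lte":
        return eval_expr(e[1], env) <= eval_expr(e[2], env)
    raise ValueError(tag)


def execute(s, env):
    """Host-language semantics; 'for' is handled only through its forward."""
    tag = s[0]
    if tag == "assign":
        env[s[1]] = eval_expr(s[2], env)
    elif tag == "seq":
        execute(s[1], env)
        execute(s[2], env)
    elif tag == "while":
        while eval_expr(s[1], env):
            execute(s[2], env)
    elif tag == "for":
        execute(forward(s), env)
    else:
        raise ValueError(tag)


env = {"total": 0}
body = ("assign", "total", ("add", ("var", "total"), ("var", "i")))
execute(for_loop("i", ("int", 1), ("int", 4), body), env)
print(env)
```

The extension still defines its own errors attribute, as on the slide, so that messages refer to the for-loop the programmer wrote and not to the generated while-loop.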

Building an attribute grammar evaluator from composed specifications

$$AG^{H} \cup^{*} \{AG^{E}_{1}, \ldots, AG^{E}_{n}\}$$

$$\forall i \in [1, n].\ \mathit{modComplete}(AG^{H}, AG^{E}_{i}) \implies \mathit{complete}(AG^{H} \cup \{AG^{E}_{1}, \ldots, AG^{E}_{n}\})$$

- Monolithic analysis - not too hard, but not too useful
- Modular analysis - harder, but required [SLE'12a]

Challenges in scanning

Keywords in embedded languages may be identifiers in host language

    int SELECT ;
    ...
    rs = using c query { SELECT last_name
                         FROM person WHERE ... } ;

Challenges in scanning

Different extensions use same keyword

    connection c "jdbc:derby:/derby/db/testdb"
      with table person [ person_id INTEGER,
                          first_name VARCHAR ] ;
    ...
    b = table ( c1 : T F ,
                c2 : F * ) ;

Challenges in scanning

Operators with different precedence specifications

    x = 3 + y * z ;
    ...
    str = /[a-z][a-z0-9]*\.java/

Challenges in scanning

Terminals that are prefixes of others

    List<List<Integer>> dlist ;
    ...
    x = y >> 4 ;

Need for context

- Traditionally, parser and scanner are disjoint.

    Scanner → Parser → Semantic Analysis

- In context aware scanning, they communicate.

    Scanner ⇄ Parser → Semantic Analysis

Context aware scanning

- Scanner recognizes only tokens valid for current "context"
- keeps embedded sub-languages, in a sense, separate
- Consider:

    chan in, out ;
    for i in a { a[i] = i*i ; }

- Two terminal symbols that match "in":

    terminal IN 'in' ;
    terminal ID /[a-zA-Z_][a-zA-Z_0-9]*/ submits to { keyword } ;
    terminal FOR 'for' lexer class { keyword } ;

- example is part of AbleP [SPIN'11]

Parsing C as an extension to Promela

    c_decl {
      typedef struct Coord {
        int x, y; } Coord;
    }
    c_state "Coord pt" "Global"   /* goes in state vector */
    int z = 3;                    /* standard global decl */
    active proctype example()
    {
      c_code { now.pt.x = now.pt.y = 0; };
      do :: c_expr { now.pt.x == now.pt.y }
              -> c_code { now.pt.y++; }
         :: else -> break
      od;
      c_code { printf("values %d: %d, %d,%d\n",
          Pexample->_pid, now.z, now.pt.x, now.pt.y);
      };
      assert(false)               /* trigger an error trail */
    }

Context aware scanning

- This scanning algorithm subordinates the disambiguation principle of maximal munch to the principle of disambiguation by context.
- It will return a shorter valid match before a longer invalid match.
- In List<List<Integer>>, before ">", ">" is in the valid lookahead but ">>" is not.
- A context aware scanner is essentially an implicitly-moded scanner.
- There is no explicit specification of valid lookahead.
  - It is generated from standard grammars and terminal regexes.
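The rule "a shorter valid match before a longer invalid match" amounts to applying maximal munch only among the terminals the parser currently accepts. The Python sketch below illustrates this. The valid-terminal sets are written by hand here, whereas in Copper they come from the LR parse table; the terminal names and the scan_one function are hypothetical.

```python
import re

# A few terminals with overlapping lexemes: '>' is a prefix of '>>'.
TERMINALS = {
    "GT":     re.compile(r">"),
    "RSHIFT": re.compile(r">>"),
    "LT":     re.compile(r"<"),
    "ID":     re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*"),
}


def scan_one(text, pos, valid):
    """Longest match at pos, considering only terminals in the valid set."""
    best = None
    for name in sorted(valid):  # sorted only to make iteration deterministic
        m = TERMINALS[name].match(text, pos)
        if m and (best is None or m.end() > best[1]):
            best = (name, m.end())
    return best


generic = "List<List<Integer>> dlist"
shift = "x = y >> 4"

# Inside the type arguments the parser accepts '>' but not '>>'.
print(scan_one(generic, generic.index(">"), {"GT", "ID"}))
# In an expression both are valid, so maximal munch picks '>>'.
print(scan_one(shift, shift.index(">"), {"GT", "LT", "RSHIFT"}))
```

The scanner has no explicit modes: the valid set changes with the parser state, which is what the slide means by an implicitly-moded scanner.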

- With a smarter scanner, LALR(1) is not so brittle.
- We can build syntactically composable language extensions.
- Context aware scanning makes composable syntax "more likely".
- But it does not give a guarantee of composability.

Building a parser from composed specifications

$$CFG^{H} \cup^{*} \{CFG^{E}_{1}, \ldots, CFG^{E}_{n}\}$$

$$\forall i \in [1, n].\ \mathit{isComposable}(CFG^{H}, CFG^{E}_{i}) \land \mathit{conflictFree}(CFG^{H} \cup CFG^{E}_{i}) \implies \mathit{conflictFree}(CFG^{H} \cup \{CFG^{E}_{1}, \ldots, CFG^{E}_{n}\})$$

- Monolithic analysis - not too hard, but not too useful
- Modular analysis - harder, but required [PLDI'09]
- Non-commutative composition of restricted LALR(1) grammars

Expressiveness versus safe composition

Compare to

- other parser generators
- libraries

The modular compositionality analysis does not require context aware scanning.
But context aware scanning makes it practical.

Future Work

- ableC - extensible C11 specification
  - builds on lessons learned from extensible specifications of Java [ECOOP'07], Lustre [FASE'07], Modelica, Promela [SPIN'11]
- incorporate existing language extensions
- composition of language extensions at compile time
- language specific analysis
- new applications of AGs

Thanks for your attention.

Questions?

http://melt.cs.umn.edu

evw@cs.umn.edu

Eric Van Wyk and August Schwerdfeger. Context-aware scanning for parsing extensible languages. In Intl. Conf. on Generative Programming and Component Engineering (GPCE), pages 63-72. ACM, 2007.

Eric Van Wyk, Derek Bodin, Jimin Gao, and Lijesh Krishnan. Silver: an extensible attribute grammar system. Science of Computer Programming, 75(1-2):39-54, January 2010.

August Schwerdfeger and Eric Van Wyk. Verifiable composition of deterministic grammars. In Proc. of Conf. on Programming Language Design and Implementation (PLDI), pages 199-210. ACM, June 2009.

Ted Kaminski and Eric Van Wyk. Modular well-definedness analysis for attribute grammars. In Proc. of Intl. Conf. on Software Language Engineering (SLE), volume 7745 of LNCS, pages 352-371. Springer-Verlag, September 2012.

Lijesh Krishnan and Eric Van Wyk. Termination analysis for higher-order attribute grammars. In Proceedings of the 5th International Conference on Software Language Engineering (SLE 2012), volume 7745 of LNCS, pages 44-63. Springer-Verlag, September 2012.

Lijesh Krishnan. Composable Semantics Using Higher-Order Attribute Grammars. PhD thesis, University of Minnesota, Department of Computer Science and Engineering, 2012. http://purl.umn.edu/144010

Yogesh Mali and Eric Van Wyk. Building extensible specifications and implementations of Promela with AbleP. In Proc. of Intl. SPIN Workshop on Model Checking of Software, volume 6823 of LNCS, pages 108-125. Springer-Verlag, July 2011.

Eric Van Wyk, Lijesh Krishnan, August Schwerdfeger, and Derek Bodin. Attribute grammar-based language extensions for Java. In Proc. of European Conf. on Object Oriented Prog. (ECOOP), volume 4609 of LNCS, pages 575-599. Springer-Verlag, 2007.

Jimin Gao, Mats Heimdahl, and Eric Van Wyk. Flexible and extensible notations for modeling languages. In Fundamental Approaches to Software Engineering, FASE 2007, volume 4422 of LNCS, pages 102-116. Springer-Verlag, March 2007.


Page 3: talk at Virginia Bioinformatics Institute, December 5, 2013

3 45

4 45

Current trends topics in PL

Parallel programming - multiple cores everywhere

I ldquono more free lunchrdquo

I needI new abstractions eg Cilk MapReduce FPI new semantics eg deterministic parallel Java

5 45

Current trends topics in PL

Expressive and safe static typing

I extending richer static types eg

append ( [a] [a] ) -gt [a]

I to dependent types

append ( [a|n] [a|m] ) -gt [a|n+m]

I turns array out-of-bounds and null-pointer bugs intostatic type errors

6 45

Extensible languages

Allow programmers select the features to be used in theirprogramming languages

I new syntax notations

I new semantic analyses error-checking

Why would anyone want to do that

7 45

Programming language featuresGeneral purpose featuresI assignment statements loops if-then-else statementsI functions (perhaps higher-order) and proceduresI IO facilitiesI modulesI data integer strings arrays records

Domain-specific featuresI matrix operations (MATLAB)I regular expression matching (Perl Python)I statistics functions (R)I computational geometry operations (LN)I parallel computing (SISAL X10 NESL etc)

Many similarities needless differencesWorking with multiple (domain-specific) languages is aheadache

8 45

Extensible languages

Allow programmers select the features to be used in theirprogramming languages

I new syntax notations

I new semantic analyses error-checking

Pick a general purpose host language (eg ANSI C)extend with domain-specific features

myProgramxc =rArr myProgramc

9 45

Regular expressions

include stdioh

include regexh

int main (int argc char argv [])

char text = readFileContents(Xdata)

eukaryotic messenger RNA sequences

regex foo = ^ATG[ATGC ]3 10A5 10$

if ( text =~ foo )

printf (Matches n)

else

printf (Doesnrsquot match n)

10 45

Mining Climate Data - Ocean Eddies

I Spinning pools of water

I Transport heat salt andnutrients

I Learning about theirbehavior is difficult

11 45

A time slice for a point in the ocean

12 45

main (int argc char argv)

Matrix float lt3gt data

= readMatrix(sshdata)

Matrix float lt3gt scores

= matrixMap(scoreTS data [2])

writeMatrix(temporalScoresdata

scores)

13 45

Matrix float lt1gt scoreTS (Matrix float lt1gtts)

int i = 0 beginning n = dimSize(ts 0)

Matrix float lt1gt scores

= init(Matrix float lt1gt dimSize(ts 0))

while(ts[i] lt ts[i+1]) i = i+1

Matrix float [0] trough

while(i lt n-1)

(trough beginning i)

= getTrough(ts i)

scores[beginning i]

= computeArea(trough)

return scores

14 45

Matrix float lt1gt computeArea

(Matrix float lt1gt areaOfInterest)

float y1 = areaOfInterest [0]

float y2 = areaOfInterest[end]

int x1 = 0

int x2=dimSize(areaOfInterest 0) -1

float m = (y1-y2) ((float)(x1-x2))

float b = y1 - mx1

Matrix float lt1gt Line = (x1x2)m+b

float area

= with( x1 lt= i lt x2)

fold(+ 00 line - areaOfInterest)

return

with( 0 lt= i lt dimSize(Line 0) )

genarray ([ dimSize(Line 0)] area)

15 45

(Matrix float lt1gt int int) getTrough

(Matrix float lt1gt ts int i)

int beginning = i

int n = dimSize(ts 0)

while(i+1 lt n ampamp ts[i] gt= ts[i+1])

i = i+1

while(i+1 lt n ampamp ts[i] lt ts[i+1])

i = i+1

return (ts[beginning i] beginning i)

16 45

Matrix extensions

- several features from MATLAB
- with, fold, and genarray from Single Assignment C
- all translated down to expected C code
- straightforward parallel implementations of matrixMap, with, fold, and genarray

17 / 45
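To make "translated down to expected C code" concrete, the following is a hand-written sketch of the loop that the line construction and the with/fold in computeArea could correspond to. It is an illustration under assumptions, not code generated by the actual tools: the function name `trough_area` and the plain `float` array representation of a one-dimensional matrix are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hand translation of the fold in computeArea: sums the gap
 * between the straight line joining the two endpoints of `ts` and the
 * samples themselves, over indices x1 <= i < x2. */
float trough_area(const float *ts, size_t n)
{
    assert(ts != NULL && n >= 2);

    const float y1 = ts[0];
    const float y2 = ts[n - 1];
    const int x1 = 0;
    const int x2 = (int)n - 1;

    /* Slope and intercept of the line through (x1, y1) and (x2, y2). */
    const float m = (y1 - y2) / (float)(x1 - x2);
    const float b = y1 - m * (float)x1;

    /* with (x1 <= i < x2) fold(+, 0.0, line - areaOfInterest) */
    float area = 0.0f;
    for (int i = x1; i < x2; i++) {
        const float line_i = (float)i * m + b;  /* element i of the line */
        area += line_i - ts[i];
    }
    return area;
}
```

Each iteration is independent apart from the running sum, which is why a fold of this shape lends itself to a straightforward parallel reduction.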

Dimension analysis

pound-seconds ≠ newton-seconds

18 / 45

#include <stdio.h>

int main (int argc, char *argv[]) {
  int meter x = 34;
  int meter y = 56;
  int meter^2 area = x * y;

  printf("%d\n", x + y);   // OK
  printf("%d\n", x + z);   // Error
}

19 / 45

#include <stdio.h>

int main (int argc, char *argv[]) {
  int x = 34;
  int y = 56;
  int area = x * y;

  printf("%d\n", x + y);   // OK
}

Extensions of this form find errors but are otherwise "erased" during translation.

21 / 45
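The check that the dimension extension performs before erasing the units can be pictured as arithmetic on unit exponents. The sketch below is an assumption about how such a rule could be written, not the extension's implementation: the `Dim` struct, the choice of `meter` and `second` as base dimensions, and the function names are all hypothetical. In the extended language this check runs at translation time, not at run time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical representation of a unit as exponents of base dimensions,
 * e.g. meter^2 is {2, 0} and second is {0, 1}. */
typedef struct {
    int meter;
    int second;
} Dim;

/* Addition is well-dimensioned only when both operands have the same
 * units. Returns false for a mismatch, which the extension would report
 * as an error. */
bool dim_add(Dim a, Dim b, Dim *result)
{
    assert(result != NULL);
    if (a.meter != b.meter || a.second != b.second)
        return false;
    *result = a;
    return true;
}

/* Multiplication always succeeds; the exponents add. */
Dim dim_mul(Dim a, Dim b)
{
    Dim r = { a.meter + b.meter, a.second + b.second };
    return r;
}
```

With `x` and `y` both in meters, `dim_mul` yields meter^2 for `area` and `dim_add` accepts `x + y`. An operand with different units makes `dim_add` fail, which corresponds to the line marked Error on the slide.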

Extension composition

- Programmers can select the extensions that they want.
- They may want to use multiple extensions in the same program.
- Distinguish between
  1. extension user
     - has no knowledge of language design or implementation
  2. extension developer
     - must know about language design and implementation
- Tools build a custom .xc ⇒ .c translator for them.
- How can that be done?

22 / 45

Building translators from composable, extensible languages

Two primary challenges:

1. composable syntax - enables building a scanner and parser
   - context-aware scanning [GPCE'07]
   - modular determinism analysis [PLDI'09]
   - Copper
2. composable semantics - analysis and translations
   - attribute grammars with forwarding, collections, and higher-order attributes
   - set union of specification components
     - sets of productions, non-terminals, attributes
     - sets of attribute-defining equations on a production
     - sets of equations contributing values to a single attribute
   - modular well-definedness analysis [SLE'12a]
   - modular termination analysis [SLE'12b, Krishnan-PhD]
   - Silver

23 / 45

Generating parsers and scanners from grammars and regular expressions

nonterminals: Stmt, Expr
terminals: Id    /[a-zA-Z][a-zA-Z0-9]*/
           Num   /[0-9]+/
           Eq    '='
           Semi  ';'
           Plus  '+'
           Mult  '*'

Stmt ::= Stmt Semi Stmt
Stmt ::= Id Eq Expr

Expr ::= Expr Plus Expr
Expr ::= Expr Mult Expr
Expr ::= Id

24 / 45

[Figure: parse tree]

Stmt
  Stmt
    Id(x)
    Eq
    Expr
      Expr
        Id(y)
      Plus
      Expr
        Expr
          Num(3)
        Mult
        Expr
          Id(z)
  Semi
  Stmt
    Id(a)
    Eq
    Expr
      Id(b)

Token stream: Id(x) Eq Id(y) Plus Num(3) Mult Id(z) Semi Id(a) Eq Id(b)

Input: "x = y + 3 * z ; a = b"

25 / 45

Attribute Grammars

- add semantics (meaning) to context-free grammars
- nodes (non-terminals) have attributes, that is, semantic values
- Expr may be attributed with
  - type - the type of the expression
  - errors - list of error messages
  - env - mapping of variable names to their types
- Stmt may be attributed with errors and env

26 / 45

[Figure: parse tree for "x = y + 3 * y ; x = z" decorated with attribute values. Every node carries env = [x ↦ int, y ↦ int, z ↦ string]. In the first statement the expressions have type = int and errors = [ ], so that statement has errors = [ ]. In the second statement the expression Id(z) has type = string, so that statement and the root Stmt have errors = [ERROR].]

27 / 45

Attribute grammar specifications

Equations associated with productions define attribute values.

abstract production addition
e::Expr ::= l::Expr '+' r::Expr
{
  e.errors = l.errors ++ r.errors ++
             ... check that l and r are integers ...
  e.type = int;
  l.env = e.env;
  r.env = e.env;
}

28 / 45
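The equations on the addition production can be read as a recursive evaluation over the tree: env flows down to the children, while type and errors flow up. The following is a minimal sketch of that reading, not how Silver evaluates attributes. All names are hypothetical, errors is simplified to a count, and undeclared names are treated as int to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { TY_INT, TY_STRING } Type;

/* env: maps variable names to their types (an inherited attribute). */
typedef struct { const char *name; Type type; } Binding;
typedef struct { const Binding *bindings; size_t count; } Env;

typedef enum { EXPR_ID, EXPR_NUM, EXPR_ADD } ExprKind;

typedef struct Expr {
    ExprKind kind;
    const char *name;            /* EXPR_ID only */
    const struct Expr *l, *r;    /* EXPR_ADD only */
} Expr;

/* Synthesized attributes of an Expr; errors is simplified to a count. */
typedef struct { Type type; int errors; } ExprAttrs;

static Type lookup(Env env, const char *name)
{
    for (size_t i = 0; i < env.count; i++)
        if (strcmp(env.bindings[i].name, name) == 0)
            return env.bindings[i].type;
    return TY_INT;  /* simplification: undeclared names are treated as int */
}

/* Evaluates type and errors for `e`, passing env down to the children. */
ExprAttrs eval_expr(const Expr *e, Env env)
{
    assert(e != NULL);
    ExprAttrs a = { TY_INT, 0 };
    switch (e->kind) {
    case EXPR_ID:
        a.type = lookup(env, e->name);
        break;
    case EXPR_NUM:
        break;
    case EXPR_ADD: {
        ExprAttrs la = eval_expr(e->l, env);   /* l.env = e.env */
        ExprAttrs ra = eval_expr(e->r, env);   /* r.env = e.env */
        /* e.errors = l.errors ++ r.errors ++ integer check */
        a.errors = la.errors + ra.errors
                 + (la.type != TY_INT) + (ra.type != TY_INT);
        a.type = TY_INT;                       /* e.type = int */
        break;
    }
    }
    return a;
}
```

With env = [x ↦ int, y ↦ int, z ↦ string], evaluating `x + 3` reports no errors, while adding `z` to it reports one, because the integer check on the right operand fails.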

Modern attribute grammars

- higher-order attributes
- reference attributes
- collection attributes
- forwarding
- module systems
- separate compilation
- etc.

29 / 45

for-loop as an extension

abstract production for
s::Stmt ::= i::Name lower::Expr upper::Expr body::Stmt
{
  s.errors = lower.errors ++ upper.errors ++ body.errors ++
             ... check that i is an integer ...

  forwards to
    -- i = lower; while (i <= upper) { body; i = i + 1; }
    seq(assignment(varRef(i), lower),
        while(lte(varRef(i), upper),
              block(seq(body,
                        assignment(varRef(i),
                                   add(varRef(i), intLit("1")))))));
}

30 / 45
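Forwarding means the for-loop does not have to define every attribute itself: anything it leaves undefined is taken from the host-language tree it forwards to. The sketch below imitates this with a tiny interpreter in which execution stands in for translation. This is an illustration under assumptions, not Silver's mechanism, and every type and function name in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { E_INT, E_VAR, E_ADD, E_LTE } ExprKind;
typedef struct Expr {
    ExprKind kind;
    int value;                   /* E_INT: literal; E_VAR: variable slot */
    const struct Expr *l, *r;    /* E_ADD, E_LTE */
} Expr;

typedef enum { S_ASSIGN, S_SEQ, S_WHILE, S_FOR } StmtKind;
typedef struct Stmt {
    StmtKind kind;
    int var;                     /* S_ASSIGN, S_FOR: variable slot */
    const Expr *e1, *e2;         /* S_ASSIGN, S_WHILE: e1; S_FOR: lower, upper */
    const struct Stmt *s1, *s2;  /* S_SEQ: s1, s2; S_WHILE, S_FOR: s1 is body */
} Stmt;

static int eval(const Expr *e, const int *vars)
{
    switch (e->kind) {
    case E_INT: return e->value;
    case E_VAR: return vars[e->value];
    case E_ADD: return eval(e->l, vars) + eval(e->r, vars);
    case E_LTE: return eval(e->l, vars) <= eval(e->r, vars);
    }
    return 0;
}

/* Execution is implemented only for host-language statements. The for-loop
 * case does no looping itself: it builds its forwards-to tree and runs
 * exec on that. */
void exec(const Stmt *s, int *vars)
{
    assert(s != NULL && vars != NULL);
    switch (s->kind) {
    case S_ASSIGN:
        vars[s->var] = eval(s->e1, vars);
        break;
    case S_SEQ:
        exec(s->s1, vars);
        exec(s->s2, vars);
        break;
    case S_WHILE:
        while (eval(s->e1, vars))
            exec(s->s1, vars);
        break;
    case S_FOR: {
        /* forwards to: i = lower; while (i <= upper) { body; i = i + 1; } */
        Expr i    = { E_VAR, s->var, NULL, NULL };
        Expr one  = { E_INT, 1, NULL, NULL };
        Expr next = { E_ADD, 0, &i, &one };
        Expr cond = { E_LTE, 0, &i, s->e2 };
        Stmt init = { S_ASSIGN, s->var, s->e1, NULL, NULL, NULL };
        Stmt step = { S_ASSIGN, s->var, &next, NULL, NULL, NULL };
        Stmt body = { S_SEQ, 0, NULL, NULL, s->s1, &step };
        Stmt loop = { S_WHILE, 0, &cond, NULL, &body, NULL };
        Stmt fwd  = { S_SEQ, 0, NULL, NULL, &init, &loop };
        exec(&fwd, vars);
        break;
    }
    }
}
```

The design choice mirrors the slide: the extension supplies its own error checking, which can speak about the for-loop the programmer wrote, and reuses the host language for everything else.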

Building an attribute grammar evaluator from composed specifications

\[ AG^{H} \cup^{*} \{AG^{E_1}, \ldots, AG^{E_n}\} \]

\[ \bigl(\forall i \in [1, n].\ \mathit{modComplete}(AG^{H}, AG^{E_i})\bigr) \Longrightarrow \mathit{complete}(AG^{H} \cup \{AG^{E_1}, \ldots, AG^{E_n}\}) \]

- Monolithic analysis - not too hard, but not too useful
- Modular analysis - harder, but required [SLE'12a]

31 / 45

Challenges in scanning

Keywords in embedded languages may be identifiers in the host language:

int SELECT;
...
rs = using c query { SELECT last_name
                     FROM person WHERE ... };

32 / 45

Challenges in scanning

Different extensions use the same keyword:

connection c "jdbc:derby:/derby/db/testdb"
  with table person [ person_id INTEGER,
                      first_name VARCHAR ];
...
b = table ( c1 : T F,
            c2 : F * );

33 / 45

Challenges in scanning

Operators with different precedence specifications:

x = 3 + y * z;
...
str = /[a-z][a-z0-9]*\.java/

34 / 45

Challenges in scanning

Terminals that are prefixes of others:

List<List<Integer>> dlist;
...
x = y >> 4;

35 / 45

Need for context

- Traditionally, parser and scanner are disjoint:

  Scanner → Parser → Semantic Analysis

- In context-aware scanning, they communicate:

  Scanner ⇄ Parser → Semantic Analysis

36 / 45

Context-aware scanning

- Scanner recognizes only tokens valid for the current "context"
- keeps embedded sub-languages, in a sense, separate
- Consider:
  - chan in, out;
    for i in a { a[i] = i*i; }
  - Two terminal symbols that match "in":
    - terminal IN 'in';
    - terminal ID /[a-zA-Z_][a-zA-Z_0-9]*/ submits to { keyword };
    - terminal FOR 'for' lexer class { keyword };
  - example is part of AbleP [SPIN'11]

37 / 45

Parsing C as an extension to Promela

c_decl {
  typedef struct Coord {
    int x, y; } Coord;
}
c_state "Coord pt" "Global"   /* goes in state vector */
int z = 3;                    /* standard global decl */

active proctype example()
{
  c_code { now.pt.x = now.pt.y = 0; };
  do :: c_expr { now.pt.x == now.pt.y }
          -> c_code { now.pt.y++; }
     :: else -> break
  od;
  c_code { printf("values %d: %d, %d,%d\n",
             Pexample->_pid, now.z, now.pt.x, now.pt.y);
  };
  assert(false)               /* trigger an error trail */
}

38 / 45

Context-aware scanning

- This scanning algorithm subordinates the disambiguation principle of maximal munch to the principle of disambiguation by context.
- It will return a shorter valid match before a longer invalid match.
- In List<List<Integer>>, before the first closing ">", ">" is in the valid lookahead but ">>" is not.
- A context-aware scanner is essentially an implicitly-moded scanner.
- There is no explicit specification of valid lookahead.
  - It is generated from standard grammars and terminal regexes.

39 / 45
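The rule "a shorter valid match before a longer invalid match" can be sketched as a scanner that receives the parser's valid-lookahead set with every call. The code below is a simplified illustration, not Copper's algorithm: the four terminals, the flag-array encoding of the valid set, and the function names are all hypothetical, and matching is hand-coded instead of being generated from regular expressions.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum { T_GT, T_RSHIFT, T_IN, T_ID, T_COUNT, T_NONE = -1 } Terminal;

/* Length of the match of terminal `t` at the start of `s`, or 0 if none. */
static size_t match_len(Terminal t, const char *s)
{
    size_t n = 0;
    switch (t) {
    case T_GT:     return s[0] == '>' ? 1 : 0;
    case T_RSHIFT: return strncmp(s, ">>", 2) == 0 ? 2 : 0;
    case T_IN:     /* keyword 'in': measured like an identifier first */
    case T_ID:
        if (isalpha((unsigned char)s[0]) || s[0] == '_')
            for (n = 1; isalnum((unsigned char)s[n]) || s[n] == '_'; n++)
                ;
        if (t == T_IN)
            return (n == 2 && strncmp(s, "in", 2) == 0) ? 2 : 0;
        return n;
    default:
        return 0;
    }
}

/* Scans one token. Only terminals flagged in `valid` (the parser's valid
 * lookahead in its current state) are considered. Among those the longest
 * match wins, and on a tie ID submits to the keyword IN. */
Terminal scan(const char *input, const int valid[T_COUNT], size_t *len)
{
    assert(input != NULL && valid != NULL && len != NULL);
    Terminal best = T_NONE;
    *len = 0;
    for (int t = 0; t < T_COUNT; t++) {
        if (!valid[t])
            continue;
        size_t n = match_len((Terminal)t, input);
        if (n == 0)
            continue;
        int longer = n > *len;
        int id_submits = (n == *len && best == T_ID && t == T_IN);
        if (longer || id_submits) {
            best = (Terminal)t;
            *len = n;
        }
    }
    return best;
}
```

When only ">" is valid, as when closing nested type arguments, the input ">>" yields a one-character T_GT. When both are valid, as in an expression, maximal munch applies and the result is T_RSHIFT. Likewise "in" scans as T_ID where only identifiers are valid, and as T_IN where the keyword is also valid.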

- With a smarter scanner, LALR(1) is not so brittle.
- We can build syntactically composable language extensions.
- Context-aware scanning makes composable syntax "more likely".
- But it does not give a guarantee of composability.

40 / 45

Building a parser from composed specifications

\[ CFG^{H} \cup^{*} \{CFG^{E_1}, \ldots, CFG^{E_n}\} \]

\[ \bigl(\forall i \in [1, n].\ \mathit{isComposable}(CFG^{H}, CFG^{E_i}) \wedge \mathit{conflictFree}(CFG^{H} \cup CFG^{E_i})\bigr) \Longrightarrow \mathit{conflictFree}(CFG^{H} \cup \{CFG^{E_1}, \ldots, CFG^{E_n}\}) \]

- Monolithic analysis - not too hard, but not too useful
- Modular analysis - harder, but required [PLDI'09]
- Non-commutative composition of restricted LALR(1) grammars

41 / 45

42 / 45

Expressiveness versus safe composition

Compare to

- other parser generators
- libraries

The modular compositionality analysis does not require context-aware scanning, but context-aware scanning makes it practical.

43 / 45

Future Work

- ableC - extensible C11 specification
  - builds on lessons learned from extensible specifications of Java [ECOOP'07], Lustre [FASE'07], Modelica, and Promela [SPIN'11]
- incorporate existing language extensions
- composition of language extensions at compile time
- language-specific analysis
- new applications of AGs

44 / 45

Thanks for your attention.

Questions?

http://melt.cs.umn.edu

evw@cs.umn.edu

45 / 45

Eric Van Wyk and August Schwerdfeger. Context-aware scanning for parsing extensible languages. In Intl. Conf. on Generative Programming and Component Engineering (GPCE), pages 63–72. ACM, 2007.

Eric Van Wyk, Derek Bodin, Jimin Gao, and Lijesh Krishnan. Silver: an extensible attribute grammar system. Science of Computer Programming, 75(1–2):39–54, January 2010.

August Schwerdfeger and Eric Van Wyk. Verifiable composition of deterministic grammars. In Proc. of Conf. on Programming Language Design and Implementation (PLDI), pages 199–210. ACM, June 2009.

Ted Kaminski and Eric Van Wyk. Modular well-definedness analysis for attribute grammars. In Proc. of Intl. Conf. on Software Language Engineering (SLE), volume 7745 of LNCS, pages 352–371. Springer-Verlag, September 2012.

Lijesh Krishnan and Eric Van Wyk. Termination analysis for higher-order attribute grammars. In Proceedings of the 5th International Conference on Software Language Engineering (SLE 2012), volume 7745 of LNCS, pages 44–63. Springer-Verlag, September 2012.

Lijesh Krishnan. Composable Semantics Using Higher-Order Attribute Grammars. PhD thesis, University of Minnesota, Department of Computer Science and Engineering, 2012. http://purl.umn.edu/144010

Yogesh Mali and Eric Van Wyk. Building extensible specifications and implementations of Promela with AbleP. In Proc. of Intl. SPIN Workshop on Model Checking of Software, volume 6823 of LNCS, pages 108–125. Springer-Verlag, July 2011.

Eric Van Wyk, Lijesh Krishnan, August Schwerdfeger, and Derek Bodin. Attribute grammar-based language extensions for Java. In Proc. of European Conf. on Object Oriented Prog. (ECOOP), volume 4609 of LNCS, pages 575–599. Springer-Verlag, 2007.

Jimin Gao, Mats Heimdahl, and Eric Van Wyk. Flexible and extensible notations for modeling languages. In Fundamental Approaches to Software Engineering, FASE 2007, volume 4422 of LNCS, pages 102–116. Springer-Verlag, March 2007.

Page 5: talk at Virginia Bioinformatics Institute, December 5, 2013

Current trends topics in PL

Parallel programming - multiple cores everywhere

I ldquono more free lunchrdquo

I needI new abstractions eg Cilk MapReduce FPI new semantics eg deterministic parallel Java

5 45

Current trends topics in PL

Expressive and safe static typing

I extending richer static types eg

append ( [a] [a] ) -gt [a]

I to dependent types

append ( [a|n] [a|m] ) -gt [a|n+m]

I turns array out-of-bounds and null-pointer bugs intostatic type errors

6 45

Extensible languages

Allow programmers select the features to be used in theirprogramming languages

I new syntax notations

I new semantic analyses error-checking

Why would anyone want to do that

7 45

Programming language featuresGeneral purpose featuresI assignment statements loops if-then-else statementsI functions (perhaps higher-order) and proceduresI IO facilitiesI modulesI data integer strings arrays records

Domain-specific featuresI matrix operations (MATLAB)I regular expression matching (Perl Python)I statistics functions (R)I computational geometry operations (LN)I parallel computing (SISAL X10 NESL etc)

Many similarities needless differencesWorking with multiple (domain-specific) languages is aheadache

8 45

Extensible languages

Allow programmers select the features to be used in theirprogramming languages

I new syntax notations

I new semantic analyses error-checking

Pick a general purpose host language (eg ANSI C)extend with domain-specific features

myProgramxc =rArr myProgramc

9 45

Regular expressions

include stdioh

include regexh

int main (int argc char argv [])

char text = readFileContents(Xdata)

eukaryotic messenger RNA sequences

regex foo = ^ATG[ATGC ]3 10A5 10$

if ( text =~ foo )

printf (Matches n)

else

printf (Doesnrsquot match n)

10 45

Mining Climate Data - Ocean Eddies

I Spinning pools of water

I Transport heat salt andnutrients

I Learning about theirbehavior is difficult

11 45

A time slice for a point in the ocean

12 45

main (int argc char argv)

Matrix float lt3gt data

= readMatrix(sshdata)

Matrix float lt3gt scores

= matrixMap(scoreTS data [2])

writeMatrix(temporalScoresdata

scores)

13 45

Matrix float lt1gt scoreTS (Matrix float lt1gtts)

int i = 0 beginning n = dimSize(ts 0)

Matrix float lt1gt scores

= init(Matrix float lt1gt dimSize(ts 0))

while(ts[i] lt ts[i+1]) i = i+1

Matrix float [0] trough

while(i lt n-1)

(trough beginning i)

= getTrough(ts i)

scores[beginning i]

= computeArea(trough)

return scores

14 45

Matrix float lt1gt computeArea

(Matrix float lt1gt areaOfInterest)

float y1 = areaOfInterest [0]

float y2 = areaOfInterest[end]

int x1 = 0

int x2=dimSize(areaOfInterest 0) -1

float m = (y1-y2) ((float)(x1-x2))

float b = y1 - mx1

Matrix float lt1gt Line = (x1x2)m+b

float area

= with( x1 lt= i lt x2)

fold(+ 00 line - areaOfInterest)

return

with( 0 lt= i lt dimSize(Line 0) )

genarray ([ dimSize(Line 0)] area)

15 45

(Matrix float lt1gt int int) getTrough

(Matrix float lt1gt ts int i)

int beginning = i

int n = dimSize(ts 0)

while(i+1 lt n ampamp ts[i] gt= ts[i+1])

i = i+1

while(i+1 lt n ampamp ts[i] lt ts[i+1])

i = i+1

return (ts[beginning i] beginning i)

16 45

Matrix extensionsI several features from MATLAB

I with fold and genarray from Single Assignment C

I all translated down to expected C code

I straightforward parallel implementations of matrixMapwith fold and genarray

17 45

Dimension analysis

pound-seconds 6= newton-seconds18 45

include stdioh

int main (int argc char argv [])

int meter x = 34

int meter y = 56

int meter^2 area = x y

printf (dn x + y) OK

printf (dn x + z) Error

19 45

include stdioh

int main (int argc char argv [])

int meter x = 34

int meter y = 56

int meter^2 area = x y

printf (dn x + y) OK

printf (dn x + z) Error

20 45

include stdioh

int main (int argc char argv [])

int x = 34

int y = 56

int area = x y

printf (dn x + y) OK

Extensions of this form find errors but otherwise are ldquoerasedrdquoduring translation

21 45

Extension composition

I Programmers can select the extensions that they want

I May want to use multiple extensions in the same program

I Distinguish between1 extension user

I has no knowledge of language design or implementations

2 extension developerI must know about language design and implementation

I Tools build a custom xc =rArr c translator for them

I How can that be done

22 45

Building translators from composable extensible

languages

Two primary challenges1 composable syntax mdash enables building a scanner parser

I context-aware scanning [GPCErsquo07]I modular determinism analysis [PLDIrsquo09]I Copper

2 composable semantics mdash analysis and translationsI attribute grammars with forwarding collections and

higher-order attributesI set union of specification components

I sets of productions non-terminals attributesI sets of attribute defining equations on a productionI sets of equations contributing values to a single attribute

I modular well-definedness analysis [SLErsquo12a]I modular termination analysis [SLErsquo12b Krishnan-PhD]I Silver

23 45

Generating parsers and scanners from grammars

and regular expressions

nonterminals Stmt Exprterminals Id [a-zA-Z][a-zA-Z0-9]

Num [0-9]+Eq rsquo=rsquoSemi rsquorsquoPlus rsquo+rsquoMult rsquorsquo

Stmt = Stmt Semi StmtStmt = Id Eq Expr

Expr = Expr Plus ExprExpr = Expr Mult ExprExpr = Id

24 45

Stmt

Stmt

Id(x) Eq Expr

Expr

Id(y)

Plus Expr

Expr

Num(3)

Mult Expr

Id(z)

Semi Stmt

Id(a) Eq Expr

Id(b)

Id(x) Eq Id(y) Plus Num(3) Mult Id(z) Semi Id(a) Eq Id(b)

ldquox = y + 3 z a = brdquo25 45

Attribute Grammars

I add semantics mdash meaning mdash to context free grammars

I nodes (non-terminals) have attributesI that is semantic values

I Expr may be attributed withI type - the type of the expressionI errors - list of error messagesI env - mapping variable names to their types

I Stmt may be attributed with errors and env

26 45

Stmt

Stmt

Id(x) Eq Expr

Expr

Id(y)

Plus Expr

Expr

Num(3)

Mult Expr

Id(y)

Semi Stmt

Id(x) Eq Expr

Id(z)

env = [x7rarrint y 7rarrint z 7rarrstring]

env = [x7rarrint y 7rarrint z 7rarrstring]

env = [x7rarrint y 7rarrint z 7rarrstring]

env = [x7rarrint y 7rarrint z 7rarrstring]

env = [x7rarrint y 7rarrint z 7rarrstring]type = int errors = [ ]

type = int errors = [ ]

type = int errors = [ ]

errors = [ ]env = [x7rarrint y 7rarrint z 7rarrstring]

env = [x7rarrint y 7rarrint z 7rarrstring]t=string

errors=[ERROR]

errors=[ERROR]

27 45

Attribute grammar specifications

Equations associated with productions define attribute values

a b s t r a c t p r o d u c t i o n a d d i t i o ne Expr = l Expr rsquo+ rsquo r Expr

e e r r o r s = l e r r o r ++ r e r r o r s ++ check t h a t l and r a r e i n t e g e r s

e t y p e = i n t

l env = e env r env = e env

28 45

Modern attribute grammars

I higher-order attributes

I reference attributes

I collection attributes

I forwarding

I module systems

I separate compilation

I etc

29 45

for-loop as an extensiona b s t r a c t p r o d u c t i o n f o rs Stmt = i Name l o w e r Expr upper Expr

body Stmt

s e r r o r s = l o w e r e r r o r ++ upper e r r o r s ++body e r r o r s ++ check t h a t i i s an i n t e g e r

f o r w a r d s to i=l o w e r w h i l e ( i lt= upper ) body i=i +1seq ( a s s i g n m e n t ( v a r R e f ( i ) l o w e r )

w h i l e (l t e ( v a r R e f ( i ) upper ) b l o c k ( seq ( body

a s s i g n m e n t ( v a r R e f ( i ) add ( v a r R e f ( i ) i n t L i t ( rdquo1rdquo ) ) ) ) ) ) )

30 45

Building an attribute grammar evaluator from composedspecifications

AGH cuplowast AGE1 AGEn

foralli isin [1 n]modComplete(AGH AGEi )

rArr rArr complete(AGH cup AGE1 AGE

n )

I Monolithic analysis - not too hard but not too useful

I Modular analysis - harder but required [SLErsquo12a]

31 45

Challenges in scanning

Keywords in embedded languages may be identifiers in hostlanguage

int SELECT

rs = using c query SELECT last name

FROM person WHERE

32 45

Challenges in scanning

Different extensions use same keyword

connection c jdbcderbyderbydbtestdb

with table person [ person id INTEGER

first name VARCHAR ]

b = table ( c1 T F

c2 F )

33 45

Challenges in scanning

Operators with different precedence specifications

x = 3 + y z

str = [a-z][a-z0-9]java

34 45

Challenges in scanning

Terminals that are prefixes of others

ListltListltIntegergtgt dlist

x = y gtgt 4

35 45

Need for context

I Traditionally parser and scanner are disjoint

Scanner rarr Parser rarr Semantic Analysis

I In context aware scanning they communicate

Scanner Parser rarr Semantic Analysis

36 45

Context aware scanning

- Scanner recognizes only tokens valid for the current "context"
- keeps embedded sub-languages, in a sense, separate
- Consider:
  chan in, out ;
  for i in a { a[i] = i*i ; }
- Two terminal symbols that match "in":
  - terminal IN 'in' ;
  - terminal ID /[a-zA-Z_][a-zA-Z_0-9]*/ submits to {keyword} ;
  - terminal FOR 'for' lexer class {keyword} ;
- example is part of AbleP [SPIN'11]

37 / 45

Parsing C as an extension to Promela

c_decl {
  typedef struct Coord {
    int x, y; } Coord;
}
c_state "Coord pt" "Global"   /* goes in state vector */
int z = 3;                    /* standard global decl */

active proctype example()
{ c_code { now.pt.x = now.pt.y = 0; };
  do :: c_expr { now.pt.x == now.pt.y }
          -> c_code { now.pt.y++; }
     :: else -> break
  od;
  c_code { printf("values %d: %d, %d,%d\n",
             Pexample->_pid, now.z, now.pt.x, now.pt.y); };
  assert(false)               /* trigger an error trail */
}

38 / 45

Context aware scanning

- This scanning algorithm subordinates the disambiguation principle of maximal munch to the principle of disambiguation by context.
- It will return a shorter valid match before a longer invalid match.
- In List<List<Integer>>, before ">", ">" is in the valid lookahead but ">>" is not.
- A context aware scanner is essentially an implicitly-moded scanner.
- There is no explicit specification of valid lookahead.
- It is generated from standard grammars and terminal regexes.

39 / 45
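The rule that context outranks maximal munch can be sketched in a few lines of Python. This is only an illustration, not Copper's algorithm: the terminal table, the function `scan`, and the way the valid lookahead set is passed in by hand are all assumptions; in the approach described on the slide that set is derived from the parser's current LR state.

```python
import re

# Hypothetical terminal set; the names and regexes are illustrative only.
TERMINALS = {
    "GT":  re.compile(r">"),
    "SHR": re.compile(r">>"),
    "LT":  re.compile(r"<"),
    "ID":  re.compile(r"[A-Za-z_][A-Za-z_0-9]*"),
    "NUM": re.compile(r"[0-9]+"),
}


def scan(text, pos, valid):
    """Return (terminal, lexeme) for the longest match at pos among the
    terminals in `valid` (the parser's valid lookahead), or None."""
    best = None
    for name in sorted(valid):            # sorted only for determinism
        m = TERMINALS[name].match(text, pos)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best
```

Maximal munch still applies, but only among terminals the parser can accept, so the shorter ">" wins whenever ">>" is not in the valid set.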

- With a smarter scanner, LALR(1) is not so brittle.
- We can build syntactically composable language extensions.
- Context aware scanning makes composable syntax "more likely".
- But it does not give a guarantee of composability.

40 / 45

Building a parser from composed specifications

\[ CFG^{H} \cup^{*} \{ CFG^{E}_{1}, \ldots, CFG^{E}_{n} \} \]

\[ \bigl( \forall i \in [1, n].\ \mathit{isComposable}(CFG^{H}, CFG^{E}_{i}) \land \mathit{conflictFree}(CFG^{H} \cup CFG^{E}_{i}) \bigr) \Longrightarrow \mathit{conflictFree}(CFG^{H} \cup \{ CFG^{E}_{1}, \ldots, CFG^{E}_{n} \}) \]

- Monolithic analysis - not too hard, but not too useful
- Modular analysis - harder, but required [PLDI'09]
- Non-commutative composition of restricted LALR(1) grammars

41 / 45

42 / 45

Expressiveness versus safe composition

Compare to
- other parser generators
- libraries

The modular compositionality analysis does not require context aware scanning.
But context aware scanning makes it practical.

43 / 45

Future Work

- ableC - extensible C11 specification
  - builds on lessons learned from extensible specifications of Java [ECOOP'07], Lustre [FASE'07], Modelica, Promela [SPIN'11]
- incorporate existing language extensions
- composition of language extensions is done at compile-time
- language specific analysis
- new applications of AGs

44 / 45

Thanks for your attention.

Questions?

http://melt.cs.umn.edu

evw@cs.umn.edu

45 / 45

Eric Van Wyk and August Schwerdfeger. Context-aware scanning for parsing extensible languages. In Intl. Conf. on Generative Programming and Component Engineering (GPCE), pages 63–72. ACM, 2007.

Eric Van Wyk, Derek Bodin, Jimin Gao, and Lijesh Krishnan. Silver: an extensible attribute grammar system. Science of Computer Programming, 75(1–2):39–54, January 2010.

August Schwerdfeger and Eric Van Wyk. Verifiable composition of deterministic grammars. In Proc. of Conf. on Programming Language Design and Implementation (PLDI), pages 199–210. ACM, June 2009.

Ted Kaminski and Eric Van Wyk. Modular well-definedness analysis for attribute grammars. In Proc. of Intl. Conf. on Software Language Engineering (SLE), volume 7745 of LNCS, pages 352–371. Springer-Verlag, September 2012.

Lijesh Krishnan and Eric Van Wyk. Termination analysis for higher-order attribute grammars. In Proceedings of the 5th International Conference on Software Language Engineering (SLE 2012), volume 7745 of LNCS, pages 44–63. Springer-Verlag, September 2012.

Lijesh Krishnan. Composable Semantics Using Higher-Order Attribute Grammars. PhD thesis, University of Minnesota, Department of Computer Science and Engineering, 2012. http://purl.umn.edu/144010

Yogesh Mali and Eric Van Wyk. Building extensible specifications and implementations of Promela with AbleP. In Proc. of Intl. SPIN Workshop on Model Checking of Software, volume 6823 of LNCS, pages 108–125. Springer-Verlag, July 2011.

Eric Van Wyk, Lijesh Krishnan, August Schwerdfeger, and Derek Bodin. Attribute grammar-based language extensions for Java. In Proc. of European Conf. on Object Oriented Prog. (ECOOP), volume 4609 of LNCS, pages 575–599. Springer-Verlag, 2007.

Jimin Gao, Mats Heimdahl, and Eric Van Wyk. Flexible and extensible notations for modeling languages. In Fundamental Approaches to Software Engineering, FASE 2007, volume 4422 of LNCS, pages 102–116. Springer-Verlag, March 2007.

Current trends / topics in PL

Expressive and safe static typing

- extending richer static types, e.g.
  append ( [a], [a] ) -> [a]
- to dependent types
  append ( [a|n], [a|m] ) -> [a|n+m]
- turns array out-of-bounds and null-pointer bugs into static type errors

6 / 45

Extensible languages

Allow programmers to select the features to be used in their programming languages:

- new syntax / notations
- new semantic analyses / error-checking

Why would anyone want to do that?

7 / 45

Programming language features

General purpose features
- assignment statements, loops, if-then-else statements
- functions (perhaps higher-order) and procedures
- I/O facilities
- modules
- data: integer, strings, arrays, records

Domain-specific features
- matrix operations (MATLAB)
- regular expression matching (Perl, Python)
- statistics functions (R)
- computational geometry operations (LN)
- parallel computing (SISAL, X10, NESL, etc.)

Many similarities, needless differences.
Working with multiple (domain-specific) languages is a headache.

8 / 45

Extensible languages

Allow programmers to select the features to be used in their programming languages:

- new syntax / notations
- new semantic analyses / error-checking

Pick a general purpose host language (e.g. ANSI C) and extend it with domain-specific features.

myProgram.xc ⟹ myProgram.c

9 / 45

Regular expressions

#include <stdio.h>
#include <regex.h>

int main (int argc, char *argv[]) {
  char *text = readFileContents("X.data");

  // eukaryotic messenger RNA sequences
  regex foo = /^ATG[ATGC]{3,10}A{5,10}$/ ;

  if ( text =~ foo )
    printf ("Matches \n");
  else
    printf ("Doesn't match \n");
}

10 / 45
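For comparison, the same check can be written against an ordinary regular-expression library. The sketch below uses Python's `re` module; the pattern is the one on the slide with its `{m,n}` quantifiers restored, which is an assumption because the transcript lost the braces, and `looks_like_mrna` is a hypothetical helper name.

```python
import re

# Pattern as reconstructed from the slide (the {m,n} quantifiers are assumed):
# a start codon ATG, then 3 to 10 further bases, then a tail of 5 to 10 A's.
MRNA = re.compile(r"^ATG[ATGC]{3,10}A{5,10}$")


def looks_like_mrna(text: str) -> bool:
    """True when the whole sequence matches the mRNA-like pattern."""
    return MRNA.match(text) is not None
```

In the extended C of the slide the pattern is part of the language syntax, so a malformed pattern could be reported at compile time rather than when the library compiles it at run time.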

Mining Climate Data - Ocean Eddies

- Spinning pools of water
- Transport heat, salt, and nutrients
- Learning about their behavior is difficult

11 / 45

A time slice for a point in the ocean

12 / 45

main (int argc, char **argv) {
  Matrix float <3> data
    = readMatrix("ssh.data");
  Matrix float <3> scores
    = matrixMap(scoreTS, data, [2]);
  writeMatrix("temporalScores.data",
              scores);
}

13 / 45

Matrix float <1> scoreTS (Matrix float <1> ts) {
  int i = 0, beginning, n = dimSize(ts, 0);
  Matrix float <1> scores
    = init(Matrix float <1>, dimSize(ts, 0));
  while (ts[i] < ts[i+1]) i = i+1;
  Matrix float [0] trough;
  while (i < n-1) {
    (trough, beginning, i)
      = getTrough(ts, i);
    scores[beginning : i]
      = computeArea(trough);
  }
  return scores;
}

14 / 45

Matrix float <1> computeArea
    (Matrix float <1> areaOfInterest) {
  float y1 = areaOfInterest[0];
  float y2 = areaOfInterest[end];
  int x1 = 0;
  int x2 = dimSize(areaOfInterest, 0) - 1;
  float m = (y1-y2) / ((float)(x1-x2));
  float b = y1 - m*x1;
  Matrix float <1> Line = (x1 : x2) * m + b;
  float area
    = with ( x1 <= i < x2 )
        fold(+, 0.0, Line - areaOfInterest);
  return
    with ( 0 <= i < dimSize(Line, 0) )
      genarray([dimSize(Line, 0)], area);
}

15 / 45

(Matrix float <1>, int, int) getTrough
    (Matrix float <1> ts, int i) {
  int beginning = i;
  int n = dimSize(ts, 0);
  while (i+1 < n && ts[i] >= ts[i+1])
    i = i+1;
  while (i+1 < n && ts[i] < ts[i+1])
    i = i+1;
  return (ts[beginning : i], beginning, i);
}

16 / 45
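The trough scoring on the last three slides can be approximated in plain Python lists to make the arithmetic concrete. This is a sketch under assumptions: the range `beginning : i` is taken to be inclusive as in MATLAB, the names `get_trough` and `compute_area` are Python-style renderings of the slide's functions, and the guard for a one-element trough is an addition that the slides do not show.

```python
def get_trough(ts, i):
    """Walk downhill, then uphill, from index i.
    Returns (trough values, beginning index, end index), end inclusive."""
    beginning = i
    n = len(ts)
    while i + 1 < n and ts[i] >= ts[i + 1]:
        i += 1
    while i + 1 < n and ts[i] < ts[i + 1]:
        i += 1
    return ts[beginning:i + 1], beginning, i


def compute_area(area_of_interest):
    """Area between the straight line joining the trough's end points and the
    trough itself, replicated once per point (as genarray does on the slide)."""
    x1, x2 = 0, len(area_of_interest) - 1
    if x1 == x2:                      # added guard: a single point has no area
        return [0.0]
    y1, y2 = area_of_interest[0], area_of_interest[-1]
    m = (y1 - y2) / (x1 - x2)         # slope of the line
    b = y1 - m * x1                   # intercept
    line = [m * x + b for x in range(x1, x2 + 1)]
    area = sum(l - a for l, a in zip(line, area_of_interest))
    return [area] * len(line)
```

Summing over every index gives the same result as the slide's fold over x1 <= i < x2, because the line passes through the last point and contributes zero there.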

Matrix extensions

- several features from MATLAB
- with, fold, and genarray from Single Assignment C
- all translated down to expected C code
- straightforward parallel implementations of matrixMap, with, fold, and genarray

17 / 45

Dimension analysis

pound-seconds ≠ newton-seconds

18 / 45

#include <stdio.h>

int main (int argc, char *argv[]) {
  int meter x = 34;
  int meter y = 56;
  int meter^2 area = x * y;
  printf ("%d\n", x + y);     // OK
  printf ("%d\n", x + z);     // Error
}

19 / 45

#include <stdio.h>

int main (int argc, char *argv[]) {
  int x = 34;
  int y = 56;
  int area = x * y;
  printf ("%d\n", x + y);     // OK
}

Extensions of this form find errors but otherwise are "erased" during translation.

21 / 45
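The unit rule behind these slides (addition needs equal units, multiplication adds exponents) can be sketched as follows. Note the difference from the talk: this hypothetical `Quantity` class checks at run time, whereas the extension described on the slides checks statically and then erases the units. The class and field names are assumptions made for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    value: int
    meter_exp: int  # exponent of the unit "meter"; 0 means dimensionless

    def __add__(self, other):
        # Addition is only meaningful between quantities of the same unit.
        if self.meter_exp != other.meter_exp:
            raise TypeError(
                f"unit mismatch: meter^{self.meter_exp} + meter^{other.meter_exp}")
        return Quantity(self.value + other.value, self.meter_exp)

    def __mul__(self, other):
        # Multiplication adds unit exponents: meter * meter = meter^2.
        return Quantity(self.value * other.value,
                        self.meter_exp + other.meter_exp)
```

With `x = Quantity(34, 1)` and `y = Quantity(56, 1)`, `x * y` carries meter^2 and `x + y` is accepted, while adding `x` to the area raises `TypeError`, mirroring the OK and Error lines of the example.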

Extension composition

- Programmers can select the extensions that they want.
- May want to use multiple extensions in the same program.
- Distinguish between
  1. extension user
     - has no knowledge of language design or implementations
  2. extension developer
     - must know about language design and implementation
- Tools build a custom .xc ⟹ .c translator for them.
- How can that be done?

22 / 45

Building translators from composable extensible languages

Two primary challenges:
1. composable syntax - enables building a scanner and parser
   - context-aware scanning [GPCE'07]
   - modular determinism analysis [PLDI'09]
   - Copper
2. composable semantics - analysis and translations
   - attribute grammars with forwarding, collections, and higher-order attributes
   - set union of specification components
     - sets of productions, non-terminals, attributes
     - sets of attribute-defining equations, on a production
     - sets of equations contributing values to a single attribute
   - modular well-definedness analysis [SLE'12a]
   - modular termination analysis [SLE'12b, Krishnan-PhD]
   - Silver

23 / 45

Generating parsers and scanners from grammars and regular expressions

nonterminals: Stmt, Expr
terminals:    Id   /[a-zA-Z][a-zA-Z0-9]*/
              Num  /[0-9]+/
              Eq   '='
              Semi ';'
              Plus '+'
              Mult '*'

Stmt ::= Stmt Semi Stmt
Stmt ::= Id Eq Expr

Expr ::= Expr Plus Expr
Expr ::= Expr Mult Expr
Expr ::= Id

24 / 45

[Figure: parse tree, with the token sequence and source text beneath it]

Stmt
  Stmt
    Id(x)
    Eq
    Expr
      Expr
        Id(y)
      Plus
      Expr
        Expr
          Num(3)
        Mult
        Expr
          Id(z)
  Semi
  Stmt
    Id(a)
    Eq
    Expr
      Id(b)

Id(x) Eq Id(y) Plus Num(3) Mult Id(z) Semi Id(a) Eq Id(b)

"x = y + 3 * z ; a = b"

25 / 45
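The step from source text to the token sequence shown above can be sketched with Python's `re` module. The terminal names and the Id and Num regexes come from the grammar slide; the `;` and `*` lexemes are assumed, since the transcript lost that punctuation, and `tokenize` is a hypothetical helper rather than a generated scanner.

```python
import re

TERMINALS = [                      # (name, regex), as on the grammar slide
    ("Id",   r"[a-zA-Z][a-zA-Z0-9]*"),
    ("Num",  r"[0-9]+"),
    ("Eq",   r"="),
    ("Semi", r";"),                # assumed lexeme
    ("Plus", r"\+"),
    ("Mult", r"\*"),               # assumed lexeme
]
SCANNER = re.compile("|".join(f"(?P<{n}>{r})" for n, r in TERMINALS))


def tokenize(text):
    """Turn source text into the token list, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = SCANNER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        name = m.lastgroup
        # Only Id and Num carry their lexeme, as in the slide's token sequence.
        tokens.append(f"{name}({m.group()})" if name in ("Id", "Num") else name)
        pos = m.end()
    return tokens
```

Because these terminals never overlap, first-match alternation is enough here; overlapping terminals are where a longest-match or context-aware rule becomes necessary.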

Attribute Grammars

- add semantics (meaning) to context free grammars
- nodes (non-terminals) have attributes
  - that is, semantic values
- Expr may be attributed with
  - type - the type of the expression
  - errors - list of error messages
  - env - mapping variable names to their types
- Stmt may be attributed with errors and env

26 / 45

[Figure: parse tree for "x = y + 3 * y ; x = z" decorated with attribute values: env = [x ↦ int, y ↦ int, z ↦ string] on every node; type = int and errors = [ ] within the first statement; type = string for Id(z), and errors = [ERROR] for the second statement and the root.]

27 / 45

Attribute grammar specifications

Equations associated with productions define attribute values.

abstract production addition
e::Expr ::= l::Expr '+' r::Expr
{
  e.errors = l.errors ++ r.errors ++
             ... check that l and r are integers ...
  e.type = int ;
  l.env = e.env ;  r.env = e.env ;
}

28 / 45
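The equations of the addition production can be mimicked by a recursive function in which the inherited attribute (env) is a parameter and the synthesized attributes (type, errors) are the return value. This is a minimal sketch, not how Silver evaluates attributes: the tuple tree encoding, the `decorate` name, and the error message texts are assumptions.

```python
def decorate(node, env):
    """Return (type, errors) for an expression tree; env flows downward."""
    kind = node[0]
    if kind == "id":
        name = node[1]
        if name not in env:
            return "error", [f"undeclared variable {name}"]
        return env[name], []
    if kind == "num":
        return "int", []
    if kind == "add":
        l_type, l_errors = decorate(node[1], env)   # l.env = e.env
        r_type, r_errors = decorate(node[2], env)   # r.env = e.env
        errors = l_errors + r_errors                # e.errors = l.errors ++ r.errors ++ ...
        if l_type != "int" or r_type != "int":
            errors = errors + ["operands of + must be integers"]
        return "int", errors                        # e.type = int
    raise ValueError(f"unknown node kind {kind}")
```

With `env = {"x": "int", "y": "int", "z": "string"}`, the tree for `y + 3` decorates without errors, while `x + z` reports one error because `z` is a string.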

Page 8: talk at Virginia Bioinformatics Institute, December 5, 2013

Programming language featuresGeneral purpose featuresI assignment statements loops if-then-else statementsI functions (perhaps higher-order) and proceduresI IO facilitiesI modulesI data integer strings arrays records

Domain-specific featuresI matrix operations (MATLAB)I regular expression matching (Perl Python)I statistics functions (R)I computational geometry operations (LN)I parallel computing (SISAL X10 NESL etc)

Many similarities needless differencesWorking with multiple (domain-specific) languages is aheadache

8 45

Extensible languages

Allow programmers select the features to be used in theirprogramming languages

I new syntax notations

I new semantic analyses error-checking

Pick a general purpose host language (eg ANSI C)extend with domain-specific features

myProgramxc =rArr myProgramc

9 45

Regular expressions

include stdioh

include regexh

int main (int argc char argv [])

char text = readFileContents(Xdata)

eukaryotic messenger RNA sequences

regex foo = ^ATG[ATGC ]3 10A5 10$

if ( text =~ foo )

printf (Matches n)

else

printf (Doesnrsquot match n)

10 45

Mining Climate Data - Ocean Eddies

I Spinning pools of water

I Transport heat salt andnutrients

I Learning about theirbehavior is difficult

11 45

A time slice for a point in the ocean

12 45

main (int argc char argv)

Matrix float lt3gt data

= readMatrix(sshdata)

Matrix float lt3gt scores

= matrixMap(scoreTS data [2])

writeMatrix(temporalScoresdata

scores)

13 45

Matrix float lt1gt scoreTS (Matrix float lt1gtts)

int i = 0 beginning n = dimSize(ts 0)

Matrix float lt1gt scores

= init(Matrix float lt1gt dimSize(ts 0))

while(ts[i] lt ts[i+1]) i = i+1

Matrix float [0] trough

while(i lt n-1)

(trough beginning i)

= getTrough(ts i)

scores[beginning i]

= computeArea(trough)

return scores

14 45

    Matrix float <1> computeArea
        (Matrix float <1> areaOfInterest) {
      float y1 = areaOfInterest[0];
      float y2 = areaOfInterest[end];
      int x1 = 0;
      int x2 = dimSize(areaOfInterest, 0) - 1;

      float m = (y1-y2) / ((float)(x1-x2));
      float b = y1 - m*x1;

      Matrix float <1> Line = (x1 : x2)*m + b;

      float area
        = with ( x1 <= i < x2 )
            fold(+, 0.0, Line - areaOfInterest);

      return
        with ( 0 <= i < dimSize(Line, 0) )
          genarray([dimSize(Line, 0)], area);
    }

    (Matrix float <1>, int, int) getTrough
        (Matrix float <1> ts, int i) {
      int beginning = i;
      int n = dimSize(ts, 0);

      while (i+1 < n && ts[i] >= ts[i+1])
        i = i+1;

      while (i+1 < n && ts[i] < ts[i+1])
        i = i+1;

      return (ts[beginning : i], beginning, i);
    }
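The trough-scoring steps on these slides can be followed in a plain Python sketch. It uses ordinary lists in place of the Matrix type and makes three assumptions that the slides do not confirm: matrix ranges such as ts[beginning : i] include both ends, the skipped leading rise needs a bounds check, and the fold sums the gap between the straight line and the series.

```python
def get_trough(ts, i):
    """Walk down, then up, from index i; return (trough, beginning, end)."""
    beginning = i
    n = len(ts)
    while i + 1 < n and ts[i] >= ts[i + 1]:   # descending side
        i += 1
    while i + 1 < n and ts[i] < ts[i + 1]:    # ascending side
        i += 1
    return ts[beginning:i + 1], beginning, i  # inclusive range (assumed)


def compute_area(area_of_interest):
    """Area between the straight line joining the end points and the series."""
    y1, y2 = area_of_interest[0], area_of_interest[-1]
    x1, x2 = 0, len(area_of_interest) - 1
    m = (y1 - y2) / float(x1 - x2)
    b = y1 - m * x1
    line = [x * m + b for x in range(x1, x2 + 1)]
    area = sum(line[i] - area_of_interest[i] for i in range(x1, x2))
    return [area] * len(line)                 # same score for every point


def score_ts(ts):
    """Score each point of a time series by the area of its trough."""
    n = len(ts)
    scores = [0.0] * n
    i = 0
    while i + 1 < n and ts[i] < ts[i + 1]:    # skip a leading rise (bounds check added)
        i += 1
    while i < n - 1:
        trough, beginning, i = get_trough(ts, i)
        scores[beginning:i + 1] = compute_area(trough)
    return scores


if __name__ == "__main__":
    print(score_ts([3, 2, 1, 2, 3]))
```

Every trough spans at least two points, so the slope computation never divides by zero.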

Matrix extensions

- several features from MATLAB
- with, fold, and genarray from Single Assignment C
- all translated down to expected C code
- straightforward parallel implementations of matrixMap, with, fold, and genarray
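Why matrixMap parallelises easily can be seen in a small Python sketch: each one-dimensional series is scored independently of the others, so the calls can be handed to a pool of workers. The function name matrix_map, the nested-list layout (data[row][column] is one time series), and the use of a thread pool are assumptions for illustration; the talk does not show the parallel C code that the extension produces.

```python
from concurrent.futures import ThreadPoolExecutor


def matrix_map(f, data):
    """Apply f to every innermost 1-D series of a 3-D nested list.

    The calls do not depend on each other, so they are submitted to a
    worker pool; pool.map returns the results in the original order.
    """
    with ThreadPoolExecutor() as pool:
        return [list(pool.map(f, plane)) for plane in data]


def double_series(series):
    """Stand-in for a scoring function such as scoreTS."""
    return [2 * value for value in series]


if __name__ == "__main__":
    data = [[[1, 2], [3, 4]],
            [[5, 6], [7, 8]]]
    print(matrix_map(double_series, data))
```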

Dimension analysis

pound-seconds ≠ newton-seconds

    #include <stdio.h>

    int main (int argc, char *argv[]) {
      int meter x = 34;
      int meter y = 56;
      int meter^2 area = x * y;

      printf ("%d\n", x + y);   // OK
      printf ("%d\n", x + z);   // Error
    }


    #include <stdio.h>

    int main (int argc, char *argv[]) {
      int x = 34;
      int y = 56;
      int area = x * y;

      printf ("%d\n", x + y);   // OK
    }

Extensions of this form find errors but otherwise are "erased" during translation.
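The rules behind such a dimension check can be sketched in Python: addition requires identical units and multiplication adds unit exponents. The sketch checks at run time, whereas the extension described here reports the error during translation and then erases the annotations. The names Quantity, meters, add, and mul are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    value: int
    units: tuple  # sorted (unit name, exponent) pairs, e.g. (("meter", 2),)


def meters(value):
    """Build a quantity measured in meters."""
    return Quantity(value, (("meter", 1),))


def add(a, b):
    """Addition is only defined for operands with identical units."""
    if a.units != b.units:
        raise TypeError(f"unit mismatch: {a.units} + {b.units}")
    return Quantity(a.value + b.value, a.units)


def mul(a, b):
    """Multiplication adds the exponents of matching units."""
    exponents = dict(a.units)
    for name, exponent in b.units:
        exponents[name] = exponents.get(name, 0) + exponent
    units = tuple(sorted((n, e) for n, e in exponents.items() if e != 0))
    return Quantity(a.value * b.value, units)


if __name__ == "__main__":
    x, y = meters(34), meters(56)
    area = mul(x, y)
    print(add(x, y))          # fine: meter + meter
    try:
        add(x, area)          # rejected: meter + meter^2
    except TypeError as err:
        print("Error:", err)
```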

Extension composition

- Programmers can select the extensions that they want.
- May want to use multiple extensions in the same program.
- Distinguish between
  1. extension user
     - has no knowledge of language design or implementations
  2. extension developer
     - must know about language design and implementation
- Tools build a custom .xc ⟹ .c translator for them.
- How can that be done?

Building translators from composable extensible languages

Two primary challenges:

1. composable syntax - enables building a scanner, parser
   - context-aware scanning [GPCE'07]
   - modular determinism analysis [PLDI'09]
   - Copper

2. composable semantics - analysis and translations
   - attribute grammars with forwarding, collections and higher-order attributes
   - set union of specification components
     - sets of productions, non-terminals, attributes
     - sets of attribute defining equations, on a production
     - sets of equations contributing values to a single attribute
   - modular well-definedness analysis [SLE'12a]
   - modular termination analysis [SLE'12b, Krishnan-PhD]
   - Silver

Generating parsers and scanners from grammars and regular expressions

    nonterminals: Stmt, Expr
    terminals:    Id    /[a-zA-Z][a-zA-Z0-9]*/
                  Num   /[0-9]+/
                  Eq    '='
                  Semi  ';'
                  Plus  '+'
                  Mult  '*'

    Stmt ::= Stmt Semi Stmt
    Stmt ::= Id Eq Expr

    Expr ::= Expr Plus Expr
    Expr ::= Expr Mult Expr
    Expr ::= Id

[Figure: parse tree for the input below]

    Stmt
      Stmt
        Id(x) Eq Expr
          Expr
            Id(y)
          Plus
          Expr
            Expr
              Num(3)
            Mult
            Expr
              Id(z)
      Semi
      Stmt
        Id(a) Eq Expr
          Id(b)

Token stream: Id(x) Eq Id(y) Plus Num(3) Mult Id(z) Semi Id(a) Eq Id(b)

Input: "x = y + 3 * z ; a = b"
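How the terminal definitions turn the input text into that token stream can be sketched with a small hand-written scanner in Python. It applies the usual longest-match rule over the terminals of the grammar slide; a generated scanner would use a single automaton instead, and the function name scan and the whitespace handling are assumptions.

```python
import re

# Terminals from the grammar slide, as (name, compiled regex) pairs.
TERMINALS = [
    ("Id", re.compile(r"[a-zA-Z][a-zA-Z0-9]*")),
    ("Num", re.compile(r"[0-9]+")),
    ("Eq", re.compile(r"=")),
    ("Semi", re.compile(r";")),
    ("Plus", re.compile(r"\+")),
    ("Mult", re.compile(r"\*")),
]


def scan(text):
    """Split text into tokens, always taking the longest match."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():              # whitespace separates tokens
            pos += 1
            continue
        best = None
        for name, regex in TERMINALS:
            m = regex.match(text, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise SyntaxError(f"no terminal matches at position {pos}")
        name, m = best
        # Only Id and Num carry their lexeme, as in the figure.
        tokens.append(f"{name}({m.group()})" if name in ("Id", "Num") else name)
        pos = m.end()
    return tokens


if __name__ == "__main__":
    print(" ".join(scan("x = y + 3 * z ; a = b")))
```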

Attribute Grammars

- add semantics (meaning) to context free grammars
- nodes (non-terminals) have attributes
  - that is, semantic values
- Expr may be attributed with
  - type - the type of the expression
  - errors - list of error messages
  - env - mapping variable names to their types
- Stmt may be attributed with errors and env

[Figure: parse tree for "x = y + 3 * y ; x = z" decorated with attribute values]

    Stmt
      Stmt
        Id(x) Eq Expr
          Expr
            Id(y)
          Plus
          Expr
            Expr
              Num(3)
            Mult
            Expr
              Id(y)
      Semi
      Stmt
        Id(x) Eq Expr
          Id(z)

Attribute values shown on the tree:
- env = [x ↦ int, y ↦ int, z ↦ string] on the decorated nodes
- type = int, errors = [ ] on the expressions of the first statement
- errors = [ ] on the first statement
- t = string on the expression Id(z)
- errors = [ERROR] on the second statement and on the root

Attribute grammar specifications

Equations associated with productions define attribute values.

    abstract production addition
    e::Expr ::= l::Expr '+' r::Expr
    {
      e.errors = l.errors ++ r.errors ++ ... check that l and r are integers ...
      e.type = int ;
      l.env = e.env ;  r.env = e.env ;
    }
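The flow of attribute values can be traced with a small Python evaluator: env is passed down the tree (inherited), while type and errors are computed from the children (synthesized). This is a hand-written sketch, not what Silver generates; the class names, the error messages, and the assignment check are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Id:
    name: str


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str
    left: object
    right: object


@dataclass
class Assign:
    target: str
    expr: object


@dataclass
class Seq:
    first: object
    second: object


def expr_attrs(e, env):
    """Return the synthesized (type, errors) of e; env is inherited."""
    if isinstance(e, Num):
        return "int", []
    if isinstance(e, Id):
        if e.name not in env:
            return "unknown", [f"undeclared variable {e.name}"]
        return env[e.name], []
    # Binary operator: children receive the same env as the parent.
    left_type, left_errors = expr_attrs(e.left, env)
    right_type, right_errors = expr_attrs(e.right, env)
    errors = left_errors + right_errors
    if left_type != "int" or right_type != "int":
        errors.append(f"operands of {e.op} must be integers")
    return "int", errors


def stmt_errors(s, env):
    """Return the synthesized errors attribute of a statement."""
    if isinstance(s, Seq):
        return stmt_errors(s.first, env) + stmt_errors(s.second, env)
    expr_type, errors = expr_attrs(s.expr, env)
    if env.get(s.target) != expr_type:
        errors = errors + [f"cannot assign {expr_type} to {s.target}"]
    return errors


if __name__ == "__main__":
    env = {"x": "int", "y": "int", "z": "string"}
    program = Seq(
        Assign("x", BinOp("+", Id("y"), BinOp("*", Num(3), Id("y")))),
        Assign("x", Id("z")),
    )
    print(stmt_errors(program, env))
```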

Modern attribute grammars

- higher-order attributes
- reference attributes
- collection attributes
- forwarding
- module systems
- separate compilation
- etc.

for-loop as an extension

    abstract production for
    s::Stmt ::= i::Name lower::Expr upper::Expr body::Stmt
    {
      s.errors = lower.errors ++ upper.errors ++ body.errors ++
                 ... check that i is an integer ...

      forwards to
        // i = lower ; while (i <= upper) { body ; i = i+1 ; }
        seq( assignment( varRef(i), lower ),
             while( lte( varRef(i), upper ),
                    block( seq( body,
                                assignment( varRef(i),
                                            add( varRef(i), intLit("1") ) ) ) ) ) ) ;
    }
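The idea of forwarding can be shown in a short Python sketch: the translation to C is written only for host-language constructs, and the extension's for node supplies a host-language tree to which any request it does not handle itself is passed. This mimics the mechanism by hand and is not Silver's implementation; the class names and the to_c function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VarRef:
    name: str


@dataclass
class IntLit:
    text: str


@dataclass
class BinOp:            # stands in for the add and lte productions
    op: str
    left: object
    right: object


@dataclass
class Assignment:
    target: object
    value: object


@dataclass
class While:
    cond: object
    body: object


@dataclass
class Block:
    body: object


@dataclass
class Seq:
    first: object
    second: object


@dataclass
class For:              # extension construct: unknown to to_c
    var: str
    lower: object
    upper: object
    body: object

    def forwards_to(self):
        """Build the equivalent host-language tree."""
        i = VarRef(self.var)
        step = Assignment(i, BinOp("+", i, IntLit("1")))
        loop = While(BinOp("<=", i, self.upper), Block(Seq(self.body, step)))
        return Seq(Assignment(i, self.lower), loop)


def to_c(node):
    """Translate host constructs; anything else is asked for its forward."""
    if isinstance(node, VarRef):
        return node.name
    if isinstance(node, IntLit):
        return node.text
    if isinstance(node, BinOp):
        return f"({to_c(node.left)} {node.op} {to_c(node.right)})"
    if isinstance(node, Assignment):
        return f"{to_c(node.target)} = {to_c(node.value)};"
    if isinstance(node, While):
        return f"while {to_c(node.cond)} {to_c(node.body)}"
    if isinstance(node, Block):
        return "{ " + to_c(node.body) + " }"
    if isinstance(node, Seq):
        return f"{to_c(node.first)} {to_c(node.second)}"
    if hasattr(node, "forwards_to"):
        return to_c(node.forwards_to())
    raise TypeError(f"no translation for {type(node).__name__}")


if __name__ == "__main__":
    body = Assignment(VarRef("s"), BinOp("+", VarRef("s"), VarRef("i")))
    print(to_c(For("i", IntLit("1"), IntLit("10"), body)))
```

The extension still defines errors itself, so messages can refer to the for loop the programmer wrote rather than to the generated while loop.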

Building an attribute grammar evaluator from composed specifications

$AG^{H} \cup^{*} \{AG^{E}_{1}, \ldots, AG^{E}_{n}\}$

$\forall i \in [1, n].\ \mathit{modComplete}(AG^{H}, AG^{E}_{i})$

$\Longrightarrow\ \mathit{complete}(AG^{H} \cup \{AG^{E}_{1}, \ldots, AG^{E}_{n}\})$

- Monolithic analysis - not too hard, but not too useful
- Modular analysis - harder, but required [SLE'12a]
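A much simplified Python sketch of the monolithic side of this picture: specifications are composed by set union, and the composed grammar is complete when every production has an equation for every attribute on its non-terminal, with forwarding productions obtaining missing attributes from the tree they forward to. The modular analysis of [SLE'12a] is considerably more involved and is not reproduced here; the dictionary layout and all names are assumptions.

```python
def compose(host, *extensions):
    """Compose specifications by set union of each component."""
    result = {key: set(values) for key, values in host.items()}
    for ext in extensions:
        for key, values in ext.items():
            result.setdefault(key, set()).update(values)
    return result


def missing_equations(spec):
    """List (production, attribute) pairs without a defining equation."""
    missing = []
    for prod, nonterminal in sorted(spec["productions"]):
        if prod in spec["forwards"]:
            continue  # undefined attributes come from the forwarded-to tree
        for attr, occurs_on in sorted(spec["attributes"]):
            if occurs_on == nonterminal and (prod, attr) not in spec["equations"]:
                missing.append((prod, attr))
    return missing


HOST = {
    "productions": {("assign", "Stmt"), ("while", "Stmt"), ("seq", "Stmt")},
    "attributes": {("errors", "Stmt")},
    "equations": {("assign", "errors"), ("while", "errors"), ("seq", "errors")},
    "forwards": set(),
}

# Extension 1 adds a production; extension 2 adds an attribute.
EXT_FOR = {
    "productions": {("for", "Stmt")},
    "attributes": set(),
    "equations": {("for", "errors")},
    "forwards": {"for"},
}
EXT_PP = {
    "productions": set(),
    "attributes": {("pp", "Stmt")},
    "equations": {("assign", "pp"), ("while", "pp"), ("seq", "pp")},
    "forwards": set(),
}

if __name__ == "__main__":
    print(missing_equations(compose(HOST, EXT_FOR, EXT_PP)))
```

Neither extension knows about the other, yet the for production gets a pp value through its forward; without forwarding the composed grammar would lack an equation for pp on for.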

Challenges in scanning

Keywords in embedded languages may be identifiers in host language

    int SELECT ;
    ...
    rs = using c query { SELECT last_name
                         FROM person WHERE ... } ;

Challenges in scanning

Different extensions use same keyword

    connection c "jdbc:derby:/derby/db/testdb"
      with table person [ person_id INTEGER,
                          first_name VARCHAR ];
    ...
    b = table ( c1 : T F ,
                c2 : F * );

Challenges in scanning

Operators with different precedence specifications

    x = 3 + y * z ;
    ...
    str = /[a-z][a-z0-9]*\.java/

Challenges in scanning

Terminals that are prefixes of others

    List<List<Integer>> dlist ;
    ...
    x = y >> 4 ;

Need for context

- Traditionally, parser and scanner are disjoint.

    Scanner → Parser → Semantic Analysis

- In context aware scanning, they communicate.

    Scanner ⇄ Parser → Semantic Analysis

Context aware scanning

- Scanner recognizes only tokens valid for current "context"
- keeps embedded sub-languages, in a sense, separate
- Consider:

    chan in, out;
    for i in a { a[i] = i*i ; }

- Two terminal symbols that match "in":
  - terminal IN 'in' ;
  - terminal ID /[a-zA-Z_][a-zA-Z_0-9]*/ submits to { keyword } ;
  - terminal FOR 'for' lexer class { keyword } ;
- example is part of AbleP [SPIN'11]
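The effect of the valid-terminal set can be sketched in Python. The parser is imagined to pass the scanner the set of terminals it can accept in its current state, and only those are tried; when a keyword and ID match the same text, ID gives way. The function scan_one and the way the valid set is supplied are assumptions, since a real parser derives that set from its parse table.

```python
import re

TERMINALS = {
    "IN": re.compile(r"in"),
    "FOR": re.compile(r"for"),
    "ID": re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*"),
}
KEYWORDS = {"IN", "FOR"}   # ID submits to these on an equal-length match


def scan_one(text, pos, valid):
    """Return (terminal, lexeme) for the longest match among valid terminals."""
    matches = []
    for name in valid:
        m = TERMINALS[name].match(text, pos)
        if m:
            matches.append((len(m.group()), name, m.group()))
    if not matches:
        raise SyntaxError(f"no valid terminal matches at position {pos}")
    longest = max(length for length, _, _ in matches)
    candidates = {name: lexeme for length, name, lexeme in matches
                  if length == longest}
    keyword_hits = sorted(name for name in candidates if name in KEYWORDS)
    chosen = keyword_hits[0] if keyword_hits else sorted(candidates)[0]
    return chosen, candidates[chosen]


if __name__ == "__main__":
    # After "chan" only an identifier is valid, so "in" is an ID.
    print(scan_one("in, out;", 0, {"ID"}))
    # After "for i" only the keyword is valid, so "in" is IN.
    print(scan_one("in a", 0, {"IN"}))
```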

Parsing C as an extension to Promela

    c_decl {
      typedef struct Coord {
        int x, y; } Coord;
    }
    c_state "Coord pt" "Global"   /* goes in state vector */
    int z = 3;                    /* standard global decl */

    active proctype example()
    { c_code { now.pt.x = now.pt.y = 0; };
      do :: c_expr { now.pt.x == now.pt.y }
              -> c_code { now.pt.y++; }
         :: else -> break
      od;
      c_code { printf("values %d: %d, %d,%d\n",
                 Pexample->_pid, now.z, now.pt.x, now.pt.y);
      };
      assert(false)               /* trigger an error trail */
    }

Context aware scanning

- This scanning algorithm subordinates the disambiguation principle of maximal munch to the principle of disambiguation by context.
- It will return a shorter valid match before a longer invalid match.
- In List<List<Integer>>, before ">", ">" is in the valid lookahead but ">>" is not.
- A context aware scanner is essentially an implicitly-moded scanner.
- There is no explicit specification of valid lookahead.
  - It is generated from standard grammars and terminal regexes.
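The difference from plain maximal munch can be seen in a few lines of Python. With every terminal allowed, ">>" wins as the longest match; when the parser's context only allows ">", the shorter match is returned instead. The terminal names and the function next_token are assumptions, and the valid set is passed in by hand rather than computed from a parse table.

```python
import re

TERMINALS = {
    "LT": re.compile(r"<"),
    "GT": re.compile(r">"),
    "RSHIFT": re.compile(r">>"),
    "ID": re.compile(r"[A-Za-z_][A-Za-z_0-9]*"),
}


def next_token(text, pos, valid):
    """Longest match, but only among terminals valid in this context."""
    best = None
    for name in sorted(valid):
        m = TERMINALS[name].match(text, pos)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


if __name__ == "__main__":
    source = "List<List<Integer>> dlist"
    at = source.index(">>")
    # Context-free maximal munch: every terminal is a candidate.
    print(next_token(source, at, set(TERMINALS)))
    # Inside a type argument list the parser can only accept ">".
    print(next_token(source, at, {"GT"}))
```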

- With a smarter scanner, LALR(1) is not so brittle.
- We can build syntactically composable language extensions.
- Context aware scanning makes composable syntax "more likely".
- But it does not give a guarantee of composability.

Building a parser from composed specifications

$CFG^{H} \cup^{*} \{CFG^{E}_{1}, \ldots, CFG^{E}_{n}\}$

$\forall i \in [1, n].\ \mathit{isComposable}(CFG^{H}, CFG^{E}_{i}) \land \mathit{conflictFree}(CFG^{H} \cup CFG^{E}_{i})$

$\Longrightarrow\ \mathit{conflictFree}(CFG^{H} \cup \{CFG^{E}_{1}, \ldots, CFG^{E}_{n}\})$

- Monolithic analysis - not too hard, but not too useful
- Modular analysis - harder, but required [PLDI'09]
- Non-commutative composition of restricted LALR(1) grammars

Expressiveness versus safe composition

Compare to
- other parser generators
- libraries

The modular compositionality analysis does not require context aware scanning.
But context aware scanning makes it practical.

Future Work

- ableC - extensible C11 specification
  - builds on lessons learned from extensible specifications of Java [ECOOP'07], Lustre [FASE'07], Modelica, Promela [SPIN'11]
- incorporate existing language extensions
- composition of language extensions happens at compile time
- language specific analysis
- new applications of AGs

Thanks for your attention.

Questions?

http://melt.cs.umn.edu

evw@cs.umn.edu

Eric Van Wyk and August Schwerdfeger. Context-aware scanning for parsing extensible languages. In Intl. Conf. on Generative Programming and Component Engineering (GPCE), pages 63–72. ACM, 2007.

Eric Van Wyk, Derek Bodin, Jimin Gao, and Lijesh Krishnan. Silver: an extensible attribute grammar system. Science of Computer Programming, 75(1–2):39–54, January 2010.

August Schwerdfeger and Eric Van Wyk. Verifiable composition of deterministic grammars. In Proc. of Conf. on Programming Language Design and Implementation (PLDI), pages 199–210. ACM, June 2009.

Ted Kaminski and Eric Van Wyk. Modular well-definedness analysis for attribute grammars. In Proc. of Intl. Conf. on Software Language Engineering (SLE), volume 7745 of LNCS, pages 352–371. Springer-Verlag, September 2012.

Lijesh Krishnan and Eric Van Wyk. Termination analysis for higher-order attribute grammars. In Proceedings of the 5th International Conference on Software Language Engineering (SLE 2012), volume 7745 of LNCS, pages 44–63. Springer-Verlag, September 2012.

Lijesh Krishnan. Composable Semantics Using Higher-Order Attribute Grammars. PhD thesis, University of Minnesota, Department of Computer Science and Engineering, 2012. http://purl.umn.edu/144010

Yogesh Mali and Eric Van Wyk. Building extensible specifications and implementations of Promela with AbleP. In Proc. of Intl. SPIN Workshop on Model Checking of Software, volume 6823 of LNCS, pages 108–125. Springer-Verlag, July 2011.

Eric Van Wyk, Lijesh Krishnan, August Schwerdfeger, and Derek Bodin. Attribute grammar-based language extensions for Java. In Proc. of European Conf. on Object Oriented Prog. (ECOOP), volume 4609 of LNCS, pages 575–599. Springer-Verlag, 2007.

Jimin Gao, Mats Heimdahl, and Eric Van Wyk. Flexible and extensible notations for modeling languages. In Fundamental Approaches to Software Engineering, FASE 2007, volume 4422 of LNCS, pages 102–116. Springer-Verlag, March 2007.

Page 10: talk at Virginia Bioinformatics Institute, December 5, 2013

Regular expressions

include stdioh

include regexh

int main (int argc char argv [])

char text = readFileContents(Xdata)

eukaryotic messenger RNA sequences

regex foo = ^ATG[ATGC ]3 10A5 10$

if ( text =~ foo )

printf (Matches n)

else

printf (Doesnrsquot match n)

10 45

Mining Climate Data - Ocean Eddies

I Spinning pools of water

I Transport heat salt andnutrients

I Learning about theirbehavior is difficult

11 45

A time slice for a point in the ocean

12 45

main (int argc char argv)

Matrix float lt3gt data

= readMatrix(sshdata)

Matrix float lt3gt scores

= matrixMap(scoreTS data [2])

writeMatrix(temporalScoresdata

scores)

13 45

Matrix float lt1gt scoreTS (Matrix float lt1gtts)

int i = 0 beginning n = dimSize(ts 0)

Matrix float lt1gt scores

= init(Matrix float lt1gt dimSize(ts 0))

while(ts[i] lt ts[i+1]) i = i+1

Matrix float [0] trough

while(i lt n-1)

(trough beginning i)

= getTrough(ts i)

scores[beginning i]

= computeArea(trough)

return scores

14 45

Matrix float lt1gt computeArea

(Matrix float lt1gt areaOfInterest)

float y1 = areaOfInterest [0]

float y2 = areaOfInterest[end]

int x1 = 0

int x2=dimSize(areaOfInterest 0) -1

float m = (y1-y2) ((float)(x1-x2))

float b = y1 - mx1

Matrix float lt1gt Line = (x1x2)m+b

float area

= with( x1 lt= i lt x2)

fold(+ 00 line - areaOfInterest)

return

with( 0 lt= i lt dimSize(Line 0) )

genarray ([ dimSize(Line 0)] area)

15 45

(Matrix float lt1gt int int) getTrough

(Matrix float lt1gt ts int i)

int beginning = i

int n = dimSize(ts 0)

while(i+1 lt n ampamp ts[i] gt= ts[i+1])

i = i+1

while(i+1 lt n ampamp ts[i] lt ts[i+1])

i = i+1

return (ts[beginning i] beginning i)

16 45

Matrix extensionsI several features from MATLAB

I with fold and genarray from Single Assignment C

I all translated down to expected C code

I straightforward parallel implementations of matrixMapwith fold and genarray

17 45

Dimension analysis

pound-seconds 6= newton-seconds18 45

include stdioh

int main (int argc char argv [])

int meter x = 34

int meter y = 56

int meter^2 area = x y

printf (dn x + y) OK

printf (dn x + z) Error

19 45

include stdioh

int main (int argc char argv [])

int meter x = 34

int meter y = 56

int meter^2 area = x y

printf (dn x + y) OK

printf (dn x + z) Error

20 45

include stdioh

int main (int argc char argv [])

int x = 34

int y = 56

int area = x y

printf (dn x + y) OK

Extensions of this form find errors but otherwise are ldquoerasedrdquoduring translation

21 45

Extension composition

I Programmers can select the extensions that they want

I May want to use multiple extensions in the same program

I Distinguish between1 extension user

I has no knowledge of language design or implementations

2 extension developerI must know about language design and implementation

I Tools build a custom xc =rArr c translator for them

I How can that be done

22 45

Building translators from composable extensible

languages

Two primary challenges1 composable syntax mdash enables building a scanner parser

I context-aware scanning [GPCErsquo07]I modular determinism analysis [PLDIrsquo09]I Copper

2 composable semantics mdash analysis and translationsI attribute grammars with forwarding collections and

higher-order attributesI set union of specification components

I sets of productions non-terminals attributesI sets of attribute defining equations on a productionI sets of equations contributing values to a single attribute

I modular well-definedness analysis [SLErsquo12a]I modular termination analysis [SLErsquo12b Krishnan-PhD]I Silver

23 45

Generating parsers and scanners from grammars

and regular expressions

nonterminals Stmt Exprterminals Id [a-zA-Z][a-zA-Z0-9]

Num [0-9]+Eq rsquo=rsquoSemi rsquorsquoPlus rsquo+rsquoMult rsquorsquo

Stmt = Stmt Semi StmtStmt = Id Eq Expr

Expr = Expr Plus ExprExpr = Expr Mult ExprExpr = Id

24 45

Stmt

Stmt

Id(x) Eq Expr

Expr

Id(y)

Plus Expr

Expr

Num(3)

Mult Expr

Id(z)

Semi Stmt

Id(a) Eq Expr

Id(b)

Id(x) Eq Id(y) Plus Num(3) Mult Id(z) Semi Id(a) Eq Id(b)

ldquox = y + 3 z a = brdquo25 45

Attribute Grammars

I add semantics mdash meaning mdash to context free grammars

I nodes (non-terminals) have attributesI that is semantic values

I Expr may be attributed withI type - the type of the expressionI errors - list of error messagesI env - mapping variable names to their types

I Stmt may be attributed with errors and env

26 45

Stmt

Stmt

Id(x) Eq Expr

Expr

Id(y)

Plus Expr

Expr

Num(3)

Mult Expr

Id(y)

Semi Stmt

Id(x) Eq Expr

Id(z)

env = [x7rarrint y 7rarrint z 7rarrstring]

env = [x7rarrint y 7rarrint z 7rarrstring]

env = [x7rarrint y 7rarrint z 7rarrstring]

env = [x7rarrint y 7rarrint z 7rarrstring]

env = [x7rarrint y 7rarrint z 7rarrstring]type = int errors = [ ]

type = int errors = [ ]

type = int errors = [ ]

errors = [ ]env = [x7rarrint y 7rarrint z 7rarrstring]

env = [x7rarrint y 7rarrint z 7rarrstring]t=string

errors=[ERROR]

errors=[ERROR]

27 45

Attribute grammar specifications

Equations associated with productions define attribute values

a b s t r a c t p r o d u c t i o n a d d i t i o ne Expr = l Expr rsquo+ rsquo r Expr

e e r r o r s = l e r r o r ++ r e r r o r s ++ check t h a t l and r a r e i n t e g e r s

e t y p e = i n t

l env = e env r env = e env

28 45

Modern attribute grammars

I higher-order attributes

I reference attributes

I collection attributes

I forwarding

I module systems

I separate compilation

I etc

29 45

for-loop as an extensiona b s t r a c t p r o d u c t i o n f o rs Stmt = i Name l o w e r Expr upper Expr

body Stmt

s e r r o r s = l o w e r e r r o r ++ upper e r r o r s ++body e r r o r s ++ check t h a t i i s an i n t e g e r

f o r w a r d s to i=l o w e r w h i l e ( i lt= upper ) body i=i +1seq ( a s s i g n m e n t ( v a r R e f ( i ) l o w e r )

w h i l e (l t e ( v a r R e f ( i ) upper ) b l o c k ( seq ( body

a s s i g n m e n t ( v a r R e f ( i ) add ( v a r R e f ( i ) i n t L i t ( rdquo1rdquo ) ) ) ) ) ) )

30 45

Building an attribute grammar evaluator from composedspecifications

AGH cuplowast AGE1 AGEn

foralli isin [1 n]modComplete(AGH AGEi )

rArr rArr complete(AGH cup AGE1 AGE

n )

I Monolithic analysis - not too hard but not too useful

I Modular analysis - harder but required [SLErsquo12a]

31 45

Challenges in scanning

Keywords in embedded languages may be identifiers in hostlanguage

int SELECT

rs = using c query SELECT last name

FROM person WHERE

32 45

Challenges in scanning

Different extensions use same keyword

connection c jdbcderbyderbydbtestdb

with table person [ person id INTEGER

first name VARCHAR ]

b = table ( c1 T F

c2 F )

33 45

Challenges in scanning

Operators with different precedence specifications

x = 3 + y z

str = [a-z][a-z0-9]java

34 45

Challenges in scanning

Terminals that are prefixes of others

ListltListltIntegergtgt dlist

x = y gtgt 4

35 45

Need for context

I Traditionally parser and scanner are disjoint

Scanner rarr Parser rarr Semantic Analysis

I In context aware scanning they communicate

Scanner Parser rarr Semantic Analysis

36 45

Context aware scanning

I Scanner recognizes only tokens valid for current ldquocontextrdquo

I keeps embedded sub-languages in a sense separate

I ConsiderI chan in out

for i in a a[i] = ii

I Two terminal symbols that match ldquoinrdquoI terminal IN rsquoinrsquo I terminal ID [a-zA-Z ][a-zA-Z 0-9]

submits to keyword

I terminal FOR rsquoforrsquo lexer class keyword

I example is part of AbleP [SPINrsquo11]

37 45

Parsing C as an extension to Promelac_decl

typedef struct Coord

int x y Coord

c_state Coord pt Global goes in state vector

int z = 3 standard global decl

active proctype example()

c_code nowptx = nowpty = 0

do c_expr nowptx == nowpty

-gt c_code nowpty++

else -gt break

od

c_code printf(values d d ddn

Pexample-gt_pid nowz nowptx nowpty)

assert(false) trigger an error trail

38 45

Context aware scanning

I This scanning algorithm subordinates thedisambiguation principle of maximal munchto the principle ofdisambiguation by context

I It will return a shorter valid match before a longer invalidmatch

I In ListltListltIntegergtgt before ldquogtrdquoldquogtrdquo in valid lookahead but ldquogtgtrdquo is not

I A context aware scanner is essentially an implicitly-modedscanner

I There is no explicit specification of valid look aheadI It is generated from standard grammars and terminal

regexs

39 45

I With a smarter scanner LALR(1) is not so brittle

I We can build syntactically composable languageextensions

I Context aware scanning makes composable syntax ldquomorelikelyrdquo

I But it does not give a guarantee of composability

40 45

Building a parser from composed specifications

CFGH cuplowast CFGE1 CFGEn

foralli isin [1 n]isComposable(CFGH CFGEi )andconflictFree(CFGH cup CFGEi )

rArr rArr conflictFree(CFGH cup CFGE1 CFGEn)

I Monolithic analysis - not too hard but not too useful

I Modular analysis - harder but required [PLDIrsquo09]

I Non-commutative composition of restricted LALR(1)grammars

41 45

42 45

Expressiveness versus safe composition

Compare to

I other parser generators

I libraries

The modular compositionality analysis does not requirecontext aware scanningBut context aware scanning makes it practical

43 45

Future Work

I ableC - extensible C11 specification

  I builds on lessons learned from extensible specifications of Java [ECOOP'07], Lustre [FASE'07], Modelica, Promela [SPIN'11]

I incorporate existing language extensions

I composition of language extensions is compile-time

I language specific analysis

I new applications of AGs

44 45

Thanks for your attention.

Questions?

http://melt.cs.umn.edu

evw@cs.umn.edu

45 45

Eric Van Wyk and August Schwerdfeger. Context-aware scanning for parsing extensible languages. In Intl. Conf. on Generative Programming and Component Engineering (GPCE), pages 63-72. ACM, 2007.

Eric Van Wyk, Derek Bodin, Jimin Gao, and Lijesh Krishnan. Silver: an extensible attribute grammar system. Science of Computer Programming, 75(1-2):39-54, January 2010.

August Schwerdfeger and Eric Van Wyk. Verifiable composition of deterministic grammars. In Proc. of Conf. on Programming Language Design and Implementation (PLDI), pages 199-210. ACM, June 2009.

Ted Kaminski and Eric Van Wyk. Modular well-definedness analysis for attribute grammars. In Proc. of Intl. Conf. on Software Language Engineering (SLE), volume 7745 of LNCS, pages 352-371. Springer-Verlag, September 2012.

Lijesh Krishnan and Eric Van Wyk. Termination analysis for higher-order attribute grammars. In Proceedings of the 5th International Conference on Software Language Engineering (SLE 2012), volume 7745 of LNCS, pages 44-63. Springer-Verlag, September 2012.

Lijesh Krishnan. Composable Semantics Using Higher-Order Attribute Grammars. PhD thesis, University of Minnesota, Department of Computer Science and Engineering, 2012. http://purl.umn.edu/144010

Yogesh Mali and Eric Van Wyk. Building extensible specifications and implementations of Promela with AbleP. In Proc. of Intl. SPIN Workshop on Model Checking of Software, volume 6823 of LNCS, pages 108-125. Springer-Verlag, July 2011.

Eric Van Wyk, Lijesh Krishnan, August Schwerdfeger, and Derek Bodin. Attribute grammar-based language extensions for Java. In Proc. of European Conf. on Object Oriented Prog. (ECOOP), volume 4609 of LNCS, pages 575-599. Springer-Verlag, 2007.

Jimin Gao, Mats Heimdahl, and Eric Van Wyk. Flexible and extensible notations for modeling languages. In Fundamental Approaches to Software Engineering, FASE 2007, volume 4422 of LNCS, pages 102-116. Springer-Verlag, March 2007.

Mining Climate Data - Ocean Eddies

I Spinning pools of water

I Transport heat, salt, and nutrients

I Learning about their behavior is difficult

11 45

A time slice for a point in the ocean

12 45

    main (int argc, char **argv) {
      Matrix float <3> data
        = readMatrix("ssh.data");
      Matrix float <3> scores
        = matrixMap(scoreTS, data, [2]);
      writeMatrix("temporalScores.data",
                  scores);
    }

13 45

    Matrix float <1> scoreTS (Matrix float <1> ts) {
      int i = 0, beginning, n = dimSize(ts, 0);
      Matrix float <1> scores
        = init(Matrix float <1>, dimSize(ts, 0));
      while (ts[i] < ts[i+1]) i = i+1;
      Matrix float [0] trough;
      while (i < n-1) {
        (trough, beginning, i)
          = getTrough(ts, i);
        scores[beginning :: i]
          = computeArea(trough);
      }
      return scores;
    }

14 45

    Matrix float <1> computeArea
        (Matrix float <1> areaOfInterest) {
      float y1 = areaOfInterest[0];
      float y2 = areaOfInterest[end];
      int x1 = 0;
      int x2 = dimSize(areaOfInterest, 0) - 1;
      float m = (y1-y2) / ((float)(x1-x2));
      float b = y1 - m*x1;
      Matrix float <1> line = (x1 :: x2) * m + b;
      float area
        = with (x1 <= i < x2)
            fold(+, 0.0, line - areaOfInterest);
      return
        with (0 <= i < dimSize(line, 0))
          genarray([dimSize(line, 0)], area);
    }

15 45

    (Matrix float <1>, int, int) getTrough
        (Matrix float <1> ts, int i) {
      int beginning = i;
      int n = dimSize(ts, 0);
      while (i+1 < n && ts[i] >= ts[i+1])
        i = i+1;
      while (i+1 < n && ts[i] < ts[i+1])
        i = i+1;
      return (ts[beginning :: i], beginning, i);
    }
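
To make the trough-scoring algorithm on the last three slides easier to follow, here is a plain-Python rendering over ordinary lists. It is a sketch, not the extension's translation: it assumes that matrix ranges are inclusive, that `init` fills with zeros, and it adds a bounds check to the initial "skip the rise" loop that the slide code does not have.

```python
def get_trough(ts, i):
    """Return (trough, beginning, end): the descent and ascent starting at i."""
    beginning = i
    n = len(ts)
    while i + 1 < n and ts[i] >= ts[i + 1]:   # walk down into the trough
        i += 1
    while i + 1 < n and ts[i] < ts[i + 1]:    # walk back up the far side
        i += 1
    return ts[beginning:i + 1], beginning, i  # inclusive range (assumed)


def compute_area(area_of_interest):
    """Area between the trough and the straight line joining its end points."""
    y1, y2 = area_of_interest[0], area_of_interest[-1]
    x1, x2 = 0, len(area_of_interest) - 1
    m = (y1 - y2) / float(x1 - x2)
    b = y1 - m * x1
    line = [x * m + b for x in range(x1, x2 + 1)]
    # with (x1 <= i < x2) fold(+, 0.0, line - areaOfInterest)
    area = sum(line[i] - area_of_interest[i] for i in range(x1, x2))
    # genarray: every point of the trough receives the same score
    return [area] * len(line)


def score_ts(ts):
    """Score each point of a time series by the area of the trough it lies in."""
    n = len(ts)
    scores = [0.0] * n                        # assumed behaviour of init(...)
    i = 0
    while i + 1 < n and ts[i] < ts[i + 1]:    # skip the initial rise (bounds check added)
        i += 1
    while i < n - 1:
        trough, beginning, i = get_trough(ts, i)
        scores[beginning:i + 1] = compute_area(trough)
    return scores
```

For the series `[0, 2, 1, 0, 1, 3]` the initial rise is skipped, one trough `[2, 1, 0, 1, 3]` is found, and each of its points is scored with the area 5.5.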

16 45

Matrix extensions

I several features from MATLAB

I with, fold, and genarray from Single Assignment C

I all translated down to expected C code

I straightforward parallel implementations of matrixMap, with, fold, and genarray
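
One way to read these constructs is as higher-order functions. The sketch below is a simplified, one-dimensional interpretation written for illustration: `with_fold`, `with_genarray`, and `matrix_map` are hypothetical names, the default fill value of 0.0 is an assumption, and the thread pool merely stands in for whatever parallel C code the extension actually generates.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator


def with_fold(lo, hi, op, init, element):
    """with (lo <= i < hi) fold(op, init, element(i))"""
    return reduce(op, (element(i) for i in range(lo, hi)), init)


def with_genarray(lo, hi, size, element, default=0.0):
    """with (lo <= i < hi) genarray([size], element(i)); other cells get `default`."""
    return [element(i) if lo <= i < hi else default for i in range(size)]


def matrix_map(f, rows, workers=4):
    """Apply f to each row; rows are independent, so they can run in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(f, rows))


total = with_fold(0, 4, operator.add, 0.0, lambda i: float(i))
cells = with_genarray(1, 3, 4, lambda i: i * 10)
row_sums = matrix_map(sum, [[1, 2], [3, 4]])
```

Because each iteration of a with-loop and each row of a `matrixMap` is independent of the others, the loop bodies can be distributed across workers without changing the result.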

17 45

Dimension analysis

pound-seconds ≠ newton-seconds

18 45

    #include "stdio.h"
    int main (int argc, char *argv[]) {
      int meter x = 34;
      int meter y = 56;
      int meter^2 area = x * y;
      printf ("%d\n", x + y);   // OK
      printf ("%d\n", x + z);   // Error
    }

19 45

    #include "stdio.h"
    int main (int argc, char *argv[]) {
      int x = 34;
      int y = 56;
      int area = x * y;
      printf ("%d\n", x + y);   // OK
    }

Extensions of this form find errors but otherwise are "erased" during translation.
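
The check-then-erase behaviour can be illustrated with a small Python sketch. Everything here is hypothetical (the tuple-based expression form, `check`, `erase`, and `UnitError` are not part of the extension): units are modelled as maps from base unit to exponent, multiplication adds exponents, addition requires equal units, and erasure simply drops the unit from a declaration.

```python
class UnitError(Exception):
    """Raised when an expression combines incompatible units."""


def unit_mul(u, v):
    """Multiplying quantities adds the exponents of their units."""
    result = dict(u)
    for base, exp in v.items():
        result[base] = result.get(base, 0) + exp
    return {base: exp for base, exp in result.items() if exp != 0}


def check(expr, env):
    """Return the unit of `expr`, or raise UnitError. `env` maps names to units."""
    kind = expr[0]
    if kind == "var":
        return env[expr[1]]
    left, right = check(expr[1], env), check(expr[2], env)
    if kind == "mul":
        return unit_mul(left, right)
    if kind == "add":
        if left != right:
            raise UnitError(f"cannot add {left} and {right}")
        return left
    raise ValueError(f"unknown expression kind: {kind}")


def erase(c_type, unit, name):
    """Translation to plain C keeps the type and name and drops the unit."""
    return f"{c_type} {name}"


METER = {"meter": 1}
env = {"x": METER, "y": METER, "area": {"meter": 2}}
```

With this environment, `x + y` checks as meters and `x * y` as square meters, while `x + area` is rejected; after checking, `erase("int", {"meter": 2}, "area")` produces the ordinary declaration `int area`.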

21 45

Extension composition

I Programmers can select the extensions that they want

I May want to use multiple extensions in the same program

I Distinguish between

  1. extension user

     I has no knowledge of language design or implementations

  2. extension developer

     I must know about language design and implementation

I Tools build a custom .xc ⇒ .c translator for them

I How can that be done?

22 45

Building translators from composable extensible languages

Two primary challenges:

1. composable syntax - enables building a scanner, parser

   I context-aware scanning [GPCE'07]

   I modular determinism analysis [PLDI'09]

   I Copper

2. composable semantics - analysis and translations

   I attribute grammars with forwarding, collections and higher-order attributes

   I set union of specification components

     I sets of productions, non-terminals, attributes

     I sets of attribute defining equations, on a production

     I sets of equations contributing values to a single attribute

   I modular well-definedness analysis [SLE'12a]

   I modular termination analysis [SLE'12b, Krishnan-PhD]

   I Silver

23 45

Generating parsers and scanners from grammars and regular expressions

    nonterminals: Stmt, Expr
    terminals: Id    /[a-zA-Z][a-zA-Z0-9]*/
               Num   /[0-9]+/
               Eq    '='
               Semi  ';'
               Plus  '+'
               Mult  '*'

    Stmt ::= Stmt Semi Stmt
    Stmt ::= Id Eq Expr

    Expr ::= Expr Plus Expr
    Expr ::= Expr Mult Expr
    Expr ::= Id
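
A hand-written Python sketch of what a generated scanner and parser do for this specification is shown below. It is not what Copper emits: the scanner is a single combined regular expression, the parser uses precedence climbing with `Mult` binding tighter than `Plus` (an assumption, since the grammar as written is ambiguous), and `Num` is accepted as an expression because the parse tree on the next slide uses it.

```python
import re

TOKEN_SPEC = [
    ("Id", r"[a-zA-Z][a-zA-Z0-9]*"),
    ("Num", r"[0-9]+"),
    ("Eq", r"="),
    ("Semi", r";"),
    ("Plus", r"\+"),
    ("Mult", r"\*"),
]
SCANNER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC) + r"|(?P<WS>\s+)"
)
PRECEDENCE = {"Plus": 1, "Mult": 2}  # assumed: Mult binds tighter than Plus


def scan(text):
    """Turn the input string into a list of (terminal, lexeme) pairs."""
    tokens, pos = [], 0
    while pos < len(text):
        m = SCANNER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse_expr(tokens, pos, min_prec=1):
    """Precedence climbing over Plus and Mult; returns (tree, next position)."""
    if pos >= len(tokens) or tokens[pos][0] not in ("Id", "Num"):
        raise SyntaxError("expected Id or Num")
    left = f"{tokens[pos][0]}({tokens[pos][1]})"
    pos += 1
    while (pos < len(tokens) and tokens[pos][0] in PRECEDENCE
           and PRECEDENCE[tokens[pos][0]] >= min_prec):
        op = tokens[pos][0]
        right, pos = parse_expr(tokens, pos + 1, PRECEDENCE[op] + 1)
        left = (op, left, right)
    return left, pos


def parse_stmts(tokens):
    """Parse assignments separated by Semi into a list of statement trees."""
    stmts, pos = [], 0
    while True:
        if pos + 1 >= len(tokens) or tokens[pos][0] != "Id" or tokens[pos + 1][0] != "Eq":
            raise SyntaxError("expected 'Id Eq Expr'")
        target = tokens[pos][1]
        expr, pos = parse_expr(tokens, pos + 2)
        stmts.append(("Assign", target, expr))
        if pos < len(tokens) and tokens[pos][0] == "Semi":
            pos += 1
            continue
        break
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing tokens")
    return stmts


tokens = scan("x = y + 3 * z ; a = b")
tree = parse_stmts(tokens)
```

Scanning `"x = y + 3 * z ; a = b"` gives the terminal sequence Id Eq Id Plus Num Mult Id Semi Id Eq Id, and parsing groups `3 * z` under the `Plus` node.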

24 45

    Stmt
      Stmt
        Id(x)
        Eq
        Expr
          Expr
            Id(y)
          Plus
          Expr
            Expr
              Num(3)
            Mult
            Expr
              Id(z)
      Semi
      Stmt
        Id(a)
        Eq
        Expr
          Id(b)

Id(x) Eq Id(y) Plus Num(3) Mult Id(z) Semi Id(a) Eq Id(b)

"x = y + 3 * z ; a = b"

25 45

Attribute Grammars

I add semantics - meaning - to context free grammars

I nodes (non-terminals) have attributes

  I that is, semantic values

I Expr may be attributed with

  I type - the type of the expression

  I errors - list of error messages

  I env - mapping variable names to their types

I Stmt may be attributed with errors and env

26 45

[Figure: the parse tree of "x = y + 3 * y ; x = z", decorated with attribute values]

    Stmt
      Stmt
        Id(x)
        Eq
        Expr
          Expr
            Id(y)
          Plus
          Expr
            Expr
              Num(3)
            Mult
            Expr
              Id(y)
      Semi
      Stmt
        Id(x)
        Eq
        Expr
          Id(z)

I env = [x ↦ int, y ↦ int, z ↦ string] is passed down to the Stmt and Expr nodes

I in the first statement, Expr nodes have type = int and errors = [ ]

I in the second statement, Expr has type = string, so errors = [ERROR] on that Stmt and on the root Stmt

27 45

Attribute grammar specifications

Equations associated with productions define attribute values

    abstract production addition
    e::Expr ::= l::Expr '+' r::Expr
    {
      e.errors = l.errors ++ r.errors ++ ... check that l and r are integers ... ;
      e.type = int ;
      l.env = e.env ;  r.env = e.env ;
    }
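
The equations above can be mimicked by a recursive function in Python: `env` is passed down as an argument (an inherited attribute) and `type` and `errors` come back as results (synthesized attributes). This is only a sketch of the idea; Silver evaluates attributes on demand, and the class names, the `decorate` function, and the error messages are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Var:
    name: str


@dataclass
class Num:
    value: int


@dataclass
class Add:
    left: object
    right: object


def decorate(e, env):
    """Return (type, errors) for expression `e` under the inherited `env`."""
    if isinstance(e, Num):
        return "int", []
    if isinstance(e, Var):
        if e.name not in env:
            return "error", [f"undeclared variable {e.name}"]
        return env[e.name], []
    if isinstance(e, Add):
        l_type, l_errors = decorate(e.left, env)     # l.env = e.env
        r_type, r_errors = decorate(e.right, env)    # r.env = e.env
        check = [] if l_type == r_type == "int" else ["operands of + must be integers"]
        # e.errors = l.errors ++ r.errors ++ check ;  e.type = int
        return "int", l_errors + r_errors + check
    raise TypeError(f"unknown expression node: {e!r}")


env = {"x": "int", "y": "int", "z": "string"}
ok = decorate(Add(Var("y"), Num(3)), env)
bad = decorate(Add(Var("y"), Var("z")), env)
```

Here `y + 3` has type `int` with no errors, while `y + z` still reports type `int` but carries one error because `z` is a string.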

28 45

Modern attribute grammars

I higher-order attributes

I reference attributes

I collection attributes

I forwarding

I module systems

I separate compilation

I etc

29 45

for-loop as an extension

    abstract production for
    s::Stmt ::= i::Name lower::Expr upper::Expr body::Stmt
    {
      s.errors = lower.errors ++ upper.errors ++ body.errors ++
                 ... check that i is an integer ... ;

      forwards to
        // i = lower ; while (i <= upper) { body ; i = i+1 ; }
        seq(assignment(varRef(i), lower),
            while(lte(varRef(i), upper),
                  block(seq(body,
                            assignment(varRef(i), add(varRef(i), intLit("1")))))));
    }
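
Forwarding can be sketched in Python as a default method that delegates to the tree a node forwards to. In this illustration the host constructs define `to_c` themselves, while the hypothetical `ForLoop` extension node defines only `forwards_to` and so gets its translation from the host tree. The class names, `RawStmt`, and the exact C text produced are assumptions made for the example, not Silver's generated code.

```python
from dataclasses import dataclass


class Node:
    """An attribute a node does not define itself is taken from its forward."""

    def forwards_to(self):
        raise NotImplementedError("host constructs define to_c directly")

    def to_c(self):
        return self.forwards_to().to_c()


@dataclass
class VarRef(Node):
    name: str
    def to_c(self): return self.name


@dataclass
class IntLit(Node):
    text: str
    def to_c(self): return self.text


@dataclass
class RawStmt(Node):  # hypothetical helper standing in for an arbitrary body
    code: str
    def to_c(self): return self.code


@dataclass
class Add(Node):
    left: Node
    right: Node
    def to_c(self): return f"({self.left.to_c()} + {self.right.to_c()})"


@dataclass
class Lte(Node):
    left: Node
    right: Node
    def to_c(self): return f"({self.left.to_c()} <= {self.right.to_c()})"


@dataclass
class Assignment(Node):
    target: Node
    value: Node
    def to_c(self): return f"{self.target.to_c()} = {self.value.to_c()};"


@dataclass
class Seq(Node):
    first: Node
    second: Node
    def to_c(self): return f"{self.first.to_c()} {self.second.to_c()}"


@dataclass
class Block(Node):
    body: Node
    def to_c(self): return f"{{ {self.body.to_c()} }}"


@dataclass
class While(Node):
    cond: Node
    body: Node
    def to_c(self): return f"while {self.cond.to_c()} {self.body.to_c()}"


@dataclass
class ForLoop(Node):
    """Extension construct: no to_c of its own, only a forward."""
    var: str
    lower: Node
    upper: Node
    body: Node

    def forwards_to(self):
        i = VarRef(self.var)
        step = Assignment(i, Add(i, IntLit("1")))
        return Seq(Assignment(i, self.lower),
                   While(Lte(i, self.upper), Block(Seq(self.body, step))))


loop = ForLoop("i", IntLit("0"), VarRef("n"), RawStmt("sum = sum + i;"))
c_code = loop.to_c()
```

The design point is that the extension author writes only the extension-specific checks and the forward; every other attribute, including translation to C, is obtained from host-language constructs.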

30 45

Building an attribute grammar evaluator from composed specifications

$AG^{H} \cup^{*} \{AG^{E_1}, \ldots, AG^{E_n}\}$

$\forall i \in [1, n].\ modComplete(AG^{H}, AG^{E_i})$

$\Longrightarrow\ complete(AG^{H} \cup \{AG^{E_1}, \ldots, AG^{E_n}\})$

I Monolithic analysis - not too hard, but not too useful

I Modular analysis - harder, but required [SLE'12a]

31 45

Challenges in scanning

Keywords in embedded languages may be identifiers in host language

    int SELECT ;

    rs = using c query { SELECT last_name
                         FROM person WHERE

32 45

Challenges in scanning

Different extensions use same keyword

    connection c "jdbc:derby:/derby/db/testdb"
      with table person [ person_id INTEGER,
                          first_name VARCHAR ];

    b = table ( c1 : T F ,
                c2 : F * );

33 45

Challenges in scanning

Operators with different precedence specifications

    x = 3 + y * z ;

    str = /[a-z][a-z0-9]*\.java/

34 45

Challenges in scanning

Terminals that are prefixes of others

    List<List<Integer>> dlist ;

    x = y >> 4 ;

35 45

Need for context

I Traditionally, parser and scanner are disjoint.

    Scanner → Parser → Semantic Analysis

I In context aware scanning, they communicate.

    Scanner ⇄ Parser → Semantic Analysis

36 45

Page 13: talk at Virginia Bioinformatics Institute, December 5, 2013

main (int argc char argv)

Matrix float lt3gt data

= readMatrix(sshdata)

Matrix float lt3gt scores

= matrixMap(scoreTS data [2])

writeMatrix(temporalScoresdata

scores)

13 45

Matrix float lt1gt scoreTS (Matrix float lt1gtts)

int i = 0 beginning n = dimSize(ts 0)

Matrix float lt1gt scores

= init(Matrix float lt1gt dimSize(ts 0))

while(ts[i] lt ts[i+1]) i = i+1

Matrix float [0] trough

while(i lt n-1)

(trough beginning i)

= getTrough(ts i)

scores[beginning i]

= computeArea(trough)

return scores

14 45

Matrix float lt1gt computeArea

(Matrix float lt1gt areaOfInterest)

float y1 = areaOfInterest [0]

float y2 = areaOfInterest[end]

int x1 = 0

int x2=dimSize(areaOfInterest 0) -1

float m = (y1-y2) ((float)(x1-x2))

float b = y1 - mx1

Matrix float lt1gt Line = (x1x2)m+b

float area

= with( x1 lt= i lt x2)

fold(+ 00 line - areaOfInterest)

return

with( 0 lt= i lt dimSize(Line 0) )

genarray ([ dimSize(Line 0)] area)

15 45

(Matrix float lt1gt int int) getTrough

(Matrix float lt1gt ts int i)

int beginning = i

int n = dimSize(ts 0)

while(i+1 lt n ampamp ts[i] gt= ts[i+1])

i = i+1

while(i+1 lt n ampamp ts[i] lt ts[i+1])

i = i+1

return (ts[beginning i] beginning i)

16 45

Matrix extensionsI several features from MATLAB

I with fold and genarray from Single Assignment C

I all translated down to expected C code

I straightforward parallel implementations of matrixMapwith fold and genarray

17 45

Dimension analysis

pound-seconds 6= newton-seconds18 45

include stdioh

int main (int argc char argv [])

int meter x = 34

int meter y = 56

int meter^2 area = x y

printf (dn x + y) OK

printf (dn x + z) Error

19 45

include stdioh

int main (int argc char argv [])

int meter x = 34

int meter y = 56

int meter^2 area = x y

printf (dn x + y) OK

printf (dn x + z) Error

20 45

include stdioh

int main (int argc char argv [])

int x = 34

int y = 56

int area = x y

printf (dn x + y) OK

Extensions of this form find errors but otherwise are ldquoerasedrdquoduring translation

21 45

Extension composition

I Programmers can select the extensions that they want

I May want to use multiple extensions in the same program

I Distinguish between1 extension user

I has no knowledge of language design or implementations

2 extension developerI must know about language design and implementation

I Tools build a custom xc =rArr c translator for them

I How can that be done

22 45

Building translators from composable extensible

languages

Two primary challenges1 composable syntax mdash enables building a scanner parser

I context-aware scanning [GPCErsquo07]I modular determinism analysis [PLDIrsquo09]I Copper

2 composable semantics mdash analysis and translationsI attribute grammars with forwarding collections and

higher-order attributesI set union of specification components

I sets of productions non-terminals attributesI sets of attribute defining equations on a productionI sets of equations contributing values to a single attribute

I modular well-definedness analysis [SLErsquo12a]I modular termination analysis [SLErsquo12b Krishnan-PhD]I Silver

23 45

Generating parsers and scanners from grammars

and regular expressions

nonterminals Stmt Exprterminals Id [a-zA-Z][a-zA-Z0-9]

Num [0-9]+Eq rsquo=rsquoSemi rsquorsquoPlus rsquo+rsquoMult rsquorsquo

Stmt = Stmt Semi StmtStmt = Id Eq Expr

Expr = Expr Plus ExprExpr = Expr Mult ExprExpr = Id

24 45

Stmt

Stmt

Id(x) Eq Expr

Expr

Id(y)

Plus Expr

Expr

Num(3)

Mult Expr

Id(z)

Semi Stmt

Id(a) Eq Expr

Id(b)

Id(x) Eq Id(y) Plus Num(3) Mult Id(z) Semi Id(a) Eq Id(b)

ldquox = y + 3 z a = brdquo25 45

Attribute Grammars

I add semantics mdash meaning mdash to context free grammars

I nodes (non-terminals) have attributesI that is semantic values

I Expr may be attributed withI type - the type of the expressionI errors - list of error messagesI env - mapping variable names to their types

I Stmt may be attributed with errors and env

26 45

Stmt

Stmt

Id(x) Eq Expr

Expr

Id(y)

Plus Expr

Expr

Num(3)

Mult Expr

Id(y)

Semi Stmt

Id(x) Eq Expr

Id(z)

env = [x7rarrint y 7rarrint z 7rarrstring]

env = [x7rarrint y 7rarrint z 7rarrstring]

env = [x7rarrint y 7rarrint z 7rarrstring]

env = [x7rarrint y 7rarrint z 7rarrstring]

env = [x7rarrint y 7rarrint z 7rarrstring]type = int errors = [ ]

type = int errors = [ ]

type = int errors = [ ]

errors = [ ]env = [x7rarrint y 7rarrint z 7rarrstring]

env = [x7rarrint y 7rarrint z 7rarrstring]t=string

errors=[ERROR]

errors=[ERROR]

27 45

Attribute grammar specifications

Equations associated with productions define attribute values

a b s t r a c t p r o d u c t i o n a d d i t i o ne Expr = l Expr rsquo+ rsquo r Expr

e e r r o r s = l e r r o r ++ r e r r o r s ++ check t h a t l and r a r e i n t e g e r s

e t y p e = i n t

l env = e env r env = e env

28 45

Modern attribute grammars

I higher-order attributes

I reference attributes

I collection attributes

I forwarding

I module systems

I separate compilation

I etc

29 45

for-loop as an extensiona b s t r a c t p r o d u c t i o n f o rs Stmt = i Name l o w e r Expr upper Expr

body Stmt

s e r r o r s = l o w e r e r r o r ++ upper e r r o r s ++body e r r o r s ++ check t h a t i i s an i n t e g e r

f o r w a r d s to i=l o w e r w h i l e ( i lt= upper ) body i=i +1seq ( a s s i g n m e n t ( v a r R e f ( i ) l o w e r )

w h i l e (l t e ( v a r R e f ( i ) upper ) b l o c k ( seq ( body

a s s i g n m e n t ( v a r R e f ( i ) add ( v a r R e f ( i ) i n t L i t ( rdquo1rdquo ) ) ) ) ) ) )

30 45

Building an attribute grammar evaluator from composedspecifications

AGH cuplowast AGE1 AGEn

foralli isin [1 n]modComplete(AGH AGEi )

rArr rArr complete(AGH cup AGE1 AGE

n )

I Monolithic analysis - not too hard but not too useful

I Modular analysis - harder but required [SLErsquo12a]

31 45

Challenges in scanning

Keywords in embedded languages may be identifiers in hostlanguage

int SELECT

rs = using c query SELECT last name

FROM person WHERE

32 45

Challenges in scanning

Different extensions use same keyword

connection c jdbcderbyderbydbtestdb

with table person [ person id INTEGER

first name VARCHAR ]

b = table ( c1 T F

c2 F )

33 45

Challenges in scanning

Operators with different precedence specifications

x = 3 + y z

str = [a-z][a-z0-9]java

34 45

Challenges in scanning

Terminals that are prefixes of others

ListltListltIntegergtgt dlist

x = y gtgt 4

35 45

Need for context

I Traditionally parser and scanner are disjoint

Scanner rarr Parser rarr Semantic Analysis

I In context aware scanning they communicate

Scanner Parser rarr Semantic Analysis

36 45

Context-aware scanning

- Scanner recognizes only tokens valid for the current "context"
- keeps embedded sub-languages, in a sense, separate
- Consider:

        chan in, out;
        for i in a { a[i] = i*i; }

- Two terminal symbols that match "in":
    - terminal IN 'in' ;
    - terminal ID /[a-zA-Z_][a-zA-Z_0-9]*/ submits to { keyword } ;
    - terminal FOR 'for' lexer class { keyword } ;
- example is part of AbleP [SPIN'11]
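A minimal sketch of how such a scanner might treat "in" and "for" follows. It is not Copper's implementation: the `TERMINALS` table, the `scan` function, and the way the set of valid terminals is passed in by hand are assumptions for illustration (in a real context-aware scanner the parser state supplies that set). `IN` is deliberately not in the keyword class, so "in" may still be an identifier where `IN` is not expected, while "for" is reserved everywhere.

```python
import re

# Hypothetical terminal table: name -> (regex, member of lexer class 'keyword'?)
TERMINALS = {
    "IN": (re.compile(r"in"), False),
    "FOR": (re.compile(r"for"), True),
    "ID": (re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*"), False),
}


def scan(text, pos, valid):
    """Return (terminal, lexeme) using only the terminals in `valid`."""
    matches = {}
    for name in valid:
        m = TERMINALS[name][0].match(text, pos)
        if m:
            matches[name] = m.group()
    if matches:
        longest = max(len(lexeme) for lexeme in matches.values())
        matches = {n: lx for n, lx in matches.items() if len(lx) == longest}
    # "ID submits to keyword": a lexeme that is exactly a keyword is never an ID.
    if "ID" in matches:
        lexeme = matches["ID"]
        if any(is_kw and rx.fullmatch(lexeme) for rx, is_kw in TERMINALS.values()):
            del matches["ID"]
    if len(matches) != 1:
        raise SyntaxError(f"no unique valid terminal at position {pos}")
    (name, lexeme), = matches.items()
    return name, lexeme


# After 'chan' only an identifier is valid, so "in" is an ID.
print(scan("in, out;", 0, {"ID"}))
# After 'for i' the extension expects the IN terminal.
print(scan("in a { a[i] = i*i; }", 0, {"IN"}))
# 'for' is in the keyword class, so it wins over ID when both are valid.
print(scan("for i in a", 0, {"FOR", "ID"}))
```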

37 45

Parsing C as an extension to Promela

    c_decl {
      typedef struct Coord {
        int x, y; } Coord;
    }
    c_state "Coord pt" "Global"   /* goes in state vector */
    int z = 3;                    /* standard global decl */

    active proctype example()
    {
      c_code { now.pt.x = now.pt.y = 0; };
      do :: c_expr { now.pt.x == now.pt.y }
              -> c_code { now.pt.y++; }
         :: else -> break
      od;
      c_code { printf("values %d: %d, %d,%d\n",
                      Pexample->_pid, now.z, now.pt.x, now.pt.y); };
      assert(false)               /* trigger an error trail */
    }

38 45

Context-aware scanning

- This scanning algorithm subordinates the disambiguation principle of maximal munch to the principle of disambiguation by context.
- It will return a shorter valid match before a longer invalid match.
- In List<List<Integer>>, before ">", ">" is in the valid lookahead but ">>" is not.
- A context-aware scanner is essentially an implicitly-moded scanner.
- There is no explicit specification of valid lookahead.
    - It is generated from standard grammars and terminal regexes.
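The preference for a shorter valid match over a longer invalid one can be sketched as follows. This is an illustration only: the terminal names and the `scan` function are hypothetical, and the valid-lookahead sets are written by hand here, whereas in the approach described above they are derived from the parse table.

```python
import re

# Hypothetical terminals; the names are assumptions for this sketch.
TERMINALS = {
    "GT": re.compile(r">"),
    "RSHIFT": re.compile(r">>"),
    "ID": re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*"),
}


def scan(text, pos, valid):
    """Maximal munch, but restricted to the terminals in `valid`."""
    best = None
    for name in sorted(valid):
        m = TERMINALS[name].match(text, pos)
        if m and (best is None or m.end() > best[1].end()):
            best = (name, m)
    if best is None:
        raise SyntaxError(f"no valid terminal at position {pos}")
    return best[0], best[1].group()


generic = "List<List<Integer>> dlist"
shift = "x = y >> 4"

# Closing a type argument list: only GT is valid, so ">>" is split.
print(scan(generic, generic.index(">>"), {"GT"}))
# In an expression both are valid, so maximal munch picks RSHIFT.
print(scan(shift, shift.index(">>"), {"GT", "RSHIFT"}))
# A context-free scanner (all terminals always valid) gets the generic case wrong.
print(scan(generic, generic.index(">>"), set(TERMINALS)))
```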

39 45

- With a smarter scanner, LALR(1) is not so brittle.
- We can build syntactically composable language extensions.
- Context-aware scanning makes composable syntax "more likely".
- But it does not give a guarantee of composability.

40 45

Building a parser from composed specifications

$CFG^H \cup^{*} \{CFG^{E_1}, \ldots, CFG^{E_n}\}$

$\forall i \in [1, n].\ \mathit{isComposable}(CFG^H, CFG^{E_i}) \land \mathit{conflictFree}(CFG^H \cup CFG^{E_i}) \implies \mathit{conflictFree}(CFG^H \cup \{CFG^{E_1}, \ldots, CFG^{E_n}\})$

- Monolithic analysis: not too hard, but not too useful
- Modular analysis: harder, but required [PLDI'09]
- Non-commutative composition of restricted LALR(1) grammars

41 45

42 45

Expressiveness versus safe composition

Compare to

- other parser generators
- libraries

The modular compositionality analysis does not require context-aware scanning. But context-aware scanning makes it practical.

43 45

Future Work

- ableC: an extensible C11 specification
    - builds on lessons learned from extensible specifications of Java [ECOOP'07], Lustre [FASE'07], Modelica, and Promela [SPIN'11]
- incorporate existing language extensions
- composition of language extensions at compile time
- language-specific analysis
- new applications of AGs

44 45

Thanks for your attention.

Questions?

http://melt.cs.umn.edu

evw@cs.umn.edu

45 45

Eric Van Wyk and August Schwerdfeger. Context-aware scanning for parsing extensible languages. In Intl. Conf. on Generative Programming and Component Engineering (GPCE), pages 63–72. ACM, 2007.

Eric Van Wyk, Derek Bodin, Jimin Gao, and Lijesh Krishnan. Silver: an extensible attribute grammar system. Science of Computer Programming, 75(1–2):39–54, January 2010.

August Schwerdfeger and Eric Van Wyk. Verifiable composition of deterministic grammars. In Proc. of Conf. on Programming Language Design and Implementation (PLDI), pages 199–210. ACM, June 2009.

Ted Kaminski and Eric Van Wyk. Modular well-definedness analysis for attribute grammars. In Proc. of Intl. Conf. on Software Language Engineering (SLE), volume 7745 of LNCS, pages 352–371. Springer-Verlag, September 2012.

Lijesh Krishnan and Eric Van Wyk. Termination analysis for higher-order attribute grammars. In Proceedings of the 5th International Conference on Software Language Engineering (SLE 2012), volume 7745 of LNCS, pages 44–63. Springer-Verlag, September 2012.

Lijesh Krishnan. Composable Semantics Using Higher-Order Attribute Grammars. PhD thesis, University of Minnesota, Department of Computer Science and Engineering, 2012. http://purl.umn.edu/144010

Yogesh Mali and Eric Van Wyk. Building extensible specifications and implementations of Promela with AbleP. In Proc. of Intl. SPIN Workshop on Model Checking of Software, volume 6823 of LNCS, pages 108–125. Springer-Verlag, July 2011.

Eric Van Wyk, Lijesh Krishnan, August Schwerdfeger, and Derek Bodin. Attribute grammar-based language extensions for Java. In Proc. of European Conf. on Object Oriented Prog. (ECOOP), volume 4609 of LNCS, pages 575–599. Springer-Verlag, 2007.

Jimin Gao, Mats Heimdahl, and Eric Van Wyk. Flexible and extensible notations for modeling languages. In Fundamental Approaches to Software Engineering, FASE 2007, volume 4422 of LNCS, pages 102–116. Springer-Verlag, March 2007.


    Matrix float <1> scoreTS (Matrix float <1> ts) {
      int i = 0, beginning, n = dimSize(ts, 0);
      Matrix float <1> scores
        = init(Matrix float <1>, dimSize(ts, 0));

      while (ts[i] < ts[i+1]) i = i+1;

      Matrix float [0] trough;
      while (i < n-1) {
        (trough, beginning, i)
          = getTrough(ts, i);
        scores[beginning : i]
          = computeArea(trough);
      }
      return scores;
    }

14 45

    Matrix float <1> computeArea
        (Matrix float <1> areaOfInterest) {
      float y1 = areaOfInterest[0];
      float y2 = areaOfInterest[end];
      int x1 = 0;
      int x2 = dimSize(areaOfInterest, 0) - 1;

      float m = (y1-y2) / ((float)(x1-x2));
      float b = y1 - m*x1;
      Matrix float <1> Line = (x1 : x2) * m + b;

      float area
        = with (x1 <= i < x2)
            fold(+, 0.0, Line - areaOfInterest);

      return
        with (0 <= i < dimSize(Line, 0))
          genarray([dimSize(Line, 0)], area);
    }

15 45

    (Matrix float <1>, int, int) getTrough
        (Matrix float <1> ts, int i) {
      int beginning = i;
      int n = dimSize(ts, 0);

      while (i+1 < n && ts[i] >= ts[i+1])
        i = i+1;
      while (i+1 < n && ts[i] < ts[i+1])
        i = i+1;

      return (ts[beginning : i], beginning, i);
    }

16 45

Matrix extensions

- several features from MATLAB
- with, fold, and genarray from Single Assignment C
- all translated down to expected C code
- straightforward parallel implementations of matrixMap, with, fold, and genarray
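For readers unfamiliar with Single Assignment C, the with-loop operations listed above can be read roughly as follows. This is a one-dimensional Python approximation under stated assumptions, not the C code the extension generates: the function names `with_fold` and `with_genarray`, their parameters, and the sample data are made up for illustration.

```python
import operator


def with_fold(lo, hi, op, neutral, body):
    """Combine body(i) for lo <= i < hi with `op`, starting from `neutral`."""
    acc = neutral
    for i in range(lo, hi):
        acc = op(acc, body(i))
    return acc


def with_genarray(lo, hi, size, body, default=0.0):
    """Build an array of `size` elements; index i in [lo, hi) gets body(i)."""
    result = [default] * size
    for i in range(lo, hi):
        result[i] = body(i)
    return result


# Area between a straight line and a sampled curve, as in computeArea.
line = [1.0, 2.0, 3.0, 4.0]
data = [1.0, 1.5, 2.0, 4.0]
area = with_fold(0, 3, operator.add, 0.0, lambda i: line[i] - data[i])
scores = with_genarray(0, len(line), len(line), lambda i: area)
print(area, scores)
```

Because each iteration of the body is independent of the others, both loops are natural candidates for the parallel implementations mentioned above.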

17 45

Dimension analysis

pound-seconds ≠ newton-seconds

18 45

    #include <stdio.h>

    int main (int argc, char *argv[]) {
      int meter x = 34;
      int meter y = 56;
      int meter^2 area = x * y;

      printf ("%d\n", x + y);   // OK
      printf ("%d\n", x + z);   // Error
    }

19 45

    #include <stdio.h>

    int main (int argc, char *argv[]) {
      int x = 34;
      int y = 56;
      int area = x * y;

      printf ("%d\n", x + y);   // OK
    }

Extensions of this form find errors but otherwise are "erased" during translation.
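The check behind such an extension can be sketched with units represented as maps from base unit to exponent. This is a hypothetical illustration, not the extension's implementation: the function names `mul_units` and `add_units` and the dictionary representation are assumptions. Multiplication adds exponents, while addition requires identical units and otherwise reports an error.

```python
def mul_units(a, b):
    """Unit of a product: add exponents, dropping units that cancel out."""
    out = dict(a)
    for unit, exp in b.items():
        out[unit] = out.get(unit, 0) + exp
        if out[unit] == 0:
            del out[unit]
    return out


def add_units(a, b):
    """Unit of a sum: both operands must have exactly the same unit."""
    if a != b:
        raise TypeError(f"dimension mismatch: {a} vs {b}")
    return dict(a)


meter = {"meter": 1}
area = mul_units(meter, meter)      # meter^2
print(area)
print(add_units(meter, meter))      # OK: meter + meter
try:
    add_units(meter, area)          # Error: meter + meter^2
except TypeError as err:
    print("error:", err)
```

Since the units are only consulted during checking, nothing of them needs to survive in the generated C, which matches the "erased" behaviour described above.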

21 45

Extension composition

- Programmers can select the extensions that they want.
- May want to use multiple extensions in the same program.
- Distinguish between
    1. extension user
        - has no knowledge of language design or implementations
    2. extension developer
        - must know about language design and implementation
- Tools build a custom .xc ⇒ .c translator for them.
- How can that be done?

22 45

Building translators from composable extensible languages

Two primary challenges:

1. composable syntax: enables building a scanner and parser
    - context-aware scanning [GPCE'07]
    - modular determinism analysis [PLDI'09]
    - Copper
2. composable semantics: analysis and translations
    - attribute grammars with forwarding, collections, and higher-order attributes
    - set union of specification components
        - sets of productions, non-terminals, attributes
        - sets of attribute-defining equations on a production
        - sets of equations contributing values to a single attribute
    - modular well-definedness analysis [SLE'12a]
    - modular termination analysis [SLE'12b, Krishnan-PhD]
    - Silver

23 45

Generating parsers and scanners from grammars and regular expressions

    nonterminals: Stmt, Expr
    terminals:    Id    /[a-zA-Z][a-zA-Z0-9]*/
                  Num   /[0-9]+/
                  Eq    '='
                  Semi  ';'
                  Plus  '+'
                  Mult  '*'

    Stmt ::= Stmt Semi Stmt
    Stmt ::= Id Eq Expr

    Expr ::= Expr Plus Expr
    Expr ::= Expr Mult Expr
    Expr ::= Id

24 45

    Stmt
      Stmt
        Id(x)
        Eq
        Expr
          Expr
            Id(y)
          Plus
          Expr
            Expr
              Num(3)
            Mult
            Expr
              Id(z)
      Semi
      Stmt
        Id(a)
        Eq
        Expr
          Id(b)

    Id(x) Eq Id(y) Plus Num(3) Mult Id(z) Semi Id(a) Eq Id(b)

    "x = y + 3 * z ; a = b"

25 45
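The scanning step, from program text to a token stream, can be reproduced with a small regular-expression scanner. This is a minimal sketch, not a generated scanner: the `TERMINALS` list mirrors the terminal declarations of the example grammar, and the `tokenize` function and its longest-match loop are assumptions for illustration.

```python
import re

# Terminal declarations from the example grammar, as (name, regex) pairs.
TERMINALS = [
    ("Id", re.compile(r"[a-zA-Z][a-zA-Z0-9]*")),
    ("Num", re.compile(r"[0-9]+")),
    ("Eq", re.compile(r"=")),
    ("Semi", re.compile(r";")),
    ("Plus", re.compile(r"\+")),
    ("Mult", re.compile(r"\*")),
]


def tokenize(text):
    """Split `text` into (terminal, lexeme) pairs using longest match."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best = None
        for name, regex in TERMINALS:
            m = regex.match(text, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append((best[0], best[1].group()))
        pos = best[1].end()
    return tokens


print(" ".join(f"{name}({lexeme})" if name in ("Id", "Num") else name
               for name, lexeme in tokenize("x = y + 3 * z ; a = b")))
# prints: Id(x) Eq Id(y) Plus Num(3) Mult Id(z) Semi Id(a) Eq Id(b)
```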

Attribute Grammars

- add semantics (meaning) to context-free grammars
- nodes (non-terminals) have attributes
    - that is, semantic values
- Expr may be attributed with
    - type: the type of the expression
    - errors: list of error messages
    - env: mapping variable names to their types
- Stmt may be attributed with errors and env

26 45

[Figure: parse tree for "x = y + 3 * y ; x = z", decorated with attribute values. Stmt and Expr nodes carry env = [x ↦ int, y ↦ int, z ↦ string]. The Expr nodes of the first statement have type = int and errors = [ ]. In the second statement the Expr for Id(z) has type = string, so errors = [ERROR] on that Stmt and on the root Stmt.]

27 45

Attribute grammar specifications

Equations associated with productions define attribute values.

    abstract production addition
    e::Expr ::= l::Expr '+' r::Expr
    {
      e.errors = l.errors ++ r.errors ++
                 ... check that l and r are integers ...
      e.type = int;
      l.env = e.env;  r.env = e.env;
    }
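The equations above can be read as a recursive evaluation in which env flows down the tree (inherited) and type and errors flow up (synthesized). The Python sketch below is an illustration only, not how Silver evaluates attributes: the node classes, the `decorate` function, and the error message text are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class Add:
    left: object
    right: object


def decorate(e, env):
    """Return the synthesized (type, errors) of `e` given the inherited env."""
    if isinstance(e, Num):
        return "int", []
    if isinstance(e, Var):
        if e.name not in env:
            return "error", [f"undeclared variable {e.name}"]
        return env[e.name], []
    if isinstance(e, Add):
        # l.env = e.env; r.env = e.env
        l_type, l_errors = decorate(e.left, env)
        r_type, r_errors = decorate(e.right, env)
        # e.errors = l.errors ++ r.errors ++ check that l and r are integers
        errors = l_errors + r_errors
        if l_type != "int" or r_type != "int":
            errors.append("operands of + must be integers")
        # e.type = int
        return "int", errors
    raise TypeError(f"unknown node {type(e).__name__}")


env = {"x": "int", "y": "int", "z": "string"}
print(decorate(Add(Var("y"), Num(3)), env))
print(decorate(Add(Var("x"), Var("z")), env))
```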

28 45

Page 16: talk at Virginia Bioinformatics Institute, December 5, 2013

(Matrix float lt1gt int int) getTrough

(Matrix float lt1gt ts int i)

int beginning = i

int n = dimSize(ts 0)

while(i+1 lt n ampamp ts[i] gt= ts[i+1])

i = i+1

while(i+1 lt n ampamp ts[i] lt ts[i+1])

i = i+1

return (ts[beginning i] beginning i)

16 45

Matrix extensionsI several features from MATLAB

I with fold and genarray from Single Assignment C

I all translated down to expected C code

I straightforward parallel implementations of matrixMapwith fold and genarray

17 45

Dimension analysis

pound-seconds 6= newton-seconds18 45

include stdioh

int main (int argc char argv [])

int meter x = 34

int meter y = 56

int meter^2 area = x y

printf (dn x + y) OK

printf (dn x + z) Error

19 45

include stdioh

int main (int argc char argv [])

int meter x = 34

int meter y = 56

int meter^2 area = x y

printf (dn x + y) OK

printf (dn x + z) Error

20 45

include stdioh

int main (int argc char argv [])

int x = 34

int y = 56

int area = x y

printf (dn x + y) OK

Extensions of this form find errors but otherwise are ldquoerasedrdquoduring translation

21 45

Extension composition

I Programmers can select the extensions that they want

I May want to use multiple extensions in the same program

I Distinguish between1 extension user

I has no knowledge of language design or implementations

2 extension developerI must know about language design and implementation

I Tools build a custom xc =rArr c translator for them

I How can that be done

22 45

Building translators from composable extensible

languages

Two primary challenges1 composable syntax mdash enables building a scanner parser

I context-aware scanning [GPCErsquo07]I modular determinism analysis [PLDIrsquo09]I Copper

2 composable semantics mdash analysis and translationsI attribute grammars with forwarding collections and

higher-order attributesI set union of specification components

I sets of productions non-terminals attributesI sets of attribute defining equations on a productionI sets of equations contributing values to a single attribute

I modular well-definedness analysis [SLErsquo12a]I modular termination analysis [SLErsquo12b Krishnan-PhD]I Silver

23 45

Generating parsers and scanners from grammars

and regular expressions

nonterminals Stmt Exprterminals Id [a-zA-Z][a-zA-Z0-9]

Num [0-9]+Eq rsquo=rsquoSemi rsquorsquoPlus rsquo+rsquoMult rsquorsquo

Stmt = Stmt Semi StmtStmt = Id Eq Expr

Expr = Expr Plus ExprExpr = Expr Mult ExprExpr = Id

24 45

Stmt

Stmt

Id(x) Eq Expr

Expr

Id(y)

Plus Expr

Expr

Num(3)

Mult Expr

Id(z)

Semi Stmt

Id(a) Eq Expr

Id(b)

Id(x) Eq Id(y) Plus Num(3) Mult Id(z) Semi Id(a) Eq Id(b)

ldquox = y + 3 z a = brdquo25 45

Attribute Grammars

I add semantics mdash meaning mdash to context free grammars

I nodes (non-terminals) have attributesI that is semantic values

I Expr may be attributed withI type - the type of the expressionI errors - list of error messagesI env - mapping variable names to their types

I Stmt may be attributed with errors and env

26 45

Stmt

Stmt

Id(x) Eq Expr

Expr

Id(y)

Plus Expr

Expr

Num(3)

Mult Expr

Id(y)

Semi Stmt

Id(x) Eq Expr

Id(z)

env = [x7rarrint y 7rarrint z 7rarrstring]

env = [x7rarrint y 7rarrint z 7rarrstring]

env = [x7rarrint y 7rarrint z 7rarrstring]

env = [x7rarrint y 7rarrint z 7rarrstring]

env = [x7rarrint y 7rarrint z 7rarrstring]type = int errors = [ ]

type = int errors = [ ]

type = int errors = [ ]

errors = [ ]env = [x7rarrint y 7rarrint z 7rarrstring]

env = [x7rarrint y 7rarrint z 7rarrstring]t=string

errors=[ERROR]

errors=[ERROR]

27 45

Attribute grammar specifications

Equations associated with productions define attribute values

a b s t r a c t p r o d u c t i o n a d d i t i o ne Expr = l Expr rsquo+ rsquo r Expr

e e r r o r s = l e r r o r ++ r e r r o r s ++ check t h a t l and r a r e i n t e g e r s

e t y p e = i n t

l env = e env r env = e env

28 45

Modern attribute grammars

I higher-order attributes

I reference attributes

I collection attributes

I forwarding

I module systems

I separate compilation

I etc

29 45

for-loop as an extensiona b s t r a c t p r o d u c t i o n f o rs Stmt = i Name l o w e r Expr upper Expr

body Stmt

s e r r o r s = l o w e r e r r o r ++ upper e r r o r s ++body e r r o r s ++ check t h a t i i s an i n t e g e r

f o r w a r d s to i=l o w e r w h i l e ( i lt= upper ) body i=i +1seq ( a s s i g n m e n t ( v a r R e f ( i ) l o w e r )

w h i l e (l t e ( v a r R e f ( i ) upper ) b l o c k ( seq ( body

a s s i g n m e n t ( v a r R e f ( i ) add ( v a r R e f ( i ) i n t L i t ( rdquo1rdquo ) ) ) ) ) ) )

30 45

Building an attribute grammar evaluator from composedspecifications

AGH cuplowast AGE1 AGEn

foralli isin [1 n]modComplete(AGH AGEi )

rArr rArr complete(AGH cup AGE1 AGE

n )

I Monolithic analysis - not too hard but not too useful

I Modular analysis - harder but required [SLErsquo12a]

31 45

Challenges in scanning

Keywords in embedded languages may be identifiers in hostlanguage

int SELECT

rs = using c query SELECT last name

FROM person WHERE

32 45

Challenges in scanning

Different extensions use same keyword

connection c jdbcderbyderbydbtestdb

with table person [ person id INTEGER

first name VARCHAR ]

b = table ( c1 T F

c2 F )

33 45

Challenges in scanning

Operators with different precedence specifications

x = 3 + y z

str = [a-z][a-z0-9]java

34 45

Challenges in scanning

Terminals that are prefixes of others

ListltListltIntegergtgt dlist

x = y gtgt 4

35 45

Need for context

I Traditionally parser and scanner are disjoint

Scanner rarr Parser rarr Semantic Analysis

I In context aware scanning they communicate

Scanner Parser rarr Semantic Analysis

36 45

Context aware scanning

- Scanner recognizes only tokens valid for the current "context"
- keeps embedded sub-languages, in a sense, separate
- Consider:
  - chan in, out ;
    for i in a { a[i] = i*i ; }
  - Two terminal symbols that match "in":
    - terminal IN 'in' ;
    - terminal ID /[a-zA-Z_][a-zA-Z_0-9]*/ submits to keyword ;
    - terminal FOR 'for' lexer class keyword ;
- example is part of AbleP [SPIN'11]

37 / 45
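The idea on this slide can be sketched in a few lines of Python. This is an illustration under assumptions, not Copper's implementation: the function `scan`, the `TERMINALS` table and the `KEYWORDS` set are hypothetical. The parser is imagined to pass in the set of terminals that are valid in its current state, and the scanner only tries those.

```python
import re
from typing import Optional, Set, Tuple

# Terminal name -> regular expression (the fragment from the slide).
TERMINALS = {
    "IN":  re.compile(r"in"),
    "FOR": re.compile(r"for"),
    "ID":  re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*"),
}
KEYWORDS = {"IN", "FOR"}   # ID "submits to" these when both match the same lexeme


def scan(text: str, pos: int, valid: Set[str]) -> Optional[Tuple[str, str]]:
    """Return (terminal, lexeme) for the longest match among the valid terminals."""
    best = None
    for name in sorted(valid):
        m = TERMINALS[name].match(text, pos)
        if not m:
            continue
        lexeme = m.group()
        if best is None or len(lexeme) > len(best[1]):
            best = (name, lexeme)
        elif len(lexeme) == len(best[1]) and name in KEYWORDS:
            best = (name, lexeme)       # keyword wins over ID on a tie
    return best


# After "chan" the parser expects a channel name, so only ID is valid.
as_identifier = scan("in, out;", 0, {"ID"})
# After "for i" the parser expects the keyword, so only IN is valid.
as_keyword = scan("in a { a[i] = i*i ; }", 0, {"IN"})
```

The same two characters are scanned as `ID` in the first context and as `IN` in the second, and the tie-break is only consulted when both terminals are valid at once.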

Parsing C as an extension to Promela

c_decl {
  typedef struct Coord {
    int x, y; } Coord;
}
c_state "Coord pt" "Global"   /* goes in state vector */
int z = 3;                    /* standard global decl */

active proctype example()
{
  c_code { now.pt.x = now.pt.y = 0; };
  do :: c_expr { now.pt.x == now.pt.y }
          -> c_code { now.pt.y++; }
     :: else -> break
  od;
  c_code { printf("values %d: %d, %d,%d\n",
             Pexample->_pid, now.z, now.pt.x, now.pt.y); };
  assert(false)               /* trigger an error trail */
}

38 / 45

Context aware scanning

- This scanning algorithm subordinates the disambiguation principle of maximal munch to the principle of disambiguation by context.
- It will return a shorter valid match before a longer invalid match.
- In List<List<Integer>>, before ">", ">" is in the valid lookahead but ">>" is not.
- A context aware scanner is essentially an implicitly-moded scanner.
- There is no explicit specification of valid lookahead.
  - It is generated from standard grammars and terminal regexes.

39 / 45
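A short Python sketch of "shorter valid match before longer invalid match" follows. The terminal table and the functions `longest`, `maximal_munch`, and `context_aware` are assumptions for illustration; in the real tools the valid set comes from the parser state rather than being written by hand.

```python
import re

TERMINALS = {
    "LT": r"<",
    "GT": r">",
    "RSHIFT": r">>",
    "ID": r"[A-Za-z_][A-Za-z_0-9]*",
}


def longest(text, pos, names):
    """Longest match at pos, considering only the named terminals."""
    best = None
    for name in sorted(names):
        m = re.compile(TERMINALS[name]).match(text, pos)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


def maximal_munch(text, pos):
    """Traditional scanner: every terminal competes, the longest match wins."""
    return longest(text, pos, TERMINALS.keys())


def context_aware(text, pos, valid):
    """Only terminals in the parser's valid lookahead compete."""
    return longest(text, pos, valid)


generic = "List<List<Integer>> dlist"
shift = "x = y >> 4"
p = generic.index(">")

traditional = maximal_munch(generic, p)            # picks ">>", which the parser rejects
in_type_args = context_aware(generic, p, {"GT"})   # ">>" is not valid here
in_expression = context_aware(shift, shift.index(">"), {"GT", "RSHIFT"})
```

Inside the type arguments only `GT` is valid, so the scanner returns the one-character match; in the expression both are valid and maximal munch still applies among them.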

- With a smarter scanner, LALR(1) is not so brittle.
- We can build syntactically composable language extensions.
- Context aware scanning makes composable syntax "more likely".
- But it does not give a guarantee of composability.

40 / 45

Building a parser from composed specifications

$CFG^H \cup^{*} \{CFG^{E_1}, \ldots, CFG^{E_n}\}$

$\forall i \in [1, n].\ \mathit{isComposable}(CFG^H, CFG^{E_i}) \land \mathit{conflictFree}(CFG^H \cup CFG^{E_i})$
$\Longrightarrow\ \mathit{conflictFree}(CFG^H \cup \{CFG^{E_1}, \ldots, CFG^{E_n}\})$

- Monolithic analysis - not too hard, but not too useful
- Modular analysis - harder, but required [PLDI'09]
- Non-commutative composition of restricted LALR(1) grammars

41 / 45

42 / 45

Expressiveness versus safe composition

Compare to
- other parser generators
- libraries

The modular compositionality analysis does not require context aware scanning.
But context aware scanning makes it practical.

43 / 45

Future Work

- ableC - extensible C11 specification
  - builds on lessons learned from extensible specifications of Java [ECOOP'07], Lustre [FASE'07], Modelica, Promela [SPIN'11]
- incorporate existing language extensions
- composition of language extensions at compile time
- language-specific analysis
- new applications of AGs

44 / 45

Thanks for your attention

Questions?

http://melt.cs.umn.edu

evw@cs.umn.edu

45 / 45

Eric Van Wyk and August Schwerdfeger. Context-aware scanning for parsing extensible languages. In Intl. Conf. on Generative Programming and Component Engineering (GPCE), pages 63–72. ACM, 2007.

Eric Van Wyk, Derek Bodin, Jimin Gao, and Lijesh Krishnan. Silver: an extensible attribute grammar system. Science of Computer Programming, 75(1–2):39–54, January 2010.

August Schwerdfeger and Eric Van Wyk. Verifiable composition of deterministic grammars. In Proc. of Conf. on Programming Language Design and Implementation (PLDI), pages 199–210. ACM, June 2009.

Ted Kaminski and Eric Van Wyk. Modular well-definedness analysis for attribute grammars. In Proc. of Intl. Conf. on Software Language Engineering (SLE), volume 7745 of LNCS, pages 352–371. Springer-Verlag, September 2012.

Lijesh Krishnan and Eric Van Wyk. Termination analysis for higher-order attribute grammars. In Proceedings of the 5th International Conference on Software Language Engineering (SLE 2012), volume 7745 of LNCS, pages 44–63. Springer-Verlag, September 2012.

Lijesh Krishnan. Composable Semantics Using Higher-Order Attribute Grammars. PhD thesis, University of Minnesota, Department of Computer Science and Engineering, 2012. http://purl.umn.edu/144010

Yogesh Mali and Eric Van Wyk. Building extensible specifications and implementations of Promela with AbleP. In Proc. of Intl. SPIN Workshop on Model Checking of Software, volume 6823 of LNCS, pages 108–125. Springer-Verlag, July 2011.

Eric Van Wyk, Lijesh Krishnan, August Schwerdfeger, and Derek Bodin. Attribute grammar-based language extensions for Java. In Proc. of European Conf. on Object Oriented Prog. (ECOOP), volume 4609 of LNCS, pages 575–599. Springer-Verlag, 2007.

Jimin Gao, Mats Heimdahl, and Eric Van Wyk. Flexible and extensible notations for modeling languages. In Fundamental Approaches to Software Engineering, FASE 2007, volume 4422 of LNCS, pages 102–116. Springer-Verlag, March 2007.


Matrix extensions

- several features from MATLAB
- with fold and genarray from Single Assignment C
- all translated down to expected C code
- straightforward parallel implementations of matrixMap with fold and genarray

17 / 45

Dimension analysis

pound-seconds ≠ newton-seconds

18 / 45

#include <stdio.h>

int main (int argc, char *argv[])
{
  int meter x = 34;
  int meter y = 56;
  int meter^2 area = x * y;

  printf ("%d\n", x + y);   // OK
  printf ("%d\n", x + z);   // Error
}

19 / 45


#include <stdio.h>

int main (int argc, char *argv[])
{
  int x = 34;
  int y = 56;
  int area = x * y;

  printf ("%d\n", x + y);   // OK
}

Extensions of this form find errors but otherwise are "erased" during translation.

21 / 45
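A rough Python sketch of what such an extension does is given below: check units, then erase them. It is an illustration under assumptions rather than the extension's actual implementation. Units are modelled as maps from base unit to exponent, the helper names are hypothetical, and the `second` unit stands in for whatever `z` was declared with, since the slide does not show that declaration.

```python
import re
from typing import Dict

Unit = Dict[str, int]          # base unit -> exponent, e.g. {"meter": 2}


def normalize(u: Unit) -> Unit:
    """Drop base units whose exponent is zero."""
    return {name: exp for name, exp in u.items() if exp != 0}


def mul_units(a: Unit, b: Unit) -> Unit:
    """Multiplication adds exponents: meter * meter = meter^2."""
    result = dict(a)
    for name, exp in b.items():
        result[name] = result.get(name, 0) + exp
    return normalize(result)


def add_units(a: Unit, b: Unit) -> Unit:
    """Addition requires identical units; otherwise it is a dimension error."""
    if normalize(a) != normalize(b):
        raise TypeError(f"dimension error: cannot add {a} and {b}")
    return normalize(a)


def erase(line: str, units=("meter", "second")) -> str:
    """Remove unit annotations so that plain C remains."""
    pattern = r"\b(?:%s)(?:\^\d+)?\s+" % "|".join(units)
    return re.sub(pattern, "", line)


meter = {"meter": 1}
second = {"second": 1}      # assumed unit for the undeclared z

area_unit = mul_units(meter, meter)     # int meter^2 area = x * y;
sum_unit = add_units(meter, meter)      # x + y  is OK
plain_c = erase("int meter^2 area = x * y;")
```

Once the check has passed, nothing about units needs to survive into the generated C, which is why the translation can simply drop the annotations.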

Extension composition

- Programmers can select the extensions that they want.
- May want to use multiple extensions in the same program.
- Distinguish between
  1. extension user
     - has no knowledge of language design or implementations
  2. extension developer
     - must know about language design and implementation
- Tools build a custom .xc ⟹ .c translator for them.
- How can that be done?

22 / 45

Building translators from composable extensible languages

Two primary challenges:
1. composable syntax - enables building a scanner, parser
   - context-aware scanning [GPCE'07]
   - modular determinism analysis [PLDI'09]
   - Copper
2. composable semantics - analysis and translations
   - attribute grammars with forwarding, collections, and higher-order attributes
   - set union of specification components
     - sets of productions, non-terminals, attributes
     - sets of attribute-defining equations on a production
     - sets of equations contributing values to a single attribute
   - modular well-definedness analysis [SLE'12a]
   - modular termination analysis [SLE'12b, Krishnan-PhD]
   - Silver

23 / 45

Generating parsers and scanners from grammars and regular expressions

nonterminals: Stmt, Expr
terminals:    Id    /[a-zA-Z][a-zA-Z0-9]*/
              Num   /[0-9]+/
              Eq    '='
              Semi  ';'
              Plus  '+'
              Mult  '*'

Stmt ::= Stmt Semi Stmt
Stmt ::= Id Eq Expr

Expr ::= Expr Plus Expr
Expr ::= Expr Mult Expr
Expr ::= Id

24 / 45

Stmt
  Stmt
    Id(x)
    Eq
    Expr
      Expr
        Id(y)
      Plus
      Expr
        Expr
          Num(3)
        Mult
        Expr
          Id(z)
  Semi
  Stmt
    Id(a)
    Eq
    Expr
      Id(b)

Id(x) Eq Id(y) Plus Num(3) Mult Id(z) Semi Id(a) Eq Id(b)

"x = y + 3 * z ; a = b"

25 / 45
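The path from characters to tokens to tree can be sketched by hand in Python. This is a hand-written scanner and recursive-descent parser, not the generated LALR(1) parser the talk is about, and the names `tokenize` and `parse` are assumptions. The grammar as written is ambiguous, so the sketch assumes the usual convention that Mult binds tighter than Plus and that both associate to the left; it also accepts Num as an expression, as the tree does.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]        # (terminal name, lexeme)

TOKEN_SPEC = [
    ("Id",   r"[a-zA-Z][a-zA-Z0-9]*"),
    ("Num",  r"[0-9]+"),
    ("Eq",   r"="),
    ("Semi", r";"),
    ("Plus", r"\+"),
    ("Mult", r"\*"),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC)
                    + r"|(?P<WS>\s+)")


def tokenize(text: str) -> List[Token]:
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "WS":            # whitespace is skipped
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse(tokens: List[Token]):
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def take(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind}, found {peek()}")
        tok = tokens[pos]
        pos += 1
        return tok

    def atom():
        if peek() in ("Id", "Num"):
            return take(peek())
        raise SyntaxError(f"expected Id or Num, found {peek()}")

    def term():                            # Expr Mult Expr
        node = atom()
        while peek() == "Mult":
            take("Mult")
            node = ("Mult", node, atom())
        return node

    def expr():                            # Expr Plus Expr
        node = term()
        while peek() == "Plus":
            take("Plus")
            node = ("Plus", node, term())
        return node

    def stmt():                            # Id Eq Expr
        target = take("Id")
        take("Eq")
        return ("Assign", target, expr())

    node = stmt()                          # Stmt Semi Stmt
    while peek() == "Semi":
        take("Semi")
        node = ("Seq", node, stmt())
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return node


tokens = tokenize("x = y + 3 * z ; a = b")
tree = parse(tokens)
```

The token kinds produced here match the token sequence on the slide, and the nested tuples mirror the shape of the parse tree with the single-child Expr nodes collapsed.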

Attribute Grammars

- add semantics - meaning - to context free grammars
- nodes (non-terminals) have attributes
  - that is, semantic values
- Expr may be attributed with
  - type - the type of the expression
  - errors - list of error messages
  - env - mapping variable names to their types
- Stmt may be attributed with errors and env

26 / 45

[Figure: the parse tree for "x = y + 3 * y ; x = z" decorated with attribute values. Nodes carry env = [x ↦ int, y ↦ int, z ↦ string]; the expressions of the first assignment have type = int and errors = [ ]; Id(z) in the second assignment has type string, and errors = [ERROR] is reported on two nodes.]

27 / 45




Page 19: talk at Virginia Bioinformatics Institute, December 5, 2013

include stdioh

int main (int argc char argv [])

int meter x = 34

int meter y = 56

int meter^2 area = x y

printf (dn x + y) OK

printf (dn x + z) Error

19 45

include stdioh

int main (int argc char argv [])

int meter x = 34

int meter y = 56

int meter^2 area = x y

printf (dn x + y) OK

printf (dn x + z) Error

20 45

include stdioh

int main (int argc char argv [])

int x = 34

int y = 56

int area = x y

printf (dn x + y) OK

Extensions of this form find errors but otherwise are ldquoerasedrdquoduring translation

21 45

Extension composition

I Programmers can select the extensions that they want

I May want to use multiple extensions in the same program

I Distinguish between1 extension user

I has no knowledge of language design or implementations

2 extension developerI must know about language design and implementation

I Tools build a custom xc =rArr c translator for them

I How can that be done

22 45

Building translators from composable extensible

languages

Two primary challenges1 composable syntax mdash enables building a scanner parser

I context-aware scanning [GPCErsquo07]I modular determinism analysis [PLDIrsquo09]I Copper

2 composable semantics mdash analysis and translationsI attribute grammars with forwarding collections and

higher-order attributesI set union of specification components

I sets of productions non-terminals attributesI sets of attribute defining equations on a productionI sets of equations contributing values to a single attribute

I modular well-definedness analysis [SLErsquo12a]I modular termination analysis [SLErsquo12b Krishnan-PhD]I Silver

23 45

Generating parsers and scanners from grammars

and regular expressions

nonterminals Stmt Exprterminals Id [a-zA-Z][a-zA-Z0-9]

Num [0-9]+Eq rsquo=rsquoSemi rsquorsquoPlus rsquo+rsquoMult rsquorsquo

Stmt = Stmt Semi StmtStmt = Id Eq Expr

Expr = Expr Plus ExprExpr = Expr Mult ExprExpr = Id

24 45

Stmt

Stmt

Id(x) Eq Expr

Expr

Id(y)

Plus Expr

Expr

Num(3)

Mult Expr

Id(z)

Semi Stmt

Id(a) Eq Expr

Id(b)

Id(x) Eq Id(y) Plus Num(3) Mult Id(z) Semi Id(a) Eq Id(b)

ldquox = y + 3 z a = brdquo25 45

Attribute Grammars

I add semantics mdash meaning mdash to context free grammars

I nodes (non-terminals) have attributesI that is semantic values

I Expr may be attributed withI type - the type of the expressionI errors - list of error messagesI env - mapping variable names to their types

I Stmt may be attributed with errors and env

26 45

Stmt

Stmt

Id(x) Eq Expr

Expr

Id(y)

Plus Expr

Expr

Num(3)

Mult Expr

Id(y)

Semi Stmt

Id(x) Eq Expr

Id(z)

env = [x7rarrint y 7rarrint z 7rarrstring]

env = [x7rarrint y 7rarrint z 7rarrstring]

env = [x7rarrint y 7rarrint z 7rarrstring]

env = [x7rarrint y 7rarrint z 7rarrstring]

env = [x7rarrint y 7rarrint z 7rarrstring]type = int errors = [ ]

type = int errors = [ ]

type = int errors = [ ]

errors = [ ]env = [x7rarrint y 7rarrint z 7rarrstring]

env = [x7rarrint y 7rarrint z 7rarrstring]t=string

errors=[ERROR]

errors=[ERROR]

27 45

Attribute grammar specifications

Equations associated with productions define attribute values

a b s t r a c t p r o d u c t i o n a d d i t i o ne Expr = l Expr rsquo+ rsquo r Expr

e e r r o r s = l e r r o r ++ r e r r o r s ++ check t h a t l and r a r e i n t e g e r s

e t y p e = i n t

l env = e env r env = e env

28 45

Modern attribute grammars

I higher-order attributes

I reference attributes

I collection attributes

I forwarding

I module systems

I separate compilation

I etc

29 45

for-loop as an extensiona b s t r a c t p r o d u c t i o n f o rs Stmt = i Name l o w e r Expr upper Expr

body Stmt

s e r r o r s = l o w e r e r r o r ++ upper e r r o r s ++body e r r o r s ++ check t h a t i i s an i n t e g e r

f o r w a r d s to i=l o w e r w h i l e ( i lt= upper ) body i=i +1seq ( a s s i g n m e n t ( v a r R e f ( i ) l o w e r )

w h i l e (l t e ( v a r R e f ( i ) upper ) b l o c k ( seq ( body

a s s i g n m e n t ( v a r R e f ( i ) add ( v a r R e f ( i ) i n t L i t ( rdquo1rdquo ) ) ) ) ) ) )

30 45

Building an attribute grammar evaluator from composedspecifications

AGH cuplowast AGE1 AGEn

foralli isin [1 n]modComplete(AGH AGEi )

rArr rArr complete(AGH cup AGE1 AGE

n )

I Monolithic analysis - not too hard but not too useful

I Modular analysis - harder but required [SLErsquo12a]

31 45

Challenges in scanning

Keywords in embedded languages may be identifiers in hostlanguage

int SELECT

rs = using c query SELECT last name

FROM person WHERE

32 45

Challenges in scanning

Different extensions use same keyword

connection c jdbcderbyderbydbtestdb

with table person [ person id INTEGER

first name VARCHAR ]

b = table ( c1 T F

c2 F )

33 45

Challenges in scanning

Operators with different precedence specifications

x = 3 + y z

str = [a-z][a-z0-9]java

34 45

Challenges in scanning

Terminals that are prefixes of others

ListltListltIntegergtgt dlist

x = y gtgt 4

35 45

Need for context

I Traditionally parser and scanner are disjoint

Scanner rarr Parser rarr Semantic Analysis

I In context aware scanning they communicate

Scanner Parser rarr Semantic Analysis

36 45

Context aware scanning

I Scanner recognizes only tokens valid for current ldquocontextrdquo

I keeps embedded sub-languages in a sense separate

I ConsiderI chan in out

for i in a a[i] = ii

I Two terminal symbols that match ldquoinrdquoI terminal IN rsquoinrsquo I terminal ID [a-zA-Z ][a-zA-Z 0-9]

submits to keyword

I terminal FOR rsquoforrsquo lexer class keyword

I example is part of AbleP [SPINrsquo11]

37 45

Parsing C as an extension to Promelac_decl

typedef struct Coord

int x y Coord

c_state Coord pt Global goes in state vector

int z = 3 standard global decl

active proctype example()

c_code nowptx = nowpty = 0

do c_expr nowptx == nowpty

-gt c_code nowpty++

else -gt break

od

c_code printf(values d d ddn

Pexample-gt_pid nowz nowptx nowpty)

assert(false) trigger an error trail

38 45

Context aware scanning

I This scanning algorithm subordinates thedisambiguation principle of maximal munchto the principle ofdisambiguation by context

I It will return a shorter valid match before a longer invalidmatch

I In ListltListltIntegergtgt before ldquogtrdquoldquogtrdquo in valid lookahead but ldquogtgtrdquo is not

I A context aware scanner is essentially an implicitly-modedscanner

I There is no explicit specification of valid look aheadI It is generated from standard grammars and terminal

regexs

39 45

I With a smarter scanner LALR(1) is not so brittle

I We can build syntactically composable languageextensions

I Context aware scanning makes composable syntax ldquomorelikelyrdquo

I But it does not give a guarantee of composability

40 45

Building a parser from composed specifications

CFGH cuplowast CFGE1 CFGEn

foralli isin [1 n]isComposable(CFGH CFGEi )andconflictFree(CFGH cup CFGEi )

rArr rArr conflictFree(CFGH cup CFGE1 CFGEn)

I Monolithic analysis - not too hard but not too useful

I Modular analysis - harder but required [PLDIrsquo09]

I Non-commutative composition of restricted LALR(1)grammars

41 45

42 45

Expressiveness versus safe composition

Compare to

I other parser generators

I libraries

The modular compositionality analysis does not requirecontext aware scanningBut context aware scanning makes it practical

43 45

Future Work

- ableC - extensible C11 specification
  - builds on lessons learned from extensible specifications of Java [ECOOP'07], Lustre [FASE'07], Modelica, Promela [SPIN'11]
- incorporate existing language extensions
- composition of language extensions at compile time
- language-specific analysis
- new applications of AGs

44 / 45

Thanks for your attention

Questions?

http://melt.cs.umn.edu

evw@cs.umn.edu

45 / 45

Eric Van Wyk and August Schwerdfeger. Context-aware scanning for parsing extensible languages. In Intl. Conf. on Generative Programming and Component Engineering (GPCE), pages 63–72. ACM, 2007.

Eric Van Wyk, Derek Bodin, Jimin Gao, and Lijesh Krishnan. Silver: an extensible attribute grammar system. Science of Computer Programming, 75(1–2):39–54, January 2010.

August Schwerdfeger and Eric Van Wyk. Verifiable composition of deterministic grammars. In Proc. of Conf. on Programming Language Design and Implementation (PLDI), pages 199–210. ACM, June 2009.

Ted Kaminski and Eric Van Wyk. Modular well-definedness analysis for attribute grammars. In Proc. of Intl. Conf. on Software Language Engineering (SLE), volume 7745 of LNCS, pages 352–371. Springer-Verlag, September 2012.

Lijesh Krishnan and Eric Van Wyk. Termination analysis for higher-order attribute grammars. In Proceedings of the 5th International Conference on Software Language Engineering (SLE 2012), volume 7745 of LNCS, pages 44–63. Springer-Verlag, September 2012.

Lijesh Krishnan. Composable Semantics Using Higher-Order Attribute Grammars. PhD thesis, University of Minnesota, Department of Computer Science and Engineering, 2012. http://purl.umn.edu/144010

Yogesh Mali and Eric Van Wyk. Building extensible specifications and implementations of Promela with AbleP. In Proc. of Intl. SPIN Workshop on Model Checking of Software, volume 6823 of LNCS, pages 108–125. Springer-Verlag, July 2011.

Eric Van Wyk, Lijesh Krishnan, August Schwerdfeger, and Derek Bodin. Attribute grammar-based language extensions for Java. In Proc. of European Conf. on Object Oriented Prog. (ECOOP), volume 4609 of LNCS, pages 575–599. Springer-Verlag, 2007.

Jimin Gao, Mats Heimdahl, and Eric Van Wyk. Flexible and extensible notations for modeling languages. In Fundamental Approaches to Software Engineering, FASE 2007, volume 4422 of LNCS, pages 102–116. Springer-Verlag, March 2007.

Page 20: talk at Virginia Bioinformatics Institute, December 5, 2013

#include <stdio.h>

int main (int argc, char *argv[])
{
    int meter x = 34;
    int meter y = 56;
    int meter^2 area = x * y;

    printf ("%d\n", x + y);   // OK
    printf ("%d\n", x + z);   // Error
}

20 / 45

#include <stdio.h>

int main (int argc, char *argv[])
{
    int x = 34;
    int y = 56;
    int area = x * y;

    printf ("%d\n", x + y);   // OK
}

Extensions of this form find errors but otherwise are "erased" during translation.

21 / 45
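A rough Python sketch of what "check, then erase" could look like for the unit annotations. Everything here is an assumption made for illustration: the `Unit` class, the `check_add` and `check_mul` rules and the string-based `erase` function are hypothetical and far simpler than a real extension, which would work on syntax trees rather than text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Unit:
    """A type `int meter^n`, reduced to the exponent of meter (hypothetical)."""
    meter: int = 0

def check_add(left: Unit, right: Unit) -> Unit:
    # Addition requires both operands to carry the same unit.
    if left != right:
        raise TypeError(f"unit mismatch: meter^{left.meter} + meter^{right.meter}")
    return left

def check_mul(left: Unit, right: Unit) -> Unit:
    # Multiplication adds exponents: meter * meter = meter^2.
    return Unit(left.meter + right.meter)

def erase(declaration: str) -> str:
    """Drop the unit annotation, leaving plain C: 'int meter x' -> 'int x'.
    Toy rule: any word starting with 'meter' is treated as an annotation."""
    words = [w for w in declaration.split() if not w.startswith("meter")]
    return " ".join(words)

x = Unit(1)
y = Unit(1)
area = check_mul(x, y)    # meter^2
total = check_add(x, y)   # same unit on both sides, accepted
plain = erase("int meter^2 area = x * y;")
```

The checks run only during analysis. The translated program carries no trace of the units, so the extension adds no run-time cost.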

Extension composition

- Programmers can select the extensions that they want.
- May want to use multiple extensions in the same program.
- Distinguish between
  1. extension user
     - has no knowledge of language design or implementations
  2. extension developer
     - must know about language design and implementation
- Tools build a custom .xc ⇒ .c translator for them.
- How can that be done?

22 / 45

Building translators from composable extensible languages

Two primary challenges:
1. composable syntax: enables building a scanner and parser
   - context-aware scanning [GPCE'07]
   - modular determinism analysis [PLDI'09]
   - Copper
2. composable semantics: analysis and translations
   - attribute grammars with forwarding, collections, and higher-order attributes
   - set union of specification components
     - sets of productions, non-terminals, attributes
     - sets of attribute-defining equations on a production
     - sets of equations contributing values to a single attribute
   - modular well-definedness analysis [SLE'12a]
   - modular termination analysis [SLE'12b, Krishnan-PhD]
   - Silver

23 / 45

Generating parsers and scanners from grammars and regular expressions

nonterminals: Stmt, Expr
terminals:    Id    [a-zA-Z][a-zA-Z0-9]*
              Num   [0-9]+
              Eq    '='
              Semi  ';'
              Plus  '+'
              Mult  '*'

Stmt ::= Stmt Semi Stmt
Stmt ::= Id Eq Expr

Expr ::= Expr Plus Expr
Expr ::= Expr Mult Expr
Expr ::= Id

24 / 45

Parse tree:

Stmt
  Stmt
    Id(x)
    Eq
    Expr
      Expr
        Id(y)
      Plus
      Expr
        Expr
          Num(3)
        Mult
        Expr
          Id(z)
  Semi
  Stmt
    Id(a)
    Eq
    Expr
      Id(b)

Token stream: Id(x) Eq Id(y) Plus Num(3) Mult Id(z) Semi Id(a) Eq Id(b)

Source text: "x = y + 3 * z ; a = b"

25 / 45
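The step from source text to token stream can be sketched with Python's `re` module, using the terminal definitions from the grammar slide. This is a hand-written illustration rather than generated scanner code. The `tokenize` function and the `Kind(lexeme)` output format are assumptions chosen to mirror the token stream shown above.

```python
import re

# Terminals from the grammar slide, as (name, regular expression) pairs.
TOKEN_SPEC = [
    ("Id", r"[a-zA-Z][a-zA-Z0-9]*"),
    ("Num", r"[0-9]+"),
    ("Eq", r"="),
    ("Semi", r";"),
    ("Plus", r"\+"),
    ("Mult", r"\*"),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(text):
    """Split `text` into terminals, skipping whitespace between them."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind = m.lastgroup
        # Only Id and Num carry a lexeme worth keeping.
        tokens.append(f"{kind}({m.group()})" if kind in ("Id", "Num") else kind)
        pos = m.end()
    return tokens

stream = tokenize("x = y + 3 * z ; a = b")
```

A parser generated from the productions would then consume `stream` to build the tree.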

Attribute Grammars

- add semantics (meaning) to context-free grammars
- nodes (non-terminals) have attributes
  - that is, semantic values
- Expr may be attributed with
  - type - the type of the expression
  - errors - list of error messages
  - env - mapping variable names to their types
- Stmt may be attributed with errors and env

26 / 45

[Figure: parse tree for "x = y + 3 * y ; x = z" decorated with attribute values. Each node carries env = [x ↦ int, y ↦ int, z ↦ string]. The expression nodes of the first statement have type = int and errors = [ ], and that statement has errors = [ ]. In the second statement Id(z) has type string, and errors = [ERROR] appears on two nodes.]

27 / 45

Attribute grammar specifications

Equations associated with productions define attribute values.

abstract production addition
e::Expr ::= l::Expr '+' r::Expr
{
  e.errors = l.errors ++ r.errors ++
             ... check that l and r are integers ... ;
  e.type = int ;
  l.env = e.env ;  r.env = e.env ;
}

28 / 45
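To make the equations concrete, here is a small Python sketch of evaluating them by recursion over a tree. It is not how Silver evaluates attributes. The node classes, the `attributes` function and the error strings are all hypothetical. The inherited attribute `env` is passed down as an argument, and the synthesized attributes `type` and `errors` are returned.

```python
from dataclasses import dataclass

@dataclass
class Id:
    name: str

@dataclass
class Num:
    value: int

@dataclass
class Add:
    left: object
    right: object

def attributes(e, env):
    """Return the synthesized attributes (type, errors) of `e`,
    given the inherited attribute `env` (variable name -> type)."""
    if isinstance(e, Num):
        return "int", []
    if isinstance(e, Id):
        if e.name in env:
            return env[e.name], []
        return "unknown", [f"undeclared variable {e.name}"]
    # Addition: l.env = e.env and r.env = e.env, so both children receive `env`.
    l_type, l_errors = attributes(e.left, env)
    r_type, r_errors = attributes(e.right, env)
    check = [] if l_type == r_type == "int" else ["operands of + must be integers"]
    # e.errors = l.errors ++ r.errors ++ check;  e.type = int
    return "int", l_errors + r_errors + check

env = {"x": "int", "y": "int", "z": "string"}
ok = attributes(Add(Id("y"), Num(3)), env)
bad = attributes(Add(Id("y"), Id("z")), env)
```

With the environment from the decorated tree, `y + 3` yields no errors, while `y + z` is reported because `z` is a string.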

Modern attribute grammars

- higher-order attributes
- reference attributes
- collection attributes
- forwarding
- module systems
- separate compilation
- etc.

29 / 45

for-loop as an extension

abstract production for
s::Stmt ::= i::Name lower::Expr upper::Expr body::Stmt
{
  s.errors = lower.errors ++ upper.errors ++ body.errors ++
             ... check that i is an integer ... ;

  forwards to
    -- i = lower ; while (i <= upper) { body ; i = i + 1 ; }
    seq(assignment(varRef(i), lower),
        while(lte(varRef(i), upper),
              block(seq(body,
                        assignment(varRef(i),
                                   add(varRef(i), intLit("1")))))));
}

30 / 45
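The effect of forwarding can be imitated in Python: the extension supplies only the tree it forwards to, and a translation written purely for host constructs then applies to it unchanged. This is a sketch under stated assumptions. The tagged-tuple constructors, `for_loop` and `to_c` are hypothetical names and are not part of Silver or of any host language specification.

```python
# Host-language constructors, modelled as tagged tuples (hypothetical).
def var_ref(name): return ("varRef", name)
def int_lit(text): return ("intLit", text)
def add(l, r): return ("add", l, r)
def lte(l, r): return ("lte", l, r)
def assignment(target, value): return ("assignment", target, value)
def seq(first, second): return ("seq", first, second)
def block(stmt): return ("block", stmt)
def while_loop(cond, body): return ("while", cond, body)

def for_loop(i, lower, upper, body):
    """The host tree that the `for` production forwards to."""
    return seq(assignment(var_ref(i), lower),
               while_loop(lte(var_ref(i), upper),
                          block(seq(body,
                                    assignment(var_ref(i),
                                               add(var_ref(i), int_lit("1")))))))

def to_c(t):
    """Translation defined for host constructs only; there is no case for `for`."""
    tag = t[0]
    if tag in ("varRef", "intLit"):
        return t[1]
    if tag == "add":
        return f"{to_c(t[1])} + {to_c(t[2])}"
    if tag == "lte":
        return f"{to_c(t[1])} <= {to_c(t[2])}"
    if tag == "assignment":
        return f"{to_c(t[1])} = {to_c(t[2])};"
    if tag == "seq":
        return f"{to_c(t[1])} {to_c(t[2])}"
    if tag == "block":
        return "{ " + to_c(t[1]) + " }"
    if tag == "while":
        return f"while ({to_c(t[1])}) {to_c(t[2])}"
    raise ValueError(f"unknown construct {tag}")

body = assignment(var_ref("s"), add(var_ref("s"), var_ref("i")))
c_text = to_c(for_loop("i", int_lit("0"), int_lit("9"), body))
```

The extension still defines `errors` itself, so problems are reported in terms of the for-loop the programmer wrote. Any attribute it does not define is taken from the forwarded-to tree.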

Building an attribute grammar evaluator from composed specifications

\[
AG^{H} \cup^{*} \{ AG^{E_1}, \ldots, AG^{E_n} \}
\]

\[
\bigl( \forall i \in [1, n].\ \mathit{modComplete}(AG^{H}, AG^{E_i}) \bigr)
\Rightarrow \mathit{complete}(AG^{H} \cup \{ AG^{E_1}, \ldots, AG^{E_n} \})
\]

- Monolithic analysis - not too hard, but not too useful
- Modular analysis - harder, but required [SLE'12a]

31 / 45

Challenges in scanning

Keywords in embedded languages may be identifiers in the host language.

    int SELECT ;
    ...
    rs = using c query { SELECT last_name
                         FROM person WHERE ... } ;

32 / 45

Challenges in scanning

Different extensions use the same keyword.

    connection c jdbcderbyderbydbtestdb
      with table person [ person_id INTEGER
                          first_name VARCHAR ]

    b = table ( c1 T F
                c2 F )

33 / 45

Challenges in scanning

Operators with different precedence specifications

    x = 3 + y * z ;

    str = /[a-z][a-z0-9]*\.java/

34 / 45

Challenges in scanning

Terminals that are prefixes of others

    List<List<Integer>> dlist ;

    x = y >> 4 ;

35 / 45

Need for context

- Traditionally, parser and scanner are disjoint.

    Scanner → Parser → Semantic Analysis

- In context-aware scanning, they communicate.

    Scanner ⇄ Parser → Semantic Analysis

36 / 45

Context aware scanning

- Scanner recognizes only tokens valid for the current "context"
- keeps embedded sub-languages, in a sense, separate
- Consider:
  - chan in, out;
    for i in a { a[i] = i*i ; }
- Two terminal symbols that match "in":
  - terminal IN 'in' ;
  - terminal ID /[a-zA-Z_][a-zA-Z_0-9]*/ submits to { keyword } ;
  - terminal FOR 'for' lexer class { keyword } ;
- example is part of AbleP [SPIN'11]

37 / 45

Page 23: talk at Virginia Bioinformatics Institute, December 5, 2013

Building translators from composable extensible

languages

Two primary challenges1 composable syntax mdash enables building a scanner parser

I context-aware scanning [GPCErsquo07]I modular determinism analysis [PLDIrsquo09]I Copper

2 composable semantics mdash analysis and translationsI attribute grammars with forwarding collections and

higher-order attributesI set union of specification components

I sets of productions non-terminals attributesI sets of attribute defining equations on a productionI sets of equations contributing values to a single attribute

I modular well-definedness analysis [SLErsquo12a]I modular termination analysis [SLErsquo12b Krishnan-PhD]I Silver

23 45

Generating parsers and scanners from grammars

and regular expressions

nonterminals Stmt Exprterminals Id [a-zA-Z][a-zA-Z0-9]

Num [0-9]+Eq rsquo=rsquoSemi rsquorsquoPlus rsquo+rsquoMult rsquorsquo

Stmt = Stmt Semi StmtStmt = Id Eq Expr

Expr = Expr Plus ExprExpr = Expr Mult ExprExpr = Id

24 45

Stmt

Stmt

Id(x) Eq Expr

Expr

Id(y)

Plus Expr

Expr

Num(3)

Mult Expr

Id(z)

Semi Stmt

Id(a) Eq Expr

Id(b)

Id(x) Eq Id(y) Plus Num(3) Mult Id(z) Semi Id(a) Eq Id(b)

ldquox = y + 3 z a = brdquo25 45

Attribute Grammars

I add semantics mdash meaning mdash to context free grammars

I nodes (non-terminals) have attributesI that is semantic values

I Expr may be attributed withI type - the type of the expressionI errors - list of error messagesI env - mapping variable names to their types

I Stmt may be attributed with errors and env

26 45

[Figure: parse tree for "x = y + 3 * y; x = z", decorated with attribute values]

  Stmt
    Stmt
      Id(x)
      Eq
      Expr
        Expr
          Id(y)
        Plus
        Expr
          Expr
            Num(3)
          Mult
          Expr
            Id(y)
    Semi
    Stmt
      Id(x)
      Eq
      Expr
        Id(z)

Attribute values shown on the tree:

- env = [x ↦ int, y ↦ int, z ↦ string] on the Stmt and Expr nodes
- type = int, errors = [ ] on the expressions of the first statement
- errors = [ ] on the first statement
- type = string on the expression Id(z)
- errors = [ERROR] on the second statement and on the root Stmt

Attribute grammar specifications

Equations associated with productions define attribute values.

  abstract production addition
  e::Expr ::= l::Expr '+' r::Expr
  {
    e.errors = l.errors ++ r.errors ++
               ... check that l and r are integers ... ;
    e.type = int ;
    l.env = e.env ;  r.env = e.env ;
  }
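The equations can be read as a recursive evaluation in which env flows down the tree and type and errors flow up. The Python sketch below is one way to mimic that, assuming expressions and statements are nested tuples such as `("Plus", left, right)` and `("Assign", name, expr)`. These tuple shapes, the function names, and the rule that an assignment must match the declared type of its variable are all assumptions made for illustration. Silver evaluates attributes on demand rather than through hand-written recursion.

```python
def expr_attrs(e, env):
    """Return (type, errors) for expression e; env is inherited from the parent."""
    kind = e[0]
    if kind == "Num":
        return "int", []
    if kind == "Id":
        name = e[1]
        if name not in env:
            return "error", [f"undeclared variable {name}"]
        return env[name], []
    # Binary operators (Plus, Mult): l.env = e.env and r.env = e.env.
    _, left, right = e
    l_type, l_errors = expr_attrs(left, env)
    r_type, r_errors = expr_attrs(right, env)
    errors = l_errors + r_errors
    if l_type != "int" or r_type != "int":
        errors = errors + [f"operands of {kind} must be integers"]
    return "int", errors


def stmt_errors(s, env):
    """Return the errors attribute of statement s."""
    if s[0] == "Seq":
        return stmt_errors(s[1], env) + stmt_errors(s[2], env)
    _, name, rhs = s  # ("Assign", name, expr)
    rhs_type, errors = expr_attrs(rhs, env)
    if env.get(name) != rhs_type:  # assumed rule: types must match exactly
        errors = errors + [f"cannot assign {rhs_type} to {name}"]
    return errors


if __name__ == "__main__":
    env = {"x": "int", "y": "int", "z": "string"}
    first = ("Assign", "x", ("Plus", ("Id", "y"), ("Mult", ("Num", "3"), ("Id", "y"))))
    second = ("Assign", "x", ("Id", "z"))
    print(stmt_errors(("Seq", first, second), env))
```

With the environment from the decorated tree, the first statement produces no errors and the second produces one, which the sequence node passes up to the root.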

Modern attribute grammars

- higher-order attributes
- reference attributes
- collection attributes
- forwarding
- module systems
- separate compilation
- etc.

for-loop as an extension

  abstract production for
  s::Stmt ::= i::Name lower::Expr upper::Expr body::Stmt
  {
    s.errors = lower.errors ++ upper.errors ++ body.errors ++
               ... check that i is an integer ... ;

    forwards to
      -- i = lower ; while (i <= upper) { body ; i = i + 1 ; }
      seq(assignment(varRef(i), lower),
          while(lte(varRef(i), upper),
                block(seq(body,
                          assignment(varRef(i), add(varRef(i), intLit("1")))))));
  }
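Forwarding can be pictured as a fallback: when an extension production has no equation for an attribute, the attribute is computed on the host-language tree it forwards to. The Python sketch below imitates this with a hypothetical `Node.attr` lookup and a made-up `code` attribute that pretty-prints host statements. Expressions are plain strings to keep it short. None of these names come from Silver.

```python
class Node:
    """Base class: attributes a node does not define come from its forward."""

    def forwards_to(self):
        return None

    def attr(self, name):
        method = getattr(self, "attr_" + name, None)
        if method is not None:
            return method()
        target = self.forwards_to()
        if target is None:
            raise AttributeError(f"{type(self).__name__} has no equation for {name}")
        return target.attr(name)


class Assign(Node):
    def __init__(self, var, expr):
        self.var, self.expr = var, expr

    def attr_code(self):
        return f"{self.var} = {self.expr};"


class Seq(Node):
    def __init__(self, first, second):
        self.first, self.second = first, second

    def attr_code(self):
        return self.first.attr("code") + " " + self.second.attr("code")


class While(Node):
    def __init__(self, cond, body):
        self.cond, self.body = cond, body

    def attr_code(self):
        return f"while ({self.cond}) {{ {self.body.attr('code')} }}"


class For(Node):
    """Extension production: it has no 'code' equation of its own."""

    def __init__(self, var, lower, upper, body):
        self.var, self.lower, self.upper, self.body = var, lower, upper, body

    def forwards_to(self):
        # i = lower; while (i <= upper) { body; i = i + 1; }
        step = Assign(self.var, f"{self.var} + 1")
        loop = While(f"{self.var} <= {self.upper}", Seq(self.body, step))
        return Seq(Assign(self.var, self.lower), loop)


if __name__ == "__main__":
    loop = For("i", "1", "n", Assign("s", "s + i"))
    print(loop.attr("code"))  # i = 1; while (i <= n) { s = s + i; i = i + 1; }
```

The extension can still define attributes it cares about, such as its own error messages, by adding an `attr_errors` method. Anything it leaves out is answered by the forwarded tree.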

Building an attribute grammar evaluator from composed specifications

\[ AG^{H} \cup^{*} \{AG^{E_1}, \ldots, AG^{E_n}\} \]

\[ \forall i \in [1, n].\; \mathit{modComplete}(AG^{H}, AG^{E_i}) \;\Longrightarrow\; \mathit{complete}(AG^{H} \cup \{AG^{E_1}, \ldots, AG^{E_n}\}) \]

- Monolithic analysis: not too hard, but not too useful
- Modular analysis: harder, but required [SLE'12a]
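The monolithic version of this check is easy to sketch: take the set union of the host and extension specifications, then look for productions that lack an equation for an attribute of their left-hand-side nonterminal. The Python below is a heavily simplified illustration that considers only synthesized attributes. The dictionary layout and the names `compose` and `missing_equations` are hypothetical. It is not the modular analysis of [SLE'12a], which checks each extension against the host alone.

```python
def compose(*specs):
    """Set union of specification components (productions, attributes, equations)."""
    result = {"productions": {}, "occurs": {}, "equations": set(), "forwarding": set()}
    for spec in specs:
        result["productions"].update(spec.get("productions", {}))
        for nt, attrs in spec.get("occurs", {}).items():
            result["occurs"].setdefault(nt, set()).update(attrs)
        result["equations"] |= spec.get("equations", set())
        result["forwarding"] |= spec.get("forwarding", set())
    return result


def missing_equations(spec):
    """List (production, attribute) pairs with no equation and no forward to fall back on."""
    missing = []
    for prod, lhs in sorted(spec["productions"].items()):
        if prod in spec["forwarding"]:
            continue  # forwarding supplies any attribute the production omits
        for attr in sorted(spec["occurs"].get(lhs, set())):
            if (prod, attr) not in spec["equations"]:
                missing.append((prod, attr))
    return missing


# Host language: two productions and the attributes that occur on their nonterminals.
HOST = {
    "productions": {"add": "Expr", "assign": "Stmt"},
    "occurs": {"Expr": {"type", "errors"}, "Stmt": {"errors"}},
    "equations": {("add", "type"), ("add", "errors"), ("assign", "errors")},
}
# Extension 1 adds a production that forwards to host constructs.
EXT_FOR = {"productions": {"for": "Stmt"}, "forwarding": {"for"},
           "equations": {("for", "errors")}}
# Extension 2 adds a new attribute on Stmt, with equations for host productions only.
EXT_PP = {"occurs": {"Stmt": {"pp"}}, "equations": {("assign", "pp")}}
# Extension 3 adds a non-forwarding production and forgets the errors equation.
EXT_MUL = {"productions": {"mul": "Expr"}, "equations": {("mul", "type")}}

if __name__ == "__main__":
    print(missing_equations(compose(HOST, EXT_FOR, EXT_PP)))
    print(missing_equations(compose(HOST, EXT_MUL)))
```

The first composition is complete even though the two extensions know nothing of each other, because the forwarding production picks up `pp` from its host translation. The second reports the missing equation.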

Challenges in scanning

Keywords in embedded languages may be identifiers in the host language.

  int SELECT

  rs = using c query SELECT last_name
         FROM person WHERE ...

Challenges in scanning

Different extensions use the same keyword.

  connection c jdbcderbyderbydbtestdb
    with table person [ person_id INTEGER
                        first_name VARCHAR ]

  b = table ( c1 T F
              c2 F )

Challenges in scanning

Operators with different precedence specifications.

  x = 3 + y * z

  str = /[a-z][a-z0-9]*\.java/

Challenges in scanning

Terminals that are prefixes of others.

  List<List<Integer>> dlist

  x = y >> 4

Need for context

- Traditionally, parser and scanner are disjoint:

    Scanner → Parser → Semantic Analysis

- In context-aware scanning, they communicate:

    Scanner ⇄ Parser → Semantic Analysis

Context-aware scanning

- Scanner recognizes only tokens valid for the current "context"
- keeps embedded sub-languages, in a sense, separate
- Consider:

    chan in, out
    for i in a { a[i] = i*i }

- Two terminal symbols that match "in":
  - terminal IN 'in'
  - terminal ID /[a-zA-Z_][a-zA-Z_0-9]*/ submits to keyword
- terminal FOR 'for' lexer class keyword
- example is part of AbleP [SPIN'11]
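The idea can be sketched in a few lines of Python: the scanner is handed the set of terminals the parser can accept in its current state and considers only those. The function name `scan_one`, the `KEYWORDS` set standing in for the keyword lexer class, and the hand-written valid sets are assumptions for illustration. In Copper the valid sets come from the LR parse table, not from the caller.

```python
import re

# Terminals from the slide. ID uses the identifier regex with underscores.
TERMINALS = {
    "IN": r"in",
    "FOR": r"for",
    "ID": r"[a-zA-Z_][a-zA-Z_0-9]*",
}
KEYWORDS = {"IN", "FOR"}  # stands in for the keyword lexer class that ID submits to


def scan_one(text, pos, valid):
    """Return (terminal, lexeme) for the longest match among the valid terminals."""
    matches = []
    for name in valid:
        m = re.compile(TERMINALS[name]).match(text, pos)
        if m:
            matches.append((name, m.group()))
    if not matches:
        raise SyntaxError(f"no valid terminal at position {pos}")
    longest = max(len(lexeme) for _, lexeme in matches)
    candidates = [t for t in matches if len(t[1]) == longest]
    keywords = [t for t in candidates if t[0] in KEYWORDS]
    if keywords:
        candidates = keywords  # ID submits to keyword terminals on a tie
    if len(candidates) != 1:
        raise SyntaxError(f"lexical ambiguity at position {pos}")
    return candidates[0]


if __name__ == "__main__":
    # After 'chan' the parser expects a name, so only ID is valid.
    print(scan_one("chan in, out", 5, {"ID"}))
    # After 'for i' the extension expects the keyword, so only IN is valid.
    print(scan_one("for i in a", 6, {"IN"}))
```

The same two characters come back as ID in the first call and as IN in the second. The "submits to" rule only matters in a state where both terminals are valid.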

Parsing C as an extension to Promela

  c_decl {
    typedef struct Coord {
      int x, y; } Coord;
  }
  c_state "Coord pt" "Global"  /* goes in state vector */
  int z = 3;                   /* standard global decl */

  active proctype example()
  {
    c_code { now.pt.x = now.pt.y = 0; };
    do :: c_expr { now.pt.x == now.pt.y }
            -> c_code { now.pt.y++; }
       :: else -> break
    od;
    c_code { printf("values %d: %d, %d,%d\n",
               Pexample->_pid, now.z, now.pt.x, now.pt.y); };
    assert(false)              /* trigger an error trail */
  }

Context-aware scanning

- This scanning algorithm subordinates the disambiguation principle of maximal munch to the principle of disambiguation by context.
  - It will return a shorter valid match before a longer invalid match.
  - In List<List<Integer>>, before the first ">", ">" is in the valid lookahead but ">>" is not.
- A context-aware scanner is essentially an implicitly-moded scanner.
  - There is no explicit specification of valid lookahead.
  - It is generated from standard grammars and terminal regexes.
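A minimal Python sketch of the "shorter valid match before a longer invalid match" rule, assuming hypothetical terminal names `GT`, `RSHIFT`, `LT`, and `ID`, and hand-written valid-lookahead sets. In a generated scanner those sets would come from the parser state.

```python
import re

TERMINALS = {
    "GT": r">",
    "RSHIFT": r">>",
    "LT": r"<",
    "ID": r"[a-zA-Z_][a-zA-Z_0-9]*",
}


def longest_valid(text, pos, valid):
    """Maximal munch, restricted to the terminals in the valid lookahead set."""
    best = None
    for name in sorted(valid):
        m = re.compile(TERMINALS[name]).match(text, pos)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


if __name__ == "__main__":
    decl = "List<List<Integer>> dlist"
    pos = decl.index(">")
    # Inside type arguments, '>' may close the list but '>>' cannot appear.
    print(longest_valid(decl, pos, {"GT", "LT", "ID"}))
    # Plain maximal munch over all terminals picks the longer '>>'.
    print(longest_valid(decl, pos, set(TERMINALS)))
    # In an expression, the shift operator is valid and is chosen as usual.
    expr = "x = y >> 4"
    print(longest_valid(expr, expr.index(">"), {"RSHIFT", "GT"}))
```

The first call returns the single ">" even though ">>" is a longer match, because ">>" is not in the valid set at that point.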

- With a smarter scanner, LALR(1) is not so brittle.
- We can build syntactically composable language extensions.
- Context-aware scanning makes composable syntax "more likely".
- But it does not give a guarantee of composability.

Building a parser from composed specifications

\[ CFG^{H} \cup^{*} \{CFG^{E_1}, \ldots, CFG^{E_n}\} \]

\[ \forall i \in [1, n].\; \mathit{isComposable}(CFG^{H}, CFG^{E_i}) \land \mathit{conflictFree}(CFG^{H} \cup CFG^{E_i}) \;\Longrightarrow\; \mathit{conflictFree}(CFG^{H} \cup \{CFG^{E_1}, \ldots, CFG^{E_n}\}) \]

- Monolithic analysis: not too hard, but not too useful
- Modular analysis: harder, but required [PLDI'09]
- Non-commutative composition of restricted LALR(1) grammars

Expressiveness versus safe composition

Compare to

- other parser generators
- libraries

The modular compositionality analysis does not require context-aware scanning. But context-aware scanning makes it practical.

Future Work

- ableC: extensible C11 specification
  - builds on lessons learned from extensible specifications of Java [ECOOP'07], Lustre [FASE'07], Modelica, Promela [SPIN'11]
  - incorporate existing language extensions
  - composition of language extensions is compile-time
- language-specific analysis
- new applications of AGs

Thanks for your attention.

Questions?

http://melt.cs.umn.edu

evw@cs.umn.edu

Eric Van Wyk and August Schwerdfeger. Context-aware scanning for parsing extensible languages. In Intl. Conf. on Generative Programming and Component Engineering (GPCE), pages 63–72. ACM, 2007.

Eric Van Wyk, Derek Bodin, Jimin Gao, and Lijesh Krishnan. Silver: an extensible attribute grammar system. Science of Computer Programming, 75(1–2):39–54, January 2010.

August Schwerdfeger and Eric Van Wyk. Verifiable composition of deterministic grammars. In Proc. of Conf. on Programming Language Design and Implementation (PLDI), pages 199–210. ACM, June 2009.

Ted Kaminski and Eric Van Wyk. Modular well-definedness analysis for attribute grammars. In Proc. of Intl. Conf. on Software Language Engineering (SLE), volume 7745 of LNCS, pages 352–371. Springer-Verlag, September 2012.

Lijesh Krishnan and Eric Van Wyk. Termination analysis for higher-order attribute grammars. In Proceedings of the 5th International Conference on Software Language Engineering (SLE 2012), volume 7745 of LNCS, pages 44–63. Springer-Verlag, September 2012.

Lijesh Krishnan. Composable Semantics Using Higher-Order Attribute Grammars. PhD thesis, University of Minnesota, Department of Computer Science and Engineering, 2012. http://purl.umn.edu/144010

Yogesh Mali and Eric Van Wyk. Building extensible specifications and implementations of Promela with AbleP. In Proc. of Intl. SPIN Workshop on Model Checking of Software, volume 6823 of LNCS, pages 108–125. Springer-Verlag, July 2011.

Eric Van Wyk, Lijesh Krishnan, August Schwerdfeger, and Derek Bodin. Attribute grammar-based language extensions for Java. In Proc. of European Conf. on Object Oriented Prog. (ECOOP), volume 4609 of LNCS, pages 575–599. Springer-Verlag, 2007.

Jimin Gao, Mats Heimdahl, and Eric Van Wyk. Flexible and extensible notations for modeling languages. In Fundamental Approaches to Software Engineering, FASE 2007, volume 4422 of LNCS, pages 102–116. Springer-Verlag, March 2007.




Page 27: talk at Virginia Bioinformatics Institute, December 5, 2013

Stmt

Stmt

Id(x) Eq Expr

Expr

Id(y)

Plus Expr

Expr

Num(3)

Mult Expr

Id(y)

Semi Stmt

Id(x) Eq Expr

Id(z)

env = [x7rarrint y 7rarrint z 7rarrstring]

env = [x7rarrint y 7rarrint z 7rarrstring]

env = [x7rarrint y 7rarrint z 7rarrstring]

env = [x7rarrint y 7rarrint z 7rarrstring]

env = [x7rarrint y 7rarrint z 7rarrstring]type = int errors = [ ]

type = int errors = [ ]

type = int errors = [ ]

errors = [ ]env = [x7rarrint y 7rarrint z 7rarrstring]

env = [x7rarrint y 7rarrint z 7rarrstring]t=string

errors=[ERROR]

errors=[ERROR]

27 45

Attribute grammar specifications

Equations associated with productions define attribute values

a b s t r a c t p r o d u c t i o n a d d i t i o ne Expr = l Expr rsquo+ rsquo r Expr

e e r r o r s = l e r r o r ++ r e r r o r s ++ check t h a t l and r a r e i n t e g e r s

e t y p e = i n t

l env = e env r env = e env

28 45

Modern attribute grammars

I higher-order attributes

I reference attributes

I collection attributes

I forwarding

I module systems

I separate compilation

I etc

29 45

for-loop as an extension

    abstract production for
    s::Stmt ::= i::Name lower::Expr upper::Expr body::Stmt
    {
      s.errors = lower.errors ++ upper.errors ++ body.errors
                 ++ ... check that i is an integer ... ;

      forwards to
        -- i = lower ; while (i <= upper) { body ; i = i + 1 ; }
        seq(assignment(varRef(i), lower),
            while(lte(varRef(i), upper),
                  block(seq(body,
                            assignment(varRef(i), add(varRef(i), intLit("1"))))))) ;
    }

30 / 45
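The idea of forwarding can be sketched in Python as follows. This is an illustration under assumptions, not Silver's implementation: the node classes, `forwards_to`, and `to_c` are hypothetical names. The host language defines `to_c` only for its own constructs; the `For` extension supplies no equation for it and instead names the host tree it is equivalent to.

```python
from dataclasses import dataclass


@dataclass
class Var:
    name: str


@dataclass
class IntLit:
    value: int


@dataclass
class BinOp:
    op: str
    left: object
    right: object


@dataclass
class Assign:
    name: str
    value: object


@dataclass
class While:
    cond: object
    body: object


@dataclass
class Seq:
    first: object
    second: object


@dataclass
class For:
    """Extension construct: the host language knows nothing about it."""
    var: str
    lower: object
    upper: object
    body: object

    def forwards_to(self):
        # i = lower; while (i <= upper) { body; i = i + 1 }
        step = Assign(self.var, BinOp("+", Var(self.var), IntLit(1)))
        loop = While(BinOp("<=", Var(self.var), self.upper), Seq(self.body, step))
        return Seq(Assign(self.var, self.lower), loop)


def to_c(node) -> str:
    """A host-language attribute with equations for host constructs only."""
    if isinstance(node, Var):
        return node.name
    if isinstance(node, IntLit):
        return str(node.value)
    if isinstance(node, BinOp):
        return f"({to_c(node.left)} {node.op} {to_c(node.right)})"
    if isinstance(node, Assign):
        return f"{node.name} = {to_c(node.value)};"
    if isinstance(node, While):
        return f"while {to_c(node.cond)} {{ {to_c(node.body)} }}"
    if isinstance(node, Seq):
        return f"{to_c(node.first)} {to_c(node.second)}"
    # No equation for this node: ask the tree it forwards to.
    return to_c(node.forwards_to())


loop = For("i", IntLit(1), Var("n"), Assign("s", BinOp("+", Var("s"), Var("i"))))
print(to_c(loop))  # i = 1; while (i <= n) { s = (s + i); i = (i + 1); }
```

The extension can still define attributes of its own, such as the error check on the slide, so that messages refer to the for-loop the programmer wrote rather than to the generated while-loop.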

Building an attribute grammar evaluator from composed specifications

$AG^H \cup^* \{AG^{E_1}, \ldots, AG^{E_n}\}$

$\forall i \in [1, n].\ \mathit{modComplete}(AG^H, AG^{E_i}) \implies \mathit{complete}(AG^H \cup \{AG^{E_1}, \ldots, AG^{E_n}\})$

- Monolithic analysis - not too hard, but not too useful
- Modular analysis - harder, but required [SLE'12a]

31 / 45

Challenges in scanning

Keywords in embedded languages may be identifiers in the host language

    int SELECT ;

    rs = using c query { SELECT last_name
                         FROM person WHERE ... } ;

32 / 45

Challenges in scanning

Different extensions use the same keyword

    connection c "jdbc:derby:/derby/db/testdb"
      with table person [ person_id INTEGER,
                          first_name VARCHAR ] ;

    b = table ( c1 : T F ,
                c2 : F * ) ;

33 / 45

Challenges in scanning

Operators with different precedence specifications

    x = 3 + y * z ;

    str = /[a-z][a-z0-9]*\.java/

34 / 45

Challenges in scanning

Terminals that are prefixes of others

    List<List<Integer>> dlist ;

    x = y >> 4 ;

35 / 45

Need for context

- Traditionally, parser and scanner are disjoint:

    Scanner → Parser → Semantic Analysis

- In context-aware scanning, they communicate:

    Scanner ⇄ Parser → Semantic Analysis

36 / 45

Context-aware scanning

- Scanner recognizes only tokens valid for the current "context"
- keeps embedded sub-languages, in a sense, separate
- Consider:
    chan in, out ;
    for i in a { a[i] = i*i ; }
- Two terminal symbols that match "in":
    terminal IN 'in' ;
    terminal ID /[a-zA-Z_][a-zA-Z_0-9]*/ submits to { keyword } ;
    terminal FOR 'for' lexer class { keyword } ;
- example is part of AbleP [SPIN'11]

37 / 45
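A context-aware scanner of this kind can be sketched in a few lines of Python. This is an illustration only, not the generated scanner the slides describe: the `scan` function, the `TERMINALS` table, and the way the valid set is passed in are assumptions. The parser supplies the set of terminals that are valid in its current state, the scanner tries only those, and `ID` gives way to a terminal in the keyword lexer class when both match the same text.

```python
import re
from typing import Set, Tuple

TERMINALS = {
    "FOR": re.compile(r"for"),
    "IN": re.compile(r"in"),
    "ID": re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*"),
}
KEYWORDS = {"FOR"}  # lexer class "keyword"; ID submits to it


def scan(text: str, pos: int, valid: Set[str]) -> Tuple[str, str]:
    """Return (terminal, lexeme), trying only terminals valid in the parser's state."""
    matches = {}
    for name in valid:
        m = TERMINALS[name].match(text, pos)
        if m:
            matches[name] = m.group()
    if not matches:
        raise SyntaxError(f"no valid terminal at position {pos}")
    longest = max(len(lexeme) for lexeme in matches.values())
    candidates = {n for n, lexeme in matches.items() if len(lexeme) == longest}
    # Lexical precedence: ID submits to keywords that match the same text.
    if "ID" in candidates and candidates & KEYWORDS:
        candidates.discard("ID")
    if len(candidates) != 1:
        raise SyntaxError(f"lexical ambiguity at position {pos}: {sorted(candidates)}")
    name = candidates.pop()
    return name, matches[name]


# After "chan" the parser expects an identifier, so "in" is scanned as ID.
print(scan("chan in, out;", 5, {"ID"}))        # ('ID', 'in')
# After "for i" the parser expects the IN terminal, so the same text is IN.
print(scan("for i in a", 6, {"IN"}))           # ('IN', 'in')
# Where both are valid, ID submits to the keyword FOR.
print(scan("for i in a", 0, {"ID", "FOR"}))    # ('FOR', 'for')
```

Because `IN` is never valid in the same parser state as `ID` in this example, it does not need to be declared a keyword, and `in` remains usable as a channel name.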

Parsing C as an extension to Promela

    c_decl {
      typedef struct Coord {
        int x, y; } Coord;
    }
    c_state "Coord pt" "Global"   /* goes in state vector */
    int z = 3;                    /* standard global decl */

    active proctype example()
    {
      c_code { now.pt.x = now.pt.y = 0; };
      do :: c_expr { now.pt.x == now.pt.y }
              -> c_code { now.pt.y++; }
         :: else -> break
      od;
      c_code { printf("values %d: %d, %d,%d\n",
                 Pexample->_pid, now.z, now.pt.x, now.pt.y); };
      assert(false)               /* trigger an error trail */
    }

38 / 45

Context-aware scanning

- This scanning algorithm subordinates the disambiguation principle of maximal munch to the principle of disambiguation by context.
- It will return a shorter valid match before a longer invalid match.
- In List<List<Integer>>, before ">", ">" is in valid lookahead but ">>" is not.
- A context-aware scanner is essentially an implicitly-moded scanner.
- There is no explicit specification of valid lookahead.
    - It is generated from standard grammars and terminal regexes.

39 / 45
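The "shorter valid match before a longer invalid match" rule can be illustrated with the two terminals from the generics example. This is a minimal Python sketch under assumptions: the terminal names `GT` and `RSHIFT`, the function `longest_valid_match`, and the hand-supplied valid sets stand in for what a parser would provide. Maximal munch still applies, but only among the terminals in the valid lookahead set.

```python
import re
from typing import Set, Tuple

TERMINALS = {
    "GT": re.compile(r">"),
    "RSHIFT": re.compile(r">>"),
}


def longest_valid_match(text: str, pos: int, valid: Set[str]) -> Tuple[str, str]:
    """Maximal munch restricted to the terminals in the valid lookahead set."""
    found = []
    for name in sorted(valid):
        m = TERMINALS[name].match(text, pos)
        if m:
            found.append((name, m.group()))
    if not found:
        raise SyntaxError(f"no valid terminal at position {pos}")
    return max(found, key=lambda pair: len(pair[1]))


generic = "List<List<Integer>> dlist"
shift = "x = y >> 4"

# Closing a type-argument list: only ">" is valid, so the shorter match is returned.
print(longest_valid_match(generic, generic.index(">>"), {"GT"}))          # ('GT', '>')
# In an expression both operators are valid, so maximal munch picks ">>".
print(longest_valid_match(shift, shift.index(">>"), {"GT", "RSHIFT"}))    # ('RSHIFT', '>>')
```

A context-free scanner corresponds to always passing the full terminal set, which would return `RSHIFT` in both cases and break the generic type.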

- With a smarter scanner, LALR(1) is not so brittle.
- We can build syntactically composable language extensions.
- Context-aware scanning makes composable syntax "more likely".
- But it does not give a guarantee of composability.

40 / 45

Building a parser from composed specifications

$CFG^H \cup^* \{CFG^{E_1}, \ldots, CFG^{E_n}\}$

$\forall i \in [1, n].\ \mathit{isComposable}(CFG^H, CFG^{E_i}) \land \mathit{conflictFree}(CFG^H \cup CFG^{E_i}) \implies \mathit{conflictFree}(CFG^H \cup \{CFG^{E_1}, \ldots, CFG^{E_n}\})$

- Monolithic analysis - not too hard, but not too useful
- Modular analysis - harder, but required [PLDI'09]
- Non-commutative composition of restricted LALR(1) grammars

41 / 45

42 / 45

Expressiveness versus safe composition

Compare to

- other parser generators
- libraries

The modular compositionality analysis does not require context-aware scanning. But context-aware scanning makes it practical.

43 / 45

Future Work

- ableC - extensible C11 specification
    - builds on lessons learned from extensible specifications of Java [ECOOP'07], Lustre [FASE'07], Modelica, Promela [SPIN'11]
- incorporate existing language extensions
- composition of language extensions is done at compile time
- language-specific analysis
- new applications of AGs

44 / 45

Thanks for your attention

Questions?

http://melt.cs.umn.edu

evw@cs.umn.edu

45 / 45

Eric Van Wyk and August Schwerdfeger. Context-aware scanning for parsing extensible languages. In Intl. Conf. on Generative Programming and Component Engineering (GPCE), pages 63–72. ACM, 2007.

Eric Van Wyk, Derek Bodin, Jimin Gao, and Lijesh Krishnan. Silver: an extensible attribute grammar system. Science of Computer Programming, 75(1–2):39–54, January 2010.

August Schwerdfeger and Eric Van Wyk. Verifiable composition of deterministic grammars. In Proc. of Conf. on Programming Language Design and Implementation (PLDI), pages 199–210. ACM, June 2009.

Ted Kaminski and Eric Van Wyk. Modular well-definedness analysis for attribute grammars. In Proc. of Intl. Conf. on Software Language Engineering (SLE), volume 7745 of LNCS, pages 352–371. Springer-Verlag, September 2012.

Lijesh Krishnan and Eric Van Wyk. Termination analysis for higher-order attribute grammars. In Proceedings of the 5th International Conference on Software Language Engineering (SLE 2012), volume 7745 of LNCS, pages 44–63. Springer-Verlag, September 2012.

Lijesh Krishnan. Composable Semantics Using Higher-Order Attribute Grammars. PhD thesis, University of Minnesota, Department of Computer Science and Engineering, 2012. http://purl.umn.edu/144010

Yogesh Mali and Eric Van Wyk. Building extensible specifications and implementations of Promela with AbleP. In Proc. of Intl. SPIN Workshop on Model Checking of Software, volume 6823 of LNCS, pages 108–125. Springer-Verlag, July 2011.

Eric Van Wyk, Lijesh Krishnan, August Schwerdfeger, and Derek Bodin. Attribute grammar-based language extensions for Java. In Proc. of European Conf. on Object Oriented Prog. (ECOOP), volume 4609 of LNCS, pages 575–599. Springer-Verlag, 2007.

Jimin Gao, Mats Heimdahl, and Eric Van Wyk. Flexible and extensible notations for modeling languages. In Fundamental Approaches to Software Engineering, FASE 2007, volume 4422 of LNCS, pages 102–116. Springer-Verlag, March 2007.

Page 32: talk at Virginia Bioinformatics Institute, December 5, 2013

Challenges in scanning

Keywords in embedded languages may be identifiers in hostlanguage

int SELECT

rs = using c query SELECT last name

FROM person WHERE

32 45

Challenges in scanning

Different extensions use same keyword

connection c jdbcderbyderbydbtestdb

with table person [ person id INTEGER

first name VARCHAR ]

b = table ( c1 T F

c2 F )

33 45

Challenges in scanning

Operators with different precedence specifications

x = 3 + y z

str = [a-z][a-z0-9]java

34 45

Challenges in scanning

Terminals that are prefixes of others

ListltListltIntegergtgt dlist

x = y gtgt 4

35 45

Need for context

I Traditionally parser and scanner are disjoint

Scanner rarr Parser rarr Semantic Analysis

I In context aware scanning they communicate

Scanner Parser rarr Semantic Analysis

36 45

Context aware scanning

- Scanner recognizes only tokens valid for the current “context”
- keeps embedded sub-languages, in a sense, separate
- Consider:

      chan in, out;
      for i in a { a[i] = i*i; }

- Two terminal symbols that match “in”:
  - terminal IN 'in'
  - terminal ID /[a-zA-Z_][a-zA-Z_0-9]*/ submits to keyword
  - terminal FOR 'for' lexer class keyword
- example is part of AbleP [SPIN’11]

37 / 45
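The `in` example can be sketched as a scanner function that receives the set of terminals the parser currently accepts. This is a minimal sketch under stated assumptions: the function `scan`, the `KEYWORDS` set, and the way "submits to keyword" is modeled (a keyword wins over `ID` when both match the same lexeme) are hypothetical simplifications, not the actual specification language or implementation used in AbleP.

```python
import re

# Illustrative terminals modeled on the slide; the regexes are assumptions.
TERMINALS = {
    "IN": re.compile(r"in"),
    "FOR": re.compile(r"for"),
    "ID": re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*"),
}
# Simplified model of "ID submits to keyword": on an equal-length match,
# a terminal in the keyword class is preferred over ID.
KEYWORDS = {"IN", "FOR"}


def scan(text, pos, valid):
    """Return (terminal, lexeme) using only the terminals in the valid set."""
    while pos < len(text) and text[pos].isspace():
        pos += 1
    matches = []
    for name in valid:
        m = TERMINALS[name].match(text, pos)
        if m:
            matches.append((name, m.group()))
    if not matches:
        raise ValueError(f"no valid terminal matches at position {pos}")
    longest = max(len(lexeme) for _, lexeme in matches)
    candidates = [c for c in matches if len(c[1]) == longest]
    keyword_hits = [c for c in candidates if c[0] in KEYWORDS]
    chosen = keyword_hits if keyword_hits else candidates
    if len(chosen) != 1:
        raise ValueError(f"lexical ambiguity at position {pos}: {chosen}")
    return chosen[0]


# After "chan" the parser expects an identifier, so only ID is valid.
print(scan("chan in, out;", 4, {"ID"}))   # ('ID', 'in')
# After "for i" the parser expects the keyword, so only IN is valid.
print(scan("for i in a", 5, {"IN"}))      # ('IN', 'in')
# If both were valid in the same context, the keyword would win.
print(scan("in", 0, {"IN", "ID"}))        # ('IN', 'in')
```

In the first two calls the ambiguity never arises because the other terminal is not in the valid set; the precedence rule is needed only in a context where both terminals are acceptable.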

Parsing C as an extension to Promela

    c_decl {
      typedef struct Coord {
        int x, y; } Coord;
    }

    c_state "Coord pt" "Global"  /* goes in state vector */

    int z = 3;  /* standard global decl */

    active proctype example()
    {
      c_code { now.pt.x = now.pt.y = 0; };

      do :: c_expr { now.pt.x == now.pt.y }
              -> c_code { now.pt.y++; }
         :: else -> break
      od;

      c_code { printf("values %d: %d, %d,%d\n",
                 Pexample->_pid, now.z, now.pt.x, now.pt.y); };

      assert(false)  /* trigger an error trail */
    }

38 / 45

Context aware scanning

- This scanning algorithm subordinates the disambiguation principle of maximal munch to the principle of disambiguation by context.
- It will return a shorter valid match before a longer invalid match.
- In List<List<Integer>>, before the closing “>”, the terminal “>” is in the valid lookahead but “>>” is not.
- A context aware scanner is essentially an implicitly-moded scanner.
- There is no explicit specification of valid lookahead.
  - It is generated from standard grammars and terminal regexes.

39 / 45
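The rule "a shorter valid match before a longer invalid match" can be sketched by restricting maximal munch to the terminals in the parser's valid lookahead. In this sketch the valid sets are written out by hand and the names `scan`, `GT`, and `RSHIFT` are assumptions for illustration; in the approach described on the slide the valid sets are derived from the grammar rather than specified explicitly.

```python
import re

# Illustrative terminals; GT is a prefix of RSHIFT.
TERMINALS = {
    "GT": re.compile(r">"),
    "RSHIFT": re.compile(r">>"),
    "ID": re.compile(r"[A-Za-z_][A-Za-z_0-9]*"),
}


def scan(text, pos, valid):
    """Longest match at pos, considering only terminals in the valid set."""
    best = None
    for name in valid:
        m = TERMINALS[name].match(text, pos)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    if best is None:
        raise ValueError(f"no valid terminal matches at position {pos}")
    return best


generic = "List<List<Integer>> dlist"
shift = "x = y >> 4"

# Inside type arguments only ">" can close the list, so RSHIFT is not valid
# and the shorter match is returned.
print(scan(generic, generic.index(">"), {"GT"}))            # ('GT', '>')
# In an expression both operators are valid, so maximal munch still applies.
print(scan(shift, shift.index(">"), {"GT", "RSHIFT"}))      # ('RSHIFT', '>>')
# A context-free scanner that considers every terminal picks ">>" for the type.
print(scan(generic, generic.index(">"), set(TERMINALS)))    # ('RSHIFT', '>>')
```

The same `scan` function behaves like a different scanner in each parser context, which is one way to read the slide's description of an implicitly-moded scanner.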

- With a smarter scanner, LALR(1) is not so brittle.
- We can build syntactically composable language extensions.
- Context aware scanning makes composable syntax “more likely”.
- But it does not give a guarantee of composability.

40 / 45

Building a parser from composed specifications

$CFG^{H} \cup^{*} \{CFG^{E_1}, \ldots, CFG^{E_n}\}$

$\forall i \in [1, n].\ isComposable(CFG^{H}, CFG^{E_i}) \wedge conflictFree(CFG^{H} \cup CFG^{E_i})$

$\Rightarrow conflictFree(CFG^{H} \cup \{CFG^{E_1}, \ldots, CFG^{E_n}\})$

- Monolithic analysis - not too hard, but not too useful
- Modular analysis - harder, but required [PLDI’09]
- Non-commutative composition of restricted LALR(1) grammars

41 / 45

42 / 45

Expressiveness versus safe composition

Compare to

- other parser generators
- libraries

The modular compositionality analysis does not require context aware scanning. But context aware scanning makes it practical.

43 / 45

Future Work

- ableC - extensible C11 specification
  - builds on lessons learned from extensible specifications of Java [ECOOP’07], Lustre [FASE’07], Modelica, Promela [SPIN’11]
- incorporate existing language extensions
- composition of language extensions is done at compile time
- language specific analysis
- new applications of AGs

44 / 45

Thanks for your attention

Questions?

http://melt.cs.umn.edu

evw@cs.umn.edu

45 / 45

Eric Van Wyk and August Schwerdfeger. Context-aware scanning for parsing extensible languages. In Intl. Conf. on Generative Programming and Component Engineering (GPCE), pages 63–72. ACM, 2007.

Eric Van Wyk, Derek Bodin, Jimin Gao, and Lijesh Krishnan. Silver: an extensible attribute grammar system. Science of Computer Programming, 75(1–2):39–54, January 2010.

August Schwerdfeger and Eric Van Wyk. Verifiable composition of deterministic grammars. In Proc. of Conf. on Programming Language Design and Implementation (PLDI), pages 199–210. ACM, June 2009.

Ted Kaminski and Eric Van Wyk. Modular well-definedness analysis for attribute grammars. In Proc. of Intl. Conf. on Software Language Engineering (SLE), volume 7745 of LNCS, pages 352–371. Springer-Verlag, September 2012.

Lijesh Krishnan and Eric Van Wyk. Termination analysis for higher-order attribute grammars. In Proceedings of the 5th International Conference on Software Language Engineering (SLE 2012), volume 7745 of LNCS, pages 44–63. Springer-Verlag, September 2012.

Lijesh Krishnan. Composable Semantics Using Higher-Order Attribute Grammars. PhD thesis, University of Minnesota, Department of Computer Science and Engineering, 2012. http://purl.umn.edu/144010

Yogesh Mali and Eric Van Wyk. Building extensible specifications and implementations of Promela with AbleP. In Proc. of Intl. SPIN Workshop on Model Checking of Software, volume 6823 of LNCS, pages 108–125. Springer-Verlag, July 2011.

Eric Van Wyk, Lijesh Krishnan, August Schwerdfeger, and Derek Bodin. Attribute grammar-based language extensions for Java. In Proc. of European Conf. on Object Oriented Prog. (ECOOP), volume 4609 of LNCS, pages 575–599. Springer-Verlag, 2007.

Jimin Gao, Mats Heimdahl, and Eric Van Wyk. Flexible and extensible notations for modeling languages. In Fundamental Approaches to Software Engineering, FASE 2007, volume 4422 of LNCS, pages 102–116. Springer-Verlag, March 2007.

Page 39: talk at Virginia Bioinformatics Institute, December 5, 2013

Context-aware scanning

- This scanning algorithm subordinates the disambiguation principle of maximal munch to the principle of disambiguation by context.

- It will return a shorter valid match before a longer invalid match.

- In List<List<Integer>>, before ">", ">" is in the valid lookahead but ">>" is not.

- A context-aware scanner is essentially an implicitly-moded scanner.
  - There is no explicit specification of valid lookahead.
  - It is generated from standard grammars and terminal regexes.

39 / 45
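
The idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the scanner generated by the tools in the talk: the terminal names, their regexes, and the hand-written valid-lookahead sets are all assumptions made for the example. In a real context-aware scanner the valid set would come from the current LR parser state rather than being passed in by hand.

```python
import re

# Hypothetical terminals for illustration; names and regexes are assumptions,
# not taken from any real grammar specification.
TERMINALS = {
    "RSHIFT": re.compile(r">>"),
    "GT": re.compile(r">"),
    "LT": re.compile(r"<"),
    "ID": re.compile(r"[A-Za-z_][A-Za-z0-9_]*"),
}


def scan(text, pos, valid):
    """Return (terminal, lexeme) for the longest match at pos, considering
    only the terminals named in valid. Return None if none of them match."""
    best = None
    for name in sorted(valid):
        match = TERMINALS[name].match(text, pos)
        if match and (best is None or len(match.group()) > len(best[1])):
            best = (name, match.group())
    return best


text = "List<List<Integer>>"
pos = text.index(">")  # position of the first closing angle bracket

# Plain maximal munch: every terminal is allowed, so the longer ">>" wins.
print(scan(text, pos, TERMINALS.keys()))

# Context-aware: assume the parser state inside a type argument list accepts
# ">" or "<" next, but not ">>", so the shorter valid match is returned.
print(scan(text, pos, {"GT", "LT"}))
```

Maximal munch still applies, but only among the terminals the context allows. Ties in match length (for example a keyword against an identifier) would need a separate precedence rule, which this sketch leaves out.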

Page 40: talk at Virginia Bioinformatics Institute, December 5, 2013

- With a smarter scanner, LALR(1) is not so brittle.

- We can build syntactically composable language extensions.

- Context-aware scanning makes composable syntax "more likely".

- But it does not give a guarantee of composability.

40 / 45

Page 41: talk at Virginia Bioinformatics Institute, December 5, 2013

Building a parser from composed specifications

$$CFG^{H} \cup^{*} \{CFG^{E_1}, \ldots, CFG^{E_n}\}$$

$$\forall i \in [1, n].\ \mathit{isComposable}(CFG^{H}, CFG^{E_i}) \wedge \mathit{conflictFree}(CFG^{H} \cup CFG^{E_i})$$

$$\Rightarrow\ \mathit{conflictFree}(CFG^{H} \cup \{CFG^{E_1}, \ldots, CFG^{E_n}\})$$

- Monolithic analysis: not too hard, but not too useful.

- Modular analysis: harder, but required [PLDI'09].

- Non-commutative composition of restricted LALR(1) grammars.

41 / 45

Page 42: talk at Virginia Bioinformatics Institute, December 5, 2013

42 / 45

Page 43: talk at Virginia Bioinformatics Institute, December 5, 2013

Expressiveness versus safe composition

Compare to

- other parser generators

- libraries

The modular compositionality analysis does not require context-aware scanning. But context-aware scanning makes it practical.

43 / 45

Page 44: talk at Virginia Bioinformatics Institute, December 5, 2013

Future Work

- ableC: extensible C11 specification
  - builds on lessons learned from extensible specifications of Java [ECOOP'07], Lustre [FASE'07], Modelica, Promela [SPIN'11]

- incorporate existing language extensions

- composition of language extensions happens at compile time

- language-specific analysis

- new applications of attribute grammars (AGs)

44 / 45

Page 45: talk at Virginia Bioinformatics Institute, December 5, 2013

Thanks for your attention

Questions?

http://melt.cs.umn.edu

evw@cs.umn.edu

45 / 45

Page 46: talk at Virginia Bioinformatics Institute, December 5, 2013

Eric Van Wyk and August Schwerdfeger. Context-aware scanning for parsing extensible languages. In Intl. Conf. on Generative Programming and Component Engineering (GPCE), pages 63–72. ACM, 2007.

Eric Van Wyk, Derek Bodin, Jimin Gao, and Lijesh Krishnan. Silver: an extensible attribute grammar system. Science of Computer Programming, 75(1–2):39–54, January 2010.

August Schwerdfeger and Eric Van Wyk. Verifiable composition of deterministic grammars. In Proc. of Conf. on Programming Language Design and Implementation (PLDI), pages 199–210. ACM, June 2009.

45 / 45

Page 47: talk at Virginia Bioinformatics Institute, December 5, 2013

Ted Kaminski and Eric Van Wyk. Modular well-definedness analysis for attribute grammars. In Proc. of Intl. Conf. on Software Language Engineering (SLE), volume 7745 of LNCS, pages 352–371. Springer-Verlag, September 2012.

Lijesh Krishnan and Eric Van Wyk. Termination analysis for higher-order attribute grammars. In Proceedings of the 5th International Conference on Software Language Engineering (SLE 2012), volume 7745 of LNCS, pages 44–63. Springer-Verlag, September 2012.

Lijesh Krishnan. Composable Semantics Using Higher-Order Attribute Grammars. PhD thesis, University of Minnesota, Department of Computer Science and Engineering, 2012. http://purl.umn.edu/144010

45 / 45

Page 48: talk at Virginia Bioinformatics Institute, December 5, 2013

Yogesh Mali and Eric Van Wyk. Building extensible specifications and implementations of Promela with AbleP. In Proc. of Intl. SPIN Workshop on Model Checking of Software, volume 6823 of LNCS, pages 108–125. Springer-Verlag, July 2011.

Eric Van Wyk, Lijesh Krishnan, August Schwerdfeger, and Derek Bodin. Attribute grammar-based language extensions for Java. In Proc. of European Conf. on Object Oriented Prog. (ECOOP), volume 4609 of LNCS, pages 575–599. Springer-Verlag, 2007.

45 / 45

Page 49: talk at Virginia Bioinformatics Institute, December 5, 2013

Jimin Gao, Mats Heimdahl, and Eric Van Wyk. Flexible and extensible notations for modeling languages. In Fundamental Approaches to Software Engineering, FASE 2007, volume 4422 of LNCS, pages 102–116. Springer-Verlag, March 2007.

45 / 45