Applicative Logic Meta-Programming as the foundation for Template-based Program Transformation

Coen De Roover [email protected]

Software Engineering Laboratory, Osaka

Software Languages Lab, Brussels

Introducing Belgium: location and languages

Languages: Dutch (6.23 million), French (3.32 million), German (0.07 million)

Neighbours: Netherlands, Germany, Luxembourg, France

Introducing Belgium: Brussels

Jeanneke Pis

Zinneke Pis

Manneken Pis

Atomium

Grote Markt - Grand Place

Introducing Belgium: painters

Magritte (1964)

Bruegel (1563)

Rubens (1615)

Ensor (1889)

van Eyck (1434)

van der Weyden (1445)

Permeke (1929)

Introducing Belgium: food

Waffles (from Brussels)

Waffles (from Liège)

Chocolates

French Fries

Introducing Belgium: drinks

For 25 little-known facts: http://cheeseweb.eu/2009/08/25-belgium/



Motivation: repeating changes within a program

annotation for properties of which evolution is to be tracked

more typesafe version

many

before:

public class BreakStatement extends Statement {
    @EntityProperty(value = SimpleName.class)
    private EntityIdentifier label;

    public EntityIdentifier getLabel() { return label; }

    public void setLabel(EntityIdentifier label) { this.label = label; }
}

after:

public class BreakStatement extends Statement {
    @EntityProperty(value = SimpleName.class)
    private EntityIdentifier<SimpleName> label;

    public EntityIdentifier<SimpleName> getLabel() { return label; }

    public void setLabel(EntityIdentifier<SimpleName> label) { this.label = label; }
}

Table II: STUDY SUBJECTS

                        Eclipse JDT core     Eclipse SWT          Mozilla Project
Type                    IDE                  IDE                  Several projects associated with Internet
Period of development   2001/06 ~ 2009/02    2001/05 ~ 2010/05    1998/03 ~ 2008/05
Study period            2004/07 ~ 2006/07    2004/07 ~ 2006/07    2003/04 ~ 2005/07
Total revisions         17000 revisions      21530 revisions      200000 revisions
# of bugs               1812                 1256                 11254
# of Type I bugs        1405 (77.54%)        954 (75.96%)         7562 (67.19%)
# of Type II bugs       407 (22.46%)         302 (24.04%)         3692 (32.81%)

[Figure 1: three histograms, one each for Eclipse JDT core, Eclipse SWT, and Mozilla; y-axis: Percentage]

Figure 1. The number of times that the same bug is fixed

[Figure 2: three histograms, one each for Eclipse JDT core, Eclipse SWT, and Mozilla; y-axis: Percentage]

Figure 2. The number of days taken for the supplementary fix to appear since an initial fix

Table III: SEVERITY OF TYPE I AND TYPE II BUGS

               Eclipse JDT core       Eclipse SWT            Mozilla
               Type I     Type II     Type I     Type II     Type I     Type II
Blocker         1.07%      1.47%       1.99%      2.65%       2.41%      1.79%
Critical        2.56%      3.44%       3.67%      6.62%       8.83%     10.92%
Major           8.33%      8.11%      11.53%     15.23%       8.38%     10.44%
Normal         76.80%     76.90%      74.95%     61.92%      60.92%     61.56%
Minor           5.20%      2.95%       2.41%      1.66%       6.96%      5.39%
Trivial         1.64%      0.00%       1.78%      1.66%       6.87%      3.31%
Enhancement     4.41%      6.88%       3.56%     10.26%       5.61%      6.53%

If a bug is not resolved yet, we measure the time taken from REPORTED to the latest fix attempt. In Eclipse JDT core, Type I bugs take 120.79 days and Type II bugs take 188.27 days to be resolved. The others follow similar trends.

Overall, Type II bugs involve more developers in the bug report discussions and take longer time to be resolved than Type I bugs (p-values from T-test: 1.45e-12, 1.39e-09, 2.05e-84 for the number of developers in the bug report discussion, and 3.84e-04, 2.65e-07, and 8.40e-42 for bug resolution time in Eclipse JDT core, Eclipse SWT, and Mozilla respectively).

A considerable portion of bugs requires supplementary patches. Such bugs take longer to be resolved and involve more developers in the discussion of the bug reports.

B. What Are The Common Causes of Incomplete Bug Fixes?

To understand why omission errors occur in practice, we first contrast the characteristics of incomplete patches (initial patches of Type II bugs) against that of regular patches (patches of Type I bugs). We measure the average








Motivation: repeating changes within a program correctly

bugs requiring supplementary patches

amount of supplements (~ missed occurrences)

days before first supplement

[“An empirical study of supplementary bug fixes” Park et al. MSR 2012]

Repeating Changes: using existing tool support

requires specifying

- where to apply a change: subjects of the transformation (LHS)

- what change to apply: their state afterwards or change actions (RHS)

carefully ensuring

- no unwarranted changes are applied

- no required changes are missed
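The split between where to apply a change (LHS) and what change to apply (RHS) can be sketched in plain Java using the EntityIdentifier example from the motivation. This is only an illustration under stated assumptions: `FieldDecl`, `RepeatChange`, `LHS`, `RHS`, and `apply` are hypothetical names, fields are modelled as simple value objects rather than real AST nodes, and no existing tool's API is implied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical, simplified model of a field declaration (not a real AST node).
final class FieldDecl {
    final String annotationValue; // e.g. "SimpleName", or null when not annotated
    final String type;            // e.g. "EntityIdentifier"
    final String name;

    FieldDecl(String annotationValue, String type, String name) {
        this.annotationValue = annotationValue;
        this.type = type;
        this.name = name;
    }

    @Override
    public String toString() {
        return "private " + type + " " + name + ";";
    }
}

final class RepeatChange {
    // LHS: where to apply the change (annotated fields that still use the raw type).
    static final Predicate<FieldDecl> LHS =
        f -> f.annotationValue != null && f.type.equals("EntityIdentifier");

    // RHS: what change to apply (add the annotation's value as a type parameter).
    static final Function<FieldDecl, FieldDecl> RHS =
        f -> new FieldDecl(f.annotationValue,
                           f.type + "<" + f.annotationValue + ">", f.name);

    // Applies RHS to every subject selected by LHS; all other fields stay untouched.
    static List<FieldDecl> apply(List<FieldDecl> fields) {
        List<FieldDecl> result = new ArrayList<>();
        for (FieldDecl f : fields) {
            result.add(LHS.test(f) ? RHS.apply(f) : f);
        }
        return result;
    }

    public static void main(String[] args) {
        List<FieldDecl> fields = Arrays.asList(
            new FieldDecl("SimpleName", "EntityIdentifier", "label"),
            new FieldDecl(null, "EntityIdentifier", "cache"),
            new FieldDecl("SimpleName", "EntityIdentifier<SimpleName>", "done"));
        for (FieldDecl f : apply(fields)) {
            System.out.println(f);
        }
    }
}
```

In this sketch, an LHS predicate that is too broad would apply unwarranted changes (for example to the unannotated `cache` field), while one that is too narrow would miss required changes.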

Repeating Changes: using existing tool support

Ekeko: Brussels’ meta-programming library for Clojure

Eclipse plugin, causally connected

logic meta-programming: script queries over workspace; specify code characteristics declaratively, leave search to logic engine

applicative meta-programming: manipulate workspace

applications: program and corpus analysis, program transformation, tool building

Building Development Tools Interactively using the EKEKO Meta-Programming Library

Coen De Roover, Software Languages Lab, Vrije Universiteit Brussel, Belgium. Email: [email protected]

Reinout Stevens, Software Languages Lab, Vrije Universiteit Brussel, Belgium. Email: [email protected]

Abstract: EKEKO is a Clojure library for applicative logic meta-programming against an Eclipse workspace. EKEKO has been applied successfully to answering program queries (e.g., “does this bug pattern occur in my code?”), to analyzing project corpora (e.g., “how often does this API usage pattern occur in this corpus?”), and to transforming programs (e.g., “change occurrences of this pattern as follows”) in a declarative manner. These applications rely on a seamless embedding of logic queries in applicative expressions. While the former identify source code of interest, the latter associate error markers with, compute statistics about, or rewrite the identified source code snippets. In this paper, we detail the logic and applicative aspects of the EKEKO library. We also highlight key choices in their implementation. In particular, we demonstrate how a causal connection with the Eclipse infrastructure enables building development tools interactively on the Clojure read-eval-print loop.

I. INTRODUCTION

EKEKO is a Clojure library that enables querying and manipulating an Eclipse workspace using logic queries that are seamlessly embedded in functional expressions. Recent applications of EKEKO include the GASR tool for detecting suspicious aspect-oriented code [1] and the QWALKEKO tool for reasoning about fine-grained evolutions of versioned code [2]. In this paper, we describe the meta-programming facilities offered by EKEKO and highlight key choices in their implementation¹. We also draw attention to the highly interactive manner of tool building these facilities enable.

II. RUNNING EXAMPLE: AN ECLIPSE PLUGIN

More concretely, we will demonstrate how to build a lightweight Eclipse plugin entirely on the Clojure read-eval-print loop. We will use this plugin as a running example throughout the rest of this paper. Our Eclipse plugin is to support developers in repeating similar changes throughout an entire class hierarchy. It is to associate problem markers with fields that have not yet been changed. In addition, it is to present developers a visualization of these problems. Finally, it is to provide a “quick fix” that applies the required changes correctly.

Figure 1 illustrates the particular changes that need to be repeated. The raw EntityIdentifier type of those fields within a subclass of be.ac.chaq.model.ast.java.ASTNode that carry an @EntityProperty annotation is to receive a type parameter that corresponds to the annotation’s value key.

¹ The EKEKO library, its implementation, and all documentation are freely available from https://github.com/cderoove/damp.ekeko/.

public class BreakStatement extends Statement {
    // Before changes:
    @EntityProperty(value = SimpleName.class)
    private EntityIdentifier label;

    // After changes:
    @EntityProperty(value = SimpleName.class)
    private EntityIdentifier<SimpleName> label;

    // ... (a.o., accessor methods change accordingly)
}

Fig. 1: Example changes to be repeated.

III. ARCHITECTURAL OVERVIEW

The EKEKO library operates upon a central repository of project models. These models contain structural and behavioral information that is not readily available from the projects themselves. The models for Java projects include abstract syntax trees provided by the Eclipse JDT parser, but also control flow and data flow information computed by the SOOT program analysis framework [3].

An accompanying Eclipse plugin automatically maintains the EKEKO model repository. To this end, it subscribes to each workspace change and triggers incremental updates or complete rebuilds of project models. As a result, the information operated upon by the EKEKO library is always up-to-date. In addition, this plugin provides an extension point that enables registering additional kinds of project models. The KEKO extension, for instance, builds its project model from the results of a partial program analysis [4], enabling queries over compilation units that do not build correctly.

IV. LOGIC PROGRAM QUERYING

The EKEKO library enables querying and manipulating programs using logic queries and applicative expressions respectively. We detail the former first. Section V discusses the latter. The program querying facilities relieve tool builders from implementing an imperative search for source code that exhibits particular characteristics. Instead, developers can specify these characteristics declaratively through a logic query. The

[WCRE-CSMR14]

Think of it as querying a database of program information!

Logic meta-programming (LMP)

Logic relations: ast/2 and has/3 “tables”

(ast ?type ?node)

relation between a ?node of an Abstract Syntax Tree (AST) and its ?type

(has ?property ?node ?value)

relation of AST nodes and the values of their properties

Logic querying: AST relations

relation of all AST nodes of type :ReturnStatement and the value of their :expression property

SQL query:

SELECT ast.node as ?statement, has.value as ?expression
FROM ast, has
WHERE ast.type = :ReturnStatement
  AND has.property = :expression
  AND has.node = ast.node;

equivalent logic query:

(ekeko [?statement ?expression]
  (ast :ReturnStatement ?statement)
  (has :expression ?statement ?expression))
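Reading `ast` and `has` as database tables, the query for returned expressions amounts to a join on the node column. The following Java sketch makes that reading concrete. It is a toy model under stated assumptions: `AstTables`, `AstRow`, `HasRow`, and `returnedExpressions` are hypothetical names, nodes are represented by plain string identifiers, and the actual logic engine searches rather than loops.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AstTables {
    // Row of the hypothetical "ast" table: (type, node).
    static final class AstRow {
        final String type, node;
        AstRow(String type, String node) { this.type = type; this.node = node; }
    }

    // Row of the hypothetical "has" table: (property, node, value).
    static final class HasRow {
        final String property, node, value;
        HasRow(String property, String node, String value) {
            this.property = property; this.node = node; this.value = value;
        }
    }

    // Nested-loop join: pairs every ReturnStatement node with the value
    // of its "expression" property, mirroring the WHERE clause of the SQL query.
    static List<String[]> returnedExpressions(List<AstRow> ast, List<HasRow> has) {
        List<String[]> result = new ArrayList<>();
        for (AstRow a : ast) {
            if (!a.type.equals("ReturnStatement")) continue;
            for (HasRow h : has) {
                if (h.property.equals("expression") && h.node.equals(a.node)) {
                    result.add(new String[] { a.node, h.value });
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<AstRow> ast = Arrays.asList(
            new AstRow("ReturnStatement", "s1"),
            new AstRow("IfStatement", "s2"),
            new AstRow("ReturnStatement", "s3"));
        List<HasRow> has = Arrays.asList(
            new HasRow("expression", "s1", "label"),
            new HasRow("expression", "s2", "element == null"),
            new HasRow("expression", "s3", "false"));
        for (String[] row : returnedExpressions(ast, has)) {
            System.out.println(row[0] + " returns " + row[1]);
        }
    }
}
```

The imperative loops are exactly the search that a declarative query leaves to the logic engine.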

Logic programming: defining relations

defining relation:

(defn expression|returned [?expression]
  (fresh [?statement]                      ; local variable
    (ast :ReturnStatement ?statement)
    (has :expression ?statement ?expression)))

using newly defined relation:

(ekeko* [?returned ?type]
  (expression|returned ?returned)
  (ast|expression-type ?returned ?type)
  (type|binary ?type))

returned expressions of a type defined in compiled rather than source code

Ekeko Library: relations for logic meta-programming

syntactic, e.g. for(Object i : collection)

(ast ?kind ?ast)
(has ?property ?ast ?value)
(ast-encompassing|method+ ?ast ?m)
(ast-encompassing|type+ ?ast ?t)

structural, e.g.

class Ouch {
    int hashCode() { return ...; }
}

(classfile-type ?binaryfile ?type)
(type-type|sub+ ?type ?subtype)
(type-name|qualified ?type ?qname)
(advice-shadow ?advice ?shadow)

control and data flow, e.g.

scanner = new Scanner();
...
x.close();
...
scanner.next();

(method|soot-cfg ?m ?cfg)
(unit|soot-usebox ?u ?ub)
(local|soot-pointstoset ?l ?p)
(soot|may|alias ?l1 ?l2)

Ekeko Library: functions for applicative meta-programming

rewriting

(remove-node node)
(replace-node node newnode)
(change-property node property value)
(apply-and-reset-rewrites!)

visualizing

(visualize nodes edges
  :layout layout
  :node|label labelfn
  :edge|label labelfn
  ...)

tooling

(add-problem-marker marker node)
(register-quickfix marker rewritefn)
(reduce-workspace fn initval)
(wait-for-builds-to-finish)


Repeating Changes: using Ekeko

Live Demo!

Repeating Changes: using Ekeko and most state of the art tools

specifying changes requires significant expertise

often requires multiple iterations

Imperative Program Transformation by Rewriting

It says that an assignment x := v (where v is a variable) can be replaced by x := c if the “last assignment” to v was v := c (where c is a constant). The side condition formalises the notion of “last assignment”, and will be explained later in the paper:

n : (x := v) ⟹ x := c
if
    n ⊢ A△(¬def(v) U def(v) ∧ stmt(v := c))
    conlit(c)

The rewrite language has several important properties:

– The specification is in the form of a rewrite system with the advantages of succinctness and intuitiveness mentioned above.

– The rewrite system works over a control flow graph representation of the program. It does this by identifying and manipulating graph blocks which are based on the idea of basic blocks but with finer granularity.

– The rewrites are executable. An implementation exists to automatically determine when the rewrite applies and to perform the transformation just from the specification.

– The relation between the conditions on the control flow graph and the operational semantics of the program seems to lend itself to formal reasoning about the transformation.

The paper is organised as follows. §2 covers earlier work in the area and provides the motivation for this work. §3 describes our method of rewriting over control graphs. §4 describes the form of side conditions for those rewrites. §5 gives three examples of common transformations and their application when given as rewrites. §6 discusses what has been achieved and possible applications of this work.

2 Background

Implementing optimising transformations is hard: building a good optimising compiler is a major effort. If a programmer wishes to adapt a compiler to a particular task, for example to improve the optimisation of certain library calls, intricate knowledge of the compiler internals is necessary. This contrasts with the description of such optimisations in textbooks [1,3,26], where they are often described in a few lines of informal English. It is not surprising, therefore, that the program transformation community has sought declarative ways of programming transformations, to enable experimentation without excessive implementation effort. The idea to describe program transformations by rewriting is almost as old as the subject itself. One early implementation can be found in the TAMPR system by Boyle, which has been under development since the early ’70s [8,9]. TAMPR starts with a specification, which is translated to pure lambda calculus, and rewriting is performed on the pure lambda expressions. Because programs

no unwarranted changes are applied: refinement

no required changes are missed: generalization

Repeating Changes: introducing Ekeko/X

more intuitively

1/ template-based transformation specifications

2/ matching and rewriting directives in templates

3/ template mutation operators

Towards


(ekeko/x
  ?exp.setAge(?arg)
  =>
  new Integer(?arg))

(ekeko/x <LHS1>…<LHSn> => <RHS1>…<RHSn>)

template on LHS: matches identify subjects

template on RHS: generates code for every LHS match


(ekeko/x
  ?exp.setAge(?arg)
  =>
  new Integer(?arg))

LHS: will not match setAge(5)

RHS: no destination for generated code

[<component>]@[<directive>]

(ekeko/x
  [?exp]@[relax-receiver].setAge(?arg)
  =>
  [new Integer(?arg)]@[(replace ?arg)])

matching directive: relax-receiver

rewriting directive: (replace ?arg)
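How a template with metavariables and directives could behave is sketched below in Java. This is a toy model and not Ekeko/X's implementation: `TemplateSketch`, `Call`, `unify`, `match`, and `wrapArgument` are hypothetical names, the `relaxReceiver` flag only approximates what the relax-receiver directive is assumed to do (accept calls without an explicit receiver), and `wrapArgument` approximates (replace ?arg) by putting the generated code in the place of the matched argument.

```java
import java.util.HashMap;
import java.util.Map;

final class TemplateSketch {
    // Hypothetical, minimal model of a method invocation such as x.setAge(5).
    static final class Call {
        final String receiver; // null when the call has no explicit receiver
        final String name;
        final String arg;
        Call(String receiver, String name, String arg) {
            this.receiver = receiver; this.name = name; this.arg = arg;
        }
        @Override public String toString() {
            return (receiver == null ? "" : receiver + ".") + name + "(" + arg + ")";
        }
    }

    // Matches one template component against code; names starting with '?'
    // are metavariables and get bound on first use.
    static boolean unify(String template, String code, Map<String, String> bindings) {
        if (template.startsWith("?")) {
            String bound = bindings.get(template);
            if (bound == null) { bindings.put(template, code); return true; }
            return bound.equals(code);
        }
        return template.equals(code);
    }

    // Returns the bindings of a match, or null when the template does not match.
    // With relaxReceiver, calls without an explicit receiver are accepted too.
    static Map<String, String> match(Call template, Call code, boolean relaxReceiver) {
        Map<String, String> bindings = new HashMap<>();
        if (code.receiver == null) {
            if (!relaxReceiver) return null;
        } else if (!unify(template.receiver, code.receiver, bindings)) {
            return null;
        }
        if (!unify(template.name, code.name, bindings)) return null;
        if (!unify(template.arg, code.arg, bindings)) return null;
        return bindings;
    }

    // The generated code "new Integer(?arg)" takes the place of the matched argument.
    static Call wrapArgument(Call code, Map<String, String> bindings) {
        return new Call(code.receiver, code.name,
                        "new Integer(" + bindings.get("?arg") + ")");
    }

    public static void main(String[] args) {
        Call template = new Call("?exp", "setAge", "?arg");
        Call implicit = new Call(null, "setAge", "5");
        System.out.println(match(template, implicit, false)); // strict matching fails
        Map<String, String> bindings = match(template, implicit, true);
        System.out.println(wrapArgument(implicit, bindings));
    }
}
```

The sketch shows why both kinds of directive are needed: without the matching directive the subject `setAge(5)` is missed, and without the rewriting directive the generated code has no destination.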


[]@[orlarger] class ?name extends [Component]@[orsubtype] {
  [[public]@[member] void acceptVisitor(?type ?param) {[
    System.out.println(?string);
    [?x]@[(must-alias ?param)].?visitMethod(this);
  ]@[orflow] }]@[member]
}

public class OnlyLoggingLeaf extends Component {
  public void acceptVisitor(ComponentVisitor v) {
    System.out.println("Only logging.");
  }
}

public class SillyLeaf extends OnlyLoggingLeaf {
  public void acceptVisitor(ComponentVisitor v) {
    super.acceptVisitor(v);
    ComponentVisitor temp = v;
    temp.visitSuperLogLeaf(this);
  }
}

rich inter-procedural semantics!




atomic generalization: introduce-variable

return age;

becomes

return ?exp;

compound generalization: generalize-aliases

public boolean hasChildren(Object element) {
  if (element == null)
    return false;
  return getChildren(element).length > 0;
}

becomes

public boolean hasChildren(Object ?param) {
  if ([?ref1]@[(must-alias ?param)] == null)
    return false;
  return getChildren([?ref2]@[(must-alias ?param)]).length > 0;
}
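The effect of an atomic generalization such as introduce-variable can be sketched over a deliberately simplified template representation. The Java below is an assumption-laden toy, not Ekeko/X's operator: templates are flat token lists instead of ASTs, `Generalize`, `matches`, and `introduceVariable` are hypothetical names, and every metavariable occurrence matches independently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class Generalize {
    // A template is a token list; tokens starting with '?' are metavariables
    // that match any single token, all other tokens must match literally.
    static boolean matches(List<String> template, List<String> code) {
        if (template.size() != code.size()) return false;
        for (int i = 0; i < template.size(); i++) {
            String token = template.get(i);
            if (!token.startsWith("?") && !token.equals(code.get(i))) return false;
        }
        return true;
    }

    // Atomic generalization: returns a copy of the template in which the token
    // at the given index is replaced by a metavariable.
    static List<String> introduceVariable(List<String> template, int index, String variable) {
        List<String> result = new ArrayList<>(template);
        result.set(index, variable);
        return result;
    }

    public static void main(String[] args) {
        List<String> before = Arrays.asList("return", "age", ";");
        List<String> after = introduceVariable(before, 1, "?exp");
        List<String> other = Arrays.asList("return", "name", ";");
        System.out.println(after + " matches " + other + ": " + matches(after, other));
        System.out.println(before + " matches " + other + ": " + matches(before, other));
    }
}
```

The generalized template still matches the original code and additionally matches other return statements, which is the direction needed when required changes are being missed.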

Live Demo!

Applications: beyond repeating changes

List<User> getRoleUser () {
  List<User> listUsers = new ArrayList<User>();
  List<User> users = this.userDao.getUsers();
  List<Role> roles = this.roleDao.getRoles();
  for (User u : users) {
    for (Role r : roles) {
      if (u.roleId().equals(r.roleId())) {
        User userok = u;
        listUsers.add(userok);
      }
    }
  }
  return listUsers;
}

Figure 1: Sample code that implements join operation in application code, abridged from actual source for clarity

List listUsers := [ ]; int i, j = 0;
List users := Query(SELECT * FROM users);
List roles := Query(SELECT * FROM roles);
while (i < users.size()) {
  while (j < roles.size()) {
    if (users[i].roleId = roles[j].roleId)
      listUsers := append(listUsers, users[i]);
    ++j;
  }
  ++i;
}

Figure 2: Sample code expressed in kernel language

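Postcondition

$$\mathit{listUsers} = \pi_{\ell}\bigl(\bowtie_{\varphi}(\mathit{users}, \mathit{roles})\bigr)$$

where $\varphi(e_{\mathit{users}}, e_{\mathit{roles}}) := e_{\mathit{users}}.\mathit{roleId} = e_{\mathit{roles}}.\mathit{roleId}$

and $\ell$ contains all the fields from the User class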

Translated code

List<User> getRoleUser () {
  List<User> listUsers = db.executeQuery(
    "SELECT u
     FROM users u, roles r
     WHERE u.roleId == r.roleId
     ORDER BY u.roleId, r.roleId");
  return listUsers; }

Figure 3: Postcondition as inferred from Fig. 1 and code after query transformation

use ORM libraries to retrieve persistent data, our analysis is not specific to ORM libraries and is applicable to programs with embedded SQL queries.

2. Overview

This section gives an overview of our compilation infrastructure and the QBS algorithm to translate imperative code fragments to SQL. We use as a running example a block of code extracted from an open source project management application [2] written using the Hibernate framework. The original code was distributed across several methods which our system automatically collapsed into a single continuous block of code as shown in Fig. 1. The code retrieves the list of users from the database and produces a list containing a subset of users with matching roles.

The example implements the desired functionality but performs poorly. Semantically, the code performs a relational join and projection. Unfortunately, due to the lack of global program information, the ORM library can only fetch all the users and roles from the database and perform the join in application code, without utilizing indices or efficient join algorithms the database system has access to. QBS fixes this problem by compiling the sample code to that

c ∈ constant   ::= True | False | number literal | string literal
e ∈ expression ::= c | [ ] | var | e.f | {f_i = e_i} | e1 op e2 | ¬ e
                 | Query(...) | size(e) | get_{e_r}(e_s)
                 | append(e_r, e_s) | unique(e)
c ∈ command    ::= skip | var := e | if(e) then c1 else c2
                 | while(e) do c | c1 ; c2 | assert e
op ∈ binary op ::= ∧ | ∨ | > | =

Figure 4: Abstract syntax of the kernel language

[Figure 5 diagram. Components: Entry Point Identifier, Code Fragment Identifier, Code Inliner, Kernel Language Compiler, VC Computation, Invariant + Postcondition Synthesizer, Formal Verification. Labelled artifacts: application source + config files (Java, XML), inlined persistent data methods, code fragment, transformed method body, inferred SQL.]

Figure 5: QBS architecture

shown at the bottom of Fig. 3. The nested loop is converted to an SQL query that implements the same functionality in the database where it can be executed more efficiently. Note that the query imposes an order on the retrieved records; this is because in general, nested loops can constrain the ordering of the output records in ways that need to be captured by the query.

In order to apply the QBS algorithm to perform the desired conversion, our system must be able to cope with the complexities of real-world Java code such as aliasing and method calls, which obscure opportunities for transformations. For example, it would not be possible to transform the code fragment in Fig. 1 without knowing that and execute specific queries on the database and return non-aliased lists of results, so the first step of the system is to identify promising code fragments and translate them into a simpler kernel language shown in Fig. 4.

The kernel language operates on three types of values: scalars, immutable records, and immutable lists. Lists represent the collections of records and are used to model the results that are returned from database retrieval operations. Lists store either scalar values or records constructed with scalars, and nested lists are assumed to be appropriately flattened. The language currently does not model the three-valued logic of null values in SQL, and does not model updates to the database. The semantics of the constructs in the kernel language are mostly standard, with a few new ones introduced for record retrievals. Query(...) retrieves records from the database and the results are returned as a list. The records of a list can be randomly accessed using get, and records can be appended to a list using append. Finally, unique takes in a list and creates a new list with all duplicate records removed. Fig. 2 shows the example translated to the kernel language.

2.1 QBS Architecture

We now discuss the architecture of QBS and describe the steps in inferring SQL queries from imperative code. The architecture of QBS is shown in Fig. 5.

Identify code fragments to transform. Given a web application written in Java, QBS first finds the persistent data methods in the application, which are those that fetch persistent data via ORM li-


domain-specific optimizations [Cheung et al., PLDI13]

API migration: migrate an application from using SWING to using SWT

program renovation: convert anonymous classes to Java 8 lambda expressions

domain-specific refactorings: refactorings to eliminate or maintain clones

Ongoing work involving Ekeko/X

and possible collaborations

summarizing similar code fragments into template

state space search

start state: code fragment

action: template mutations

goal: template that covers all fragments

efficacy of atomic vs compound mutation operators
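The state space search outlined on this slide can be sketched as a breadth-first search. The Java below is a minimal illustration under strong assumptions, not the ongoing work itself: code fragments and templates are flat token lists, the only mutation is a hypothetical atomic "replace one token by a metavariable" step, and the names `TemplateSearch`, `matches`, `coversAll`, and `summarize` are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

final class TemplateSearch {
    // Tokens starting with '?' are metavariables that match any single token.
    static boolean matches(List<String> template, List<String> code) {
        if (template.size() != code.size()) return false;
        for (int i = 0; i < template.size(); i++) {
            String token = template.get(i);
            if (!token.startsWith("?") && !token.equals(code.get(i))) return false;
        }
        return true;
    }

    // Goal test: the template covers all fragments.
    static boolean coversAll(List<String> template, List<List<String>> fragments) {
        for (List<String> fragment : fragments) {
            if (!matches(template, fragment)) return false;
        }
        return true;
    }

    // Breadth-first search from the first fragment (start state); each action
    // replaces one token by a metavariable. Returns the first template found
    // that covers all fragments, i.e. one reached with the fewest mutations,
    // or null when no template of this shape exists.
    static List<String> summarize(List<List<String>> fragments) {
        Queue<List<String>> frontier = new ArrayDeque<>();
        Set<List<String>> seen = new HashSet<>();
        List<String> start = new ArrayList<>(fragments.get(0));
        frontier.add(start);
        seen.add(start);
        while (!frontier.isEmpty()) {
            List<String> template = frontier.remove();
            if (coversAll(template, fragments)) return template;
            for (int i = 0; i < template.size(); i++) {
                if (template.get(i).startsWith("?")) continue;
                List<String> next = new ArrayList<>(template);
                next.set(i, "?v" + i);
                if (seen.add(next)) frontier.add(next);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<List<String>> fragments = Arrays.asList(
            Arrays.asList("return", "age", ";"),
            Arrays.asList("return", "name", ";"));
        System.out.println(summarize(fragments));
    }
}
```

The search space of this sketch grows exponentially with the number of tokens, which hints at why the relative efficacy of atomic versus compound mutation operators is a question worth studying.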

Ongoing work involving Ekeko/X

and possible collaborations

[Diagram: system variants A, B, and C. A buggy fragment in system variant A is fixed by PATCH; its clones B and C in the other variants have evolved, and fixing them calls for PATCH' and PATCH'' (??).]

supporting clone-and-own reuse

repeat changes across variants

impact analysis of divergences on transformation

by mutating transformation

Conclusion

applicative logic meta-programming as the foundation for template-based program transformation

Science