Applicative Logic Meta-Programming as the Foundation for Template-Based Program Transformation

Lecture about Ekeko and Ekeko/X at the Graduate School of Information Science and Technology at Osaka University.

Coen De Roover
Software Engineering Laboratory, Osaka | Software Languages Lab, Brussels
Introducing Belgium: location and languages
Languages: Dutch (6.23 million), French (3.32 million), German (0.07 million)
Neighbouring countries: Germany, Luxembourg, France, the Netherlands
Introducing Belgium: Brussels
Manneken Pis, Jeanneke Pis, Zinneke Pis, Atomium, Grote Markt / Grand Place
Introducing Belgium: painters
van Eyck (1434), van der Weyden (1445), Bruegel (1563), Rubens (1615), Ensor (1889), Permeke (1929), Magritte (1964)
Introducing Belgium: food
Waffles (from Brussels), waffles (from Liège), chocolates, French fries
Introducing Belgium: drinks
For 25 little-known facts: http://cheeseweb.eu/2009/08/25-belgium/
Motivation: repeating changes within a program
- @EntityProperty: annotation for properties whose evolution is to be tracked
- after the change: a more type-safe version

Before (one of many such classes):
```java
public class BreakStatement extends Statement {
    @EntityProperty(value = SimpleName.class)
    private EntityIdentifier label;

    public EntityIdentifier getLabel() { return label; }

    public void setLabel(EntityIdentifier label) { this.label = label; }
}
```

After:
```java
public class BreakStatement extends Statement {
    @EntityProperty(value = SimpleName.class)
    private EntityIdentifier<SimpleName> label;

    public EntityIdentifier<SimpleName> getLabel() { return label; }

    public void setLabel(EntityIdentifier<SimpleName> label) { this.label = label; }
}
```
Table II: Study subjects

| | Eclipse JDT core | Eclipse SWT | Mozilla |
|---|---|---|---|
| Project type | IDE | IDE | Several projects associated with Internet |
| Period of development | 2001/06 to 2009/02 | 2001/05 to 2010/05 | 1998/03 to 2008/05 |
| Study period | 2004/07 to 2006/07 | 2004/07 to 2006/07 | 2003/04 to 2005/07 |
| Total revisions | 17000 | 21530 | 200000 |
| # of bugs | 1812 | 1256 | 11254 |
| # of Type I bugs | 1405 (77.54%) | 954 (75.96%) | 7562 (67.19%) |
| # of Type II bugs | 407 (22.46%) | 302 (24.04%) | 3692 (32.81%) |
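As a quick arithmetic check of Table II: the Type I and Type II counts partition each project's bugs, and each percentage is the category count divided by the project total. The Java sketch below (the class and method names are ours, purely illustrative) recomputes the reported shares from the counts in the table.

```java
public class BugShares {
    // Share of a bug category among all bugs, in percent, rounded to two decimals.
    static double percent(int part, int total) {
        return Math.round(part * 10000.0 / total) / 100.0;
    }

    public static void main(String[] args) {
        String[] projects = {"Eclipse JDT core", "Eclipse SWT", "Mozilla"};
        int[] typeI = {1405, 954, 7562};
        int[] typeII = {407, 302, 3692};
        int[] total = {1812, 1256, 11254};
        for (int i = 0; i < projects.length; i++) {
            // Type I and Type II bugs together make up all studied bugs.
            if (typeI[i] + typeII[i] != total[i]) {
                throw new IllegalStateException("counts do not add up for " + projects[i]);
            }
            System.out.printf("%s: Type I %.2f%%, Type II %.2f%%%n",
                projects[i], percent(typeI[i], total[i]), percent(typeII[i], total[i]));
        }
    }
}
```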
Figure 1. The number of times that the same bug is fixed. [Bar charts omitted. Panels: Eclipse JDT core, Eclipse SWT, Mozilla; y-axis: percentage; x-axis: number of fixes, from 2 up to 7, 12, and 29 respectively.]
Figure 2. The number of days taken for the supplementary fix to appear since an initial fix. [Histograms omitted. Panels: Eclipse JDT core, Eclipse SWT, Mozilla; y-axis: percentage; x-axis: days, from 0 up to 495, 957, and 1014 respectively.]
Table III: Severity of Type I and Type II bugs

| Severity | Eclipse JDT core: Type I | Eclipse JDT core: Type II | Eclipse SWT: Type I | Eclipse SWT: Type II | Mozilla: Type I | Mozilla: Type II |
|---|---|---|---|---|---|---|
| Blocker | 1.07% | 1.47% | 1.99% | 2.65% | 2.41% | 1.79% |
| Critical | 2.56% | 3.44% | 3.67% | 6.62% | 8.83% | 10.92% |
| Major | 8.33% | 8.11% | 11.53% | 15.23% | 8.38% | 10.44% |
| Normal | 76.80% | 76.90% | 74.95% | 61.92% | 60.92% | 61.56% |
| Minor | 5.20% | 2.95% | 2.41% | 1.66% | 6.96% | 5.39% |
| Trivial | 1.64% | 0.00% | 1.78% | 1.66% | 6.87% | 3.31% |
| Enhancement | 4.41% | 6.88% | 3.56% | 10.26% | 5.61% | 6.53% |
If a bug is not resolved yet, we measure the time taken from REPORTED to the latest fix attempt. In Eclipse JDT core, Type I bugs take 120.79 days and Type II bugs take 188.27 days to be resolved. The other projects follow similar trends.

Overall, Type II bugs involve more developers in the bug report discussions and take longer to be resolved than Type I bugs (p-values from T-test: 1.45e-12, 1.39e-09, and 2.05e-84 for the number of developers in the bug report discussion, and 3.84e-04, 2.65e-07, and 8.40e-42 for bug resolution time in Eclipse JDT core, Eclipse SWT, and Mozilla respectively).
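The text does not say which T-test variant was used. As an illustration only, the following Java sketch assumes Welch's unequal-variance t-test and computes its statistic and the Welch-Satterthwaite degrees of freedom; turning these into a p-value additionally requires the CDF of Student's t-distribution, which is omitted here. The sample values are made up and are not the study's data.

```java
public class WelchT {
    static double mean(double[] xs) {
        double s = 0;
        for (double x : xs) s += x;
        return s / xs.length;
    }

    // Unbiased sample variance (divides by n - 1).
    static double variance(double[] xs) {
        double m = mean(xs), s = 0;
        for (double x : xs) s += (x - m) * (x - m);
        return s / (xs.length - 1);
    }

    // Welch's t statistic for two independent samples with possibly unequal variances.
    static double t(double[] a, double[] b) {
        double va = variance(a) / a.length, vb = variance(b) / b.length;
        return (mean(a) - mean(b)) / Math.sqrt(va + vb);
    }

    // Welch-Satterthwaite approximation of the degrees of freedom.
    static double df(double[] a, double[] b) {
        double va = variance(a) / a.length, vb = variance(b) / b.length;
        return (va + vb) * (va + vb)
            / (va * va / (a.length - 1) + vb * vb / (b.length - 1));
    }

    public static void main(String[] args) {
        double[] typeI = {1, 2, 3};   // hypothetical values, not the study's data
        double[] typeII = {4, 5, 6};
        System.out.printf("t = %.4f, df = %.1f%n", t(typeI, typeII), df(typeI, typeII));
    }
}
```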
A considerable portion of bugs requires supplementary patches. Such bugs take longer to be resolved and involve more developers in the discussion of the bug reports.
B. What Are The Common Causes of Incomplete Bug Fixes?
To understand why omission errors occur in practice, we first contrast the characteristics of incomplete patches (initial patches of Type II bugs) against those of regular patches (patches of Type I bugs). We measure the average
Motivation: repeating changes within a program correctly
bugs requiring supplementary patches
amount of supplements (~ missed occurrences)
days before first supplement
[“An empirical study of supplementary bug fixes” Park et al. MSR 2012]
Repeating Changes: using existing tool support
Requires specifying:
- where to apply a change: the subjects of the transformation (LHS)
- what change to apply: their state afterwards, or change actions (RHS)
while carefully ensuring that:
- no unwarranted changes are applied
- no required changes are missed
Ekeko: Brussels' logic meta-programming library for Clojure, available as an Eclipse plugin that is causally connected to the workspace
- logic meta-programming: specify code characteristics declaratively, leave the search to the logic engine
- applicative meta-programming: script queries over the workspace, manipulate the workspace
- applications: program and corpus analysis, program transformation, tool building
Building Development Tools Interactively using the EKEKO Meta-Programming Library
Coen De Roover, Software Languages Lab, Vrije Universiteit Brussel, Belgium
Reinout Stevens, Software Languages Lab, Vrije Universiteit Brussel, Belgium
Abstract: EKEKO is a Clojure library for applicative logic meta-programming against an Eclipse workspace. EKEKO has been applied successfully to answering program queries (e.g., “does this bug pattern occur in my code?”), to analyzing project corpora (e.g., “how often does this API usage pattern occur in this corpus?”), and to transforming programs (e.g., “change occurrences of this pattern as follows”) in a declarative manner. These applications rely on a seamless embedding of logic queries in applicative expressions. While the former identify source code of interest, the latter associate error markers with, compute statistics about, or rewrite the identified source code snippets. In this paper, we detail the logic and applicative aspects of the EKEKO library. We also highlight key choices in their implementation. In particular, we demonstrate how a causal connection with the Eclipse infrastructure enables building development tools interactively on the Clojure read-eval-print loop.
I. INTRODUCTION
EKEKO is a Clojure library that enables querying and manipulating an Eclipse workspace using logic queries that are seamlessly embedded in functional expressions. Recent applications of EKEKO include the GASR tool for detecting suspicious aspect-oriented code [1] and the QWALKEKO tool for reasoning about fine-grained evolutions of versioned code [2]. In this paper, we describe the meta-programming facilities offered by EKEKO and highlight key choices in their implementation¹. We also draw attention to the highly interactive manner of tool building these facilities enable.
II. RUNNING EXAMPLE: AN ECLIPSE PLUGIN
More concretely, we will demonstrate how to build a lightweight Eclipse plugin entirely on the Clojure read-eval-print loop. We will use this plugin as a running example throughout the rest of this paper. Our Eclipse plugin is to support developers in repeating similar changes throughout an entire class hierarchy. It is to associate problem markers with fields that have not yet been changed. In addition, it is to present developers with a visualization of these problems. Finally, it is to provide a “quick fix” that applies the required changes correctly.
Figure 1 illustrates the particular changes that need to be repeated. The raw EntityIdentifier type of those fields within a subclass of be.ac.chaq.model.ast.java.ASTNode that carry an @EntityProperty annotation is to receive a type parameter that corresponds to the annotation’s value key.

```java
public class BreakStatement extends Statement {
    // Before changes:
    @EntityProperty(value = SimpleName.class)
    private EntityIdentifier label;

    // After changes:
    @EntityProperty(value = SimpleName.class)
    private EntityIdentifier<SimpleName> label;

    // ... (among others, accessor methods change accordingly)
}
```
Fig. 1: Example changes to be repeated.

¹The EKEKO library, its implementation, and all documentation are freely available from https://github.com/cderoove/damp.ekeko/.
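The check behind the plugin's problem markers can be approximated outside Eclipse. The following is a minimal Java sketch, not Ekeko's implementation: it uses runtime reflection instead of Ekeko's logic queries over JDT ASTs, and EntityProperty, EntityIdentifier, SimpleName, BreakStatement, and ContinueStatement are hypothetical stand-ins for the types in Fig. 1. It reports every annotated field whose declared type is still the raw EntityIdentifier, together with the type parameter that field should receive.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Field;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.List;

public class UnchangedFieldFinder {
    // Hypothetical stand-ins for the types in Fig. 1 (runtime retention so reflection sees it).
    @Retention(RetentionPolicy.RUNTIME)
    @interface EntityProperty { Class<?> value(); }

    static class EntityIdentifier<T> {}
    static class SimpleName {}
    static class Statement {}

    static class BreakStatement extends Statement {
        @EntityProperty(value = SimpleName.class)
        private EntityIdentifier label;               // not yet changed (raw type)
    }

    static class ContinueStatement extends Statement {
        @EntityProperty(value = SimpleName.class)
        private EntityIdentifier<SimpleName> label;   // already changed
    }

    // Lists "Class.field -> EntityIdentifier<Value>" for annotated fields that are still raw.
    static List<String> unchangedFields(Class<?>... classes) {
        List<String> result = new ArrayList<>();
        for (Class<?> c : classes) {
            for (Field f : c.getDeclaredFields()) {
                EntityProperty p = f.getAnnotation(EntityProperty.class);
                if (p == null) continue;
                // A raw type is reported as the plain Class object,
                // a parameterized one as a ParameterizedType.
                Type t = f.getGenericType();
                if (t == EntityIdentifier.class) {
                    result.add(c.getSimpleName() + "." + f.getName()
                        + " -> EntityIdentifier<" + p.value().getSimpleName() + ">");
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(unchangedFields(BreakStatement.class, ContinueStatement.class));
    }
}
```

Unlike Ekeko, reflection only sees compiled classes and cannot attach markers or rewrite source; it merely shows which fields the query has to single out.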
III. ARCHITECTURAL OVERVIEW
The EKEKO library operates upon a central repository of project models. These models contain structural and behavioral information that is not readily available from the projects themselves. The models for Java projects include abstract syntax trees provided by the Eclipse JDT parser, but also control flow and data flow information computed by the SOOT program analysis framework [3].
An accompanying Eclipse plugin automatically maintains the EKEKO model repository. To this end, it subscribes to each workspace change and triggers incremental updates or complete rebuilds of project models. As a result, the information operated upon by the EKEKO library is always up-to-date. In addition, this plugin provides an extension point that enables registering additional kinds of project models. The KEKO extension, for instance, builds its project model from the results of a partial program analysis [4], enabling queries over compilation units that do not build correctly.
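To make this architecture concrete, here is a minimal Java sketch of a model repository with an extension point. It is not Ekeko's implementation (the real plugin reacts to Eclipse workspace notifications and builds JDT and SOOT models); ModelRepositorySketch, ProjectModel, registerModelKind, and workspaceChanged are hypothetical names. Registered model kinds play the role of the extension point, and each workspace change either updates an existing model incrementally or triggers a complete rebuild.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class ModelRepositorySketch {
    // Hypothetical project model; counts incremental updates since its last full build.
    static class ProjectModel {
        final String project;
        int updates = 0;

        ProjectModel(String project) { this.project = project; }
    }

    // Stand-in for the extension point: one factory per registered kind of project model.
    private final Map<String, Function<String, ProjectModel>> factories = new HashMap<>();
    // Central repository: one model per (kind, project) pair.
    private final Map<String, ProjectModel> models = new HashMap<>();

    void registerModelKind(String kind, Function<String, ProjectModel> factory) {
        factories.put(kind, factory);
    }

    // Called for every workspace change: update incrementally when possible, otherwise rebuild.
    void workspaceChanged(String project, boolean incremental) {
        factories.forEach((kind, factory) -> {
            String key = kind + ":" + project;
            ProjectModel existing = models.get(key);
            if (incremental && existing != null) {
                existing.updates++;
            } else {
                models.put(key, factory.apply(project)); // complete (re)build
            }
        });
    }

    ProjectModel model(String kind, String project) {
        return models.get(kind + ":" + project);
    }

    public static void main(String[] args) {
        ModelRepositorySketch repo = new ModelRepositorySketch();
        repo.registerModelKind("java", ProjectModel::new);
        repo.workspaceChanged("demo", true);   // no model yet: full build
        repo.workspaceChanged("demo", true);   // incremental update
        System.out.println(repo.model("java", "demo").updates);
    }
}
```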
IV. LOGIC PROGRAM QUERYING
The EKEKO library enables querying and manipulating programs using logic queries and applicative expressions respectively. We detail the former first. Section V discusses the latter. The program querying facilities relieve tool builders from implementing an imperative search for source code that exhibits particular characteristics. Instead, developers can specify these characteristics declaratively through a logic query. The
[WCRE-CSMR14]
Think of it as querying a database of program information!
Logic meta-programming (LMP)
Logic relations: ast/2 and has/3 “tables”
- (ast ?type ?node): relation between a ?node of an Abstract Syntax Tree (AST) and its ?type
- (has ?property ?node ?value): relation of AST nodes and the values of their properties
Logic querying: AST relations
Relation of all AST nodes of type :ReturnStatement and the value of their :expression property, as an SQL query:
```sql
SELECT ast.node AS ?statement, has.value AS ?expression
FROM ast, has
WHERE ast.type = :ReturnStatement
  AND has.property = :expression
  AND has.node = ast.node;
```
Equivalent logic query:
```clojure
(ekeko [?statement ?expression]
  (ast :ReturnStatement ?statement)
  (has :expression ?statement ?expression))
```
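The database analogy can be taken literally. In the Java sketch below, the ast/2 and has/3 relations are lists of rows, and the :ReturnStatement query becomes a nested-loop join that pairs each return statement with the value of its :expression property. The row types and sample nodes (ret1, ret2, brk1) are hypothetical; Ekeko itself evaluates such queries with a logic engine over Eclipse JDT nodes rather than over flat tables.

```java
import java.util.ArrayList;
import java.util.List;

public class AstHasJoin {
    // Hypothetical rows of the ast/2 and has/3 "tables".
    record Ast(String type, String node) {}
    record Has(String property, String node, String value) {}
    record Row(String statement, String expression) {}

    // Nested-loop join mirroring the SQL query: ast.type = ReturnStatement,
    // has.property = expression, has.node = ast.node.
    static List<Row> returnedExpressions(List<Ast> ast, List<Has> has) {
        List<Row> rows = new ArrayList<>();
        for (Ast a : ast) {
            if (!a.type().equals("ReturnStatement")) continue;
            for (Has h : has) {
                if (h.property().equals("expression") && h.node().equals(a.node())) {
                    rows.add(new Row(a.node(), h.value()));
                }
            }
        }
        return rows;
    }

    static List<Ast> sampleAst() {
        return List.of(
            new Ast("ReturnStatement", "ret1"),   // return age;
            new Ast("ReturnStatement", "ret2"),   // return;  (no expression)
            new Ast("BreakStatement", "brk1"));   // break outer;
    }

    static List<Has> sampleHas() {
        return List.of(
            new Has("expression", "ret1", "age"),
            new Has("label", "brk1", "outer"));
    }

    public static void main(String[] args) {
        System.out.println(returnedExpressions(sampleAst(), sampleHas()));
    }
}
```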
Logic programming: defining relations
Defining a relation (fresh introduces the local variable ?statement):
```clojure
(defn expression|returned [?expression]
  (fresh [?statement]
    (ast :ReturnStatement ?statement)
    (has :expression ?statement ?expression)))
```
Using the newly defined relation to find returned expressions of a type defined in compiled rather than source code:
```clojure
(ekeko* [?returned ?type]
  (expression|returned ?returned)
  (ast|expression-type ?returned ?type)
  (type|binary ?type))
```
Ekeko Library: relations for logic meta-programming
- syntactic, e.g., for(Object i : collection):
  (ast ?kind ?ast) (has ?property ?ast ?value) (ast-encompassing|method+ ?ast ?m) (ast-encompassing|type+ ?ast ?t)
- structural, e.g., class Ouch { int hashCode() { return ...; } }:
  (classfile-type ?binaryfile ?type) (type-type|sub+ ?type ?subtype) (type-name|qualified ?type ?qname) (advice-shadow ?advice ?shadow)
- control and data flow, e.g., scanner = new Scanner(); ... x.close(); ... scanner.next();
  (method|soot-cfg ?m ?cfg) (unit|soot-usebox ?u ?ub) (local|soot-pointstoset ?l ?p) (soot|may|alias ?l1 ?l2)
Ekeko Library: functions for applicative meta-programming
- rewriting: (remove-node node) (replace-node node newnode) (change-property node property value) (apply-and-reset-rewrites!)
- visualizing: (visualize nodes edges :layout layout :node|label labelfn :edge|label labelfn ...)
- tooling: (add-problem-marker marker node) (register-quickfix marker rewritefn) (reduce-workspace fn initval) (wait-for-builds-to-finish)
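The rewriting functions suggest that rewrites are first recorded and later applied together by apply-and-reset-rewrites!. Assuming those semantics (inferred from the function names, not from Ekeko's documentation), the following Java sketch queues replace and remove operations and applies them in one pass before clearing the queue. Nodes are plain strings rather than AST nodes, and DeferredRewrites and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DeferredRewrites {
    // Hypothetical "program": a list of node strings standing in for AST nodes.
    private final List<String> nodes;
    // Pending rewrites: node -> replacement; a null replacement means "remove".
    // Nodes are matched by equality, so equal strings are rewritten together.
    private final Map<String, String> pending = new HashMap<>();

    DeferredRewrites(List<String> nodes) {
        this.nodes = new ArrayList<>(nodes);
    }

    void replaceNode(String node, String newNode) { pending.put(node, newNode); }

    void removeNode(String node) { pending.put(node, null); }

    // Applies all recorded rewrites in one pass, then empties the queue.
    List<String> applyAndResetRewrites() {
        List<String> result = new ArrayList<>();
        for (String n : nodes) {
            if (!pending.containsKey(n)) {
                result.add(n);                 // untouched node
            } else if (pending.get(n) != null) {
                result.add(pending.get(n));    // replaced node
            }                                  // otherwise: removed node
        }
        pending.clear();
        nodes.clear();
        nodes.addAll(result);
        return List.copyOf(result);
    }

    public static void main(String[] args) {
        DeferredRewrites program = new DeferredRewrites(List.of("a", "b", "c"));
        program.replaceNode("a", "x");
        program.removeNode("b");
        System.out.println(program.applyAndResetRewrites());
    }
}
```

Deferring the changes keeps the program stable while queries are still iterating over it; only the final apply step mutates it.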
Repeating Changes: using Ekeko
Live Demo!
Repeating Changes: using Ekeko (and most state-of-the-art tools)
- specifying changes requires significant expertise
- it often requires multiple iterations
Imperative Program Transformation by Rewriting
It says that an assignment x := v (where v is a variable) can be replaced by x := c if the “last assignment” to v was v := c (where c is a constant). The side condition formalises the notion of “last assignment”, and will be explained later in the paper:

$$ n : (x := v) \Longrightarrow x := c \quad \text{if} \quad n \vdash \overleftarrow{A}\big(\neg\,\mathit{def}(v)\ \mathcal{U}\ \mathit{def}(v) \land \mathit{stmt}(v := c)\big) \ \text{ and } \ \mathit{conlit}(c) $$
The rewrite language has several important properties:
– The specification is in the form of a rewrite system with the advantages of succinctness and intuitiveness mentioned above.
– The rewrite system works over a control flow graph representation of the program. It does this by identifying and manipulating graph blocks which are based on the idea of basic blocks but with finer granularity.
– The rewrites are executable. An implementation exists to automatically determine when the rewrite applies and to perform the transformation just from the specification.
– The relation between the conditions on the control flow graph and the operational semantics of the program seems to lend itself to formal reasoning about the transformation.
The paper is organised as follows. §2 covers earlier work in the area and provides the motivation for this work. §3 describes our method of rewriting over control graphs. §4 describes the form of side conditions for those rewrites. §5 gives three examples of common transformations and their application when given as rewrites. §6 discusses what has been achieved and possible applications of this work.
2 Background
Implementing optimising transformations is hard: building a good optimising compiler is a major effort. If a programmer wishes to adapt a compiler to a particular task, for example to improve the optimisation of certain library calls, intricate knowledge of the compiler internals is necessary. This contrasts with the description of such optimisations in textbooks [1,3,26], where they are often described in a few lines of informal English. It is not surprising, therefore, that the program transformation community has sought declarative ways of programming transformations, to enable experimentation without excessive implementation effort. The idea to describe program transformations by rewriting is almost as old as the subject itself. One early implementation can be found in the TAMPR system by Boyle, which has been under development since the early ’70s [8,9]. TAMPR starts with a specification, which is translated to pure lambda calculus, and rewriting is performed on the pure lambda expressions. Because programs
- no unwarranted changes are applied (refinement)
- no required changes are missed (generalization)
Repeating Changes: introducing Ekeko/X
Towards specifying changes more intuitively:
1/ template-based transformation specifications
2/ matching and rewriting directives in templates
3/ template mutation operators
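1/ Template-based transformation specifications
(ekeko/x <LHS1> … <LHSn> => <RHS1> … <RHSn>)
- templates on the LHS: their matches identify the subjects of the transformation
- templates on the RHS: generate code for every LHS match
For example:
(ekeko/x ?exp.setAge(?arg) => new Integer(?arg))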
2/ Matching and rewriting directives in templates
The setAge specification above has two shortcomings:
- it will not match setAge(5), which lacks an explicit receiver
- the generated code has no destination
Directives, written [<component>]@[<directive>], address both:
(ekeko/x [?exp]@[relax-receiver].setAge(?arg) => [new Integer(?arg)]@[(replace ?arg)])
- relax-receiver is a matching directive
- (replace ?arg) is a rewriting directive
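To illustrate what matching and instantiating a template involves, the following Java sketch matches the LHS ?exp.setAge(?arg) token by token, binding each meta-variable to exactly one token, and instantiates the RHS new Integer(?arg) for the match. This is a deliberate simplification, not Ekeko/X's matcher, which works on ASTs and supports directives; because the sketch has no directives, it behaves like the undirected template and does not match setAge(5).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemplateSketch {
    // Tokens: meta-variables (?name), identifiers/numbers, or single punctuation characters.
    private static final Pattern TOKEN = Pattern.compile("\\?\\w+|\\w+|\\S");
    private static final Pattern META_VAR = Pattern.compile("\\?\\w+");

    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) tokens.add(m.group());
        return tokens;
    }

    // Matches a template against code; each meta-variable binds exactly one token,
    // and repeated meta-variables must bind the same token.
    static Optional<Map<String, String>> match(String template, String code) {
        List<String> t = tokenize(template), c = tokenize(code);
        if (t.size() != c.size()) return Optional.empty();
        Map<String, String> bindings = new LinkedHashMap<>();
        for (int i = 0; i < t.size(); i++) {
            String tt = t.get(i), ct = c.get(i);
            if (tt.startsWith("?")) {
                String previous = bindings.putIfAbsent(tt, ct);
                if (previous != null && !previous.equals(ct)) return Optional.empty();
            } else if (!tt.equals(ct)) {
                return Optional.empty();
            }
        }
        return Optional.of(bindings);
    }

    // Generates RHS code for one LHS match by substituting its bindings.
    static String instantiate(String rhs, Map<String, String> bindings) {
        Matcher m = META_VAR.matcher(rhs);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = bindings.getOrDefault(m.group(), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String lhs = "?exp.setAge(?arg)", rhs = "new Integer(?arg)";
        match(lhs, "person.setAge(5)").ifPresent(b -> System.out.println(instantiate(rhs, b)));
        System.out.println(match(lhs, "setAge(5)").isPresent());
    }
}
```

A relax-receiver directive would, in these terms, let ?exp also match an absent receiver; the sketch leaves that out to keep the core idea visible.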
Template:
```
[]@[orlarger] class ?name extends [Component]@[orsubtype] {
  [[public]@[member] void acceptVisitor(?type ?param) {
    [
      System.out.println(?string);
      [?x]@[(must-alias ?param)].?visitMethod(this);
    ]@[orflow]
  }]@[member]
}
```
Code to match against:
```java
public class OnlyLoggingLeaf extends Component {
  public void acceptVisitor(ComponentVisitor v) {
    System.out.println("Only logging.");
  }
}

public class SillyLeaf extends OnlyLoggingLeaf {
  public void acceptVisitor(ComponentVisitor v) {
    super.acceptVisitor(v);
    ComponentVisitor temp = v;
    temp.visitSuperLogLeaf(this);
  }
}
```
Rich inter-procedural semantics!
3/ Template mutation operators
- atomic generalization, e.g., introduce-variable turns return age; into return ?exp;
- compound generalization, e.g., generalize-aliases turns the first snippet below into the second:
```java
public boolean hasChildren(Object element) {
    if (element == null)
        return false;
    return getChildren(element).length > 0;
}
```
```
public boolean hasChildren(Object ?param) {
    if ([?ref1]@[(must-alias ?param)] == null)
        return false;
    return getChildren([?ref2]@[(must-alias ?param)]).length > 0;
}
```
Live Demo!
Applications: beyond repeating changes
```java
List<User> getRoleUser() {
    List<User> listUsers = new ArrayList<User>();
    List<User> users = this.userDao.getUsers();
    List<Role> roles = this.roleDao.getRoles();
    for (User u : users) {
        for (Role r : roles) {
            if (u.roleId().equals(r.roleId())) {
                User userok = u;
                listUsers.add(userok);
            }
        }
    }
    return listUsers;
}
```
Figure 1: Sample code that implements a join operation in application code, abridged from actual source for clarity
```
List listUsers := [ ]; int i, j = 0;
List users := Query(SELECT * FROM users);
List roles = Query(SELECT * FROM roles);
while (i < users.size()) {
  while (j < roles.size()) {
    if (users[i].roleId = roles[j].roleId)
      listUsers := append(listUsers, users[i]);
    ++j;
  }
  ++i;
}
```
Figure 2: Sample code expressed in kernel language
Postcondition:

$$ \mathit{listUsers} = \pi_{\ell}\big(\bowtie_{\varphi}(\mathit{users}, \mathit{roles})\big) $$

where $\varphi(e_{users}, e_{roles}) := e_{users}.\mathit{roleId} = e_{roles}.\mathit{roleId}$ and $\ell$ contains all the fields from the User class.

Translated code:
```java
List<User> getRoleUser() {
    List<User> listUsers = db.executeQuery(
        "SELECT u " +
        "FROM users u, roles r " +
        "WHERE u.roleId = r.roleId " +
        "ORDER BY u.roleId, r.roleId");
    return listUsers;
}
```
Figure 3: Postcondition as inferred from Fig. 1 and code after query transformation
use ORM libraries to retrieve persistent data, our analysis is not specific to ORM libraries and is applicable to programs with embedded SQL queries.
2. Overview
This section gives an overview of our compilation infrastructure and the QBS algorithm to translate imperative code fragments to SQL. We use as a running example a block of code extracted from an open source project management application [2] written using the Hibernate framework. The original code was distributed across several methods which our system automatically collapsed into a single continuous block of code as shown in Fig. 1. The code retrieves the list of users from the database and produces a list containing a subset of users with matching roles.
The example implements the desired functionality but performs poorly. Semantically, the code performs a relational join and projection. Unfortunately, due to the lack of global program information, the ORM library can only fetch all the users and roles from the database and perform the join in application code, without utilizing indices or efficient join algorithms the database system has access to. QBS fixes this problem by compiling the sample code to that shown at the bottom of Fig. 3. The nested loop is converted to an SQL query that implements the same functionality in the database where it can be executed more efficiently. Note that the query imposes an order on the retrieved records; this is because in general, nested loops can constrain the ordering of the output records in ways that need to be captured by the query.

```
c  ∈ constant   ::= True | False | number literal | string literal
e  ∈ expression ::= c | [ ] | var | e.f | {f_i = e_i} | e1 op e2 | ¬ e
                  | Query(...) | size(e) | get_{e_r}(e_s)
                  | append(e_r, e_s) | unique(e)
c  ∈ command    ::= skip | var := e | if(e) then c1 else c2
                  | while(e) do c | c1 ; c2 | assert e
op ∈ binary op  ::= ∧ | ∨ | > | =
```
Figure 4: Abstract syntax of the kernel language

Figure 5: QBS architecture. [Diagram omitted. Components: Entry Point Identifier, Code Inliner, Code Fragment Identifier, Kernel Language Compiler, VC Computation, Invariant + Postcondition Synthesizer, Formal Verification. Input: application source + config files (Java, XML); intermediates: inlined persistent data methods and code fragments (Java); outputs: inferred SQL and the transformed method body (Java).]
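The efficiency argument can be illustrated in plain Java. The sketch below contrasts the nested-loop join of Fig. 1 with a hash join, one of the join algorithms a database can use instead; both yield the same users in the same order, with one entry per matching role. User, Role, and the sample data are hypothetical simplifications of the types in Fig. 1 (roleId is an int here).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JoinSketch {
    // Hypothetical simplifications of the entities in Fig. 1.
    record User(String name, int roleId) {}
    record Role(int roleId) {}

    // Nested-loop join as in Fig. 1: compares every user with every role.
    static List<User> nestedLoopJoin(List<User> users, List<Role> roles) {
        List<User> out = new ArrayList<>();
        for (User u : users)
            for (Role r : roles)
                if (u.roleId() == r.roleId()) out.add(u);
        return out;
    }

    // Hash join: count roles per roleId once, then probe once per user.
    static List<User> hashJoin(List<User> users, List<Role> roles) {
        Map<Integer, Integer> rolesPerId = new HashMap<>();
        for (Role r : roles) rolesPerId.merge(r.roleId(), 1, Integer::sum);
        List<User> out = new ArrayList<>();
        for (User u : users) {
            int matches = rolesPerId.getOrDefault(u.roleId(), 0);
            for (int i = 0; i < matches; i++) out.add(u); // one entry per matching role
        }
        return out;
    }

    static List<User> sampleUsers() {
        return List.of(new User("alice", 1), new User("bob", 2), new User("carol", 3));
    }

    static List<Role> sampleRoles() {
        return List.of(new Role(1), new Role(3), new Role(3));
    }

    public static void main(String[] args) {
        System.out.println(nestedLoopJoin(sampleUsers(), sampleRoles()));
        System.out.println(hashJoin(sampleUsers(), sampleRoles()));
    }
}
```

The nested loop performs |users| × |roles| comparisons, whereas the hash join reads each list once (plus the size of the output). This kind of choice is available to the database once the join is expressed as a query.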
In order to apply the QBS algorithm to perform the desired conversion, our system must be able to cope with the complexities of real-world Java code such as aliasing and method calls, which obscure opportunities for transformations. For example, it would not be possible to transform the code fragment in Fig. 1 without knowing that and execute specific queries on the database and return non-aliased lists of results, so the first step of the system is to identify promising code fragments and translate them into a simpler kernel language shown in Fig. 4.
The kernel language operates on three types of values: scalars, immutable records, and immutable lists. Lists represent the collections of records and are used to model the results that are returned from database retrieval operations. Lists store either scalar values or records constructed with scalars, and nested lists are assumed to be appropriately flattened. The language currently does not model the three-valued logic of null values in SQL, and does not model updates to the database. The semantics of the constructs in the kernel language are mostly standard, with a few new ones introduced for record retrievals. Query(...) retrieves records from the database and the results are returned as a list. The records of a list can be randomly accessed using get, and records can be appended to a list using append. Finally, unique takes in a list and creates a new list with all duplicate records removed. Fig. 2 shows the example translated to the kernel language.
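The list operations of the kernel language can be mimicked with immutable Java lists. In this sketch (KernelLists and its methods are hypothetical names), append returns a new list instead of mutating its argument, get is plain random access, and unique removes duplicates. Keeping the first occurrence of each record in its original position is our assumption, as the text does not specify the order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class KernelLists {
    // append: returns a new immutable list; the argument list is left unchanged.
    static <T> List<T> append(List<T> list, T record) {
        List<T> copy = new ArrayList<>(list);
        copy.add(record);
        return List.copyOf(copy);
    }

    // get: random access to the record at index i.
    static <T> T get(List<T> list, int i) {
        return list.get(i);
    }

    // unique: a new list without duplicates (keeps first occurrences, in order).
    static <T> List<T> unique(List<T> list) {
        return List.copyOf(new LinkedHashSet<>(list));
    }

    public static void main(String[] args) {
        List<String> empty = List.of();
        List<String> xs = append(append(append(empty, "alice"), "bob"), "alice");
        System.out.println(xs + " " + unique(xs) + " " + get(xs, 1));
    }
}
```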
2.1 QBS Architecture
We now discuss the architecture of QBS and describe the steps in inferring SQL queries from imperative code. The architecture of QBS is shown in Fig. 5.
Identify code fragments to transform. Given a web application written in Java, QBS first finds the persistent data methods in the application, which are those that fetch persistent data via ORM li-
1 List<User> getRoleUser () {2 List<User> listUsers = new ArrayList<User>();3 List<User> users = this.userDao.getUsers();4 List<Role> roles = this.roleDao.getRoles();5 for (User u : users) {6 for (Roles r : roles) {7 if (u.roleId().equals(r.roleId())) {8 User userok = u;9 listUsers.add(userok);
10 }}}11 return listUsers;12 }
Figure 1: Sample code that implements join operation in applica-tion code, abridged from actual source for clarity
1 List listUsers := [ ]; int i, j = 0;2 List users := Query(SELECT ⇤ FROM users);3 List roles = Query(SELECT ⇤ FROM roles);4 while (i < users.size()) {5 while (j < roles.size()) {6 if (users[i].roleId = roles[j].roleId)7 listUsers := append(listUsers, users[i]);8 ++j;9 }
10 ++i;}Figure 2: Sample code expressed in kernel language
PostconditionlistUsers = ⇡`(./' (users, roles))where'(e
users
, eroles
) := e
users
.roleId = e
roles
.roleId` contains all the fields from the User class
Translated code1 List<User> getRoleUser () {2 List<User> listUsers = db.executeQuery(3 "SELECT u4 FROM users u, roles r5 WHERE u.roleId == r.roleId6 ORDER BY u.roleId, r.roleId");7 return listUsers; }
Figure 3: Postcondition as inferred from Fig. 1 and code after querytransformation
use ORM libraries to retrieve persistent data, our analysis is notspecific to ORM libraries and is applicable to programs withembedded SQL queries.
2. OverviewThis section gives an overview of our compilation infrastructureand the QBS algorithm to translate imperative code fragments toSQL. We use as a running example a block of code extracted froman open source project management application [2] written usingthe Hibernate framework. The original code was distributed acrossseveral methods which our system automatically collapsed into asingle continuous block of code as shown in Fig. 1. The coderetrieves the list of users from the database and produces a listcontaining a subset of users with matching roles.
The example implements the desired functionality but performs poorly. Semantically, the code performs a relational join and projection. Unfortunately, due to the lack of global program information, the ORM library can only fetch all the users and roles from the database and perform the join in application code, without utilizing indices or efficient join algorithms the database system has access to. QBS fixes this problem by compiling the sample code to the query shown at the bottom of Fig. 3. The nested loop is converted to an SQL query that implements the same functionality in the database, where it can be executed more efficiently. Note that the query imposes an order on the retrieved records; this is because, in general, nested loops can constrain the ordering of the output records in ways that need to be captured by the query.

c ∈ constant ::= True | False | number literal | string literal
e ∈ expression ::= c | [ ] | var | e.f | {f_i = e_i} | e1 op e2 | ¬ e
    | Query(...) | size(e) | get_{e_r}(e_s)
    | append(e_r, e_s) | unique(e)
c ∈ command ::= skip | var := e | if(e) then c1 else c2
    | while(e) do c | c1 ; c2 | assert e
op ∈ binary op ::= ∧ | ∨ | > | =
Figure 4: Abstract syntax of the kernel language

[Diagram: application source and config files (Java, XML) pass through the Entry Point Identifier, Code Fragment Identifier, Code Inliner (producing inlined persistent data methods), Kernel Language Compiler, VC Computation, Invariant + Postcondition Synthesizer and Formal Verification components, yielding inferred SQL and a transformed method body.]
Figure 5: QBS architecture
In order to apply the QBS algorithm to perform the desired conversion, our system must be able to cope with the complexities of real-world Java code such as aliasing and method calls, which obscure opportunities for transformations. For example, it would not be possible to transform the code fragment in Fig. 1 without knowing that and execute specific queries on the database and return non-aliased lists of results, so the first step of the system is to identify promising code fragments and translate them into a simpler kernel language shown in Fig. 4.
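The abstract syntax of Fig. 4 maps naturally onto algebraic data types. The sketch below is one hypothetical Java 17 encoding using sealed interfaces and records; the type names are ours, not those of the QBS implementation, Query(...) is reduced to an opaque SQL string, and which argument of get_{e_r}(e_s) is the list and which is the index is an assumption.

```java
import java.util.Map;

// Expressions e of Fig. 4 (hypothetical encoding, not QBS's own classes).
sealed interface Expr {}
record Const(Object value) implements Expr {}                 // True | False | number or string literal
record EmptyList() implements Expr {}                         // [ ]
record Var(String name) implements Expr {}                    // var
record Field(Expr target, String name) implements Expr {}     // e.f
record RecordLit(Map<String, Expr> fields) implements Expr {} // {f_i = e_i}
record BinOp(Op op, Expr left, Expr right) implements Expr {} // e1 op e2
record Not(Expr operand) implements Expr {}                   // negation of e
record Query(String sql) implements Expr {}                   // Query(...), SQL kept opaque
record Size(Expr list) implements Expr {}                     // size(e)
record Get(Expr list, Expr index) implements Expr {}          // get_{e_r}(e_s), argument roles assumed
record Append(Expr list, Expr element) implements Expr {}     // append(e_r, e_s)
record Unique(Expr list) implements Expr {}                   // unique(e)

// Binary operators: AND | OR | > | =
enum Op { AND, OR, GT, EQ }

// Commands c of Fig. 4.
sealed interface Cmd {}
record Skip() implements Cmd {}
record Assign(String variable, Expr value) implements Cmd {}  // var := e
record If(Expr cond, Cmd thenBranch, Cmd elseBranch) implements Cmd {}
record While(Expr cond, Cmd body) implements Cmd {}
record Seq(Cmd first, Cmd second) implements Cmd {}           // c1 ; c2
record Assert(Expr cond) implements Cmd {}                    // assert e
```

For instance, the conditional in the inner loop of Fig. 2 becomes an If whose condition is a BinOp with Op.EQ over two Field nodes, whose then-branch assigns an Append to listUsers, and whose else-branch, absent in the source, is Skip().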
The kernel language operates on three types of values: scalars, immutable records, and immutable lists. Lists represent the collections of records and are used to model the results that are returned from database retrieval operations. Lists store either scalar values or records constructed with scalars, and nested lists are assumed to be appropriately flattened. The language currently does not model the three-valued logic of null values in SQL, and does not model updates to the database. The semantics of the constructs in the kernel language are mostly standard, with a few new ones introduced for record retrievals. Query(...) retrieves records from the database and the results are returned as a list. The records of a list can be randomly accessed using get, and records can be appended to a list using append. Finally, unique takes in a list and creates a new list with all duplicate records removed. Fig. 2 shows the example translated to the kernel language.
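The value model just described (immutable records and lists, plus get, append, size and unique) can be sketched in Java as follows. This is a hypothetical illustration, not QBS code: records are modelled as immutable maps from field names to scalars, every operation returns a fresh list instead of mutating its argument, and keeping the first occurrence of each record in unique is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of kernel-language values: scalars are plain Java
// objects, records are immutable maps, lists are unmodifiable Lists.
final class KernelValues {
    private KernelValues() {}

    // An immutable record. Map.copyOf rejects null values, in line with the
    // kernel language not modelling SQL NULL.
    static Map<String, Object> makeRecord(Map<String, Object> fields) {
        return Map.copyOf(fields);
    }

    // append: builds a new list; the argument list is never modified.
    static <T> List<T> append(List<T> list, T element) {
        List<T> copy = new ArrayList<>(list);
        copy.add(element);
        return List.copyOf(copy);
    }

    // get: random access by position.
    static <T> T get(List<T> list, int index) {
        return list.get(index);
    }

    // size: number of elements.
    static int size(List<?> list) {
        return list.size();
    }

    // unique: a new list without duplicate records (equality is by content).
    // Keeping the first occurrence of each record is an assumption here.
    static <T> List<T> unique(List<T> list) {
        return List.copyOf(new LinkedHashSet<>(list));
    }
}
```

Because append never mutates its argument, a list bound to one variable cannot change through another, which is one way to read the "non-aliased lists of results" requirement mentioned above.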
2.1 QBS Architecture
We now discuss the architecture of QBS and describe the steps in inferring SQL queries from imperative code. The architecture of QBS is shown in Fig. 5.
Identify code fragments to transform. Given a web application written in Java, QBS first finds the persistent data methods in the application, which are those that fetch persistent data via ORM libraries
domain-specific optimizations
API migration
program renovation
domain-specific refactorings
refactorings to eliminate or maintain clones
convert anonymous classes to Java 8 lambda expressions
migrate an application from using Swing to using SWT
[Cheung et al., PLDI13]
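As an illustration of one of the transformations listed above (our own example, not taken from the slides), the Java snippet below shows an anonymous class and the lambda expression a template-based rewrite would turn it into. LambdaMigration and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class LambdaMigration {
    // Before: an anonymous class implementing a functional interface.
    static final Comparator<String> BY_LENGTH_OLD = new Comparator<String>() {
        @Override
        public int compare(String a, String b) {
            return Integer.compare(a.length(), b.length());
        }
    };

    // After: the same behaviour as a Java 8 lambda expression.
    static final Comparator<String> BY_LENGTH_NEW =
            (a, b) -> Integer.compare(a.length(), b.length());

    // Sorts a copy so both comparators can be compared on the same input.
    static List<String> sorted(List<String> words, Comparator<String> order) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(order);
        return copy;
    }
}
```

The rewrite is not always behaviour-preserving, which is why such a transformation needs preconditions: inside an anonymous class `this` denotes the anonymous instance, whereas inside a lambda it denotes the enclosing instance, and an anonymous class may declare its own fields or shadow local variables, which a lambda cannot.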
Ongoing work involving Ekeko/X and possible collaborations
summarizing similar code fragments into a template
action: template mutations
goal: template that covers all fragments
start state: code fragment
state space search
efficacy of atomic vs compound mutation operators
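The state space search on this slide can be sketched as follows, as a toy model under strong assumptions that differ from Ekeko/X's real templates: code fragments are token lists of equal length, a template is a token list in which "?" is a wildcard, and the only (atomic) mutation operator replaces one concrete token by "?". All names are hypothetical. The start state is the first fragment and the goal is a template matching every fragment; breadth-first search returns such a template with the fewest wildcards.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

final class TemplateSearch {
    static final String WILDCARD = "?";

    // A template matches a fragment if every non-wildcard token is equal.
    static boolean matches(List<String> template, List<String> fragment) {
        if (template.size() != fragment.size()) return false;
        for (int k = 0; k < template.size(); k++) {
            String t = template.get(k);
            if (!t.equals(WILDCARD) && !t.equals(fragment.get(k))) return false;
        }
        return true;
    }

    // Atomic mutation operator: generalize one concrete token to a wildcard.
    static List<List<String>> mutations(List<String> template) {
        List<List<String>> result = new ArrayList<>();
        for (int k = 0; k < template.size(); k++) {
            if (!template.get(k).equals(WILDCARD)) {
                List<String> next = new ArrayList<>(template);
                next.set(k, WILDCARD);
                result.add(List.copyOf(next));
            }
        }
        return result;
    }

    // Breadth-first search from the first (non-empty list of) fragments.
    // Each mutation adds one wildcard, so the first goal found has the fewest.
    static Optional<List<String>> generalize(List<List<String>> fragments) {
        List<String> start = List.copyOf(fragments.get(0));
        Deque<List<String>> frontier = new ArrayDeque<>(List.of(start));
        Set<List<String>> visited = new HashSet<>(List.of(start));
        while (!frontier.isEmpty()) {
            List<String> current = frontier.poll();
            if (fragments.stream().allMatch(f -> matches(current, f))) {
                return Optional.of(current);
            }
            for (List<String> next : mutations(current)) {
                if (visited.add(next)) frontier.add(next);
            }
        }
        return Optional.empty();
    }
}
```

With only atomic operators the space grows exponentially in the number of tokens; a compound operator that generalizes several tokens at once would shorten paths to the goal at the price of a larger branching factor, which is the trade-off the last bullet points at.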
Ongoing work involving Ekeko/X and possible collaborations
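[Diagram: three system variants A, B and C. Variant A contains buggy code that is fixed by PATCH; variants B and C contain clones of that code (clone B, clone C) that have since evolved (evolved clone B, evolved clone C) and are to be fixed by adapted patches PATCH' and PATCH'', marked "??".]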
supporting clone-and-own reuse
repeat changes across variants
impact analysis of divergences on transformation
by mutating transformation
Conclusion