api migration is a hard problem! - uni koblenz-landaulaemmel/1203-ames.pdf · 2]; the wrapper must...

101
API migration is a hard problem! Ralf Lämmel Software Languages Team, University of Koblenz-Landau Acknowledgement: This is joint work with (in alphabetic order) Thiago Tonelli Bartolomei (University of Waterloo, Canada), Krzysztof Czarnecki (University of Waterloo, Canada), Tijs van der Storm (CWI, Amsterdam, The Netherlands). I also acknowledge joint work within the Software Languages Team on the related subject of API (usage) analysis; special thanks are due to Ekaterina Pek. http://professor-fish.blogspot.com/2012/03/should-i-declare-defeat-on-research.html http://www.ece.iastate.edu/seminars-and-events/api-migration-is-a-hard-problem/ 19 March 2012 http://softlang.wikidot.com/api

Upload: others

Post on 17-Aug-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

API migration is a hard problem!

Ralf LämmelSoftware Languages Team, University of Koblenz-Landau

Acknowledgement: This is joint work with (in alphabetic order) Thiago Tonelli Bartolomei (University of Waterloo, Canada), Krzysztof Czarnecki (University of Waterloo, Canada), Tijs van der Storm (CWI, Amsterdam, The Netherlands). I also acknowledge joint work within the Software Languages Team on the related subject of API (usage) analysis;

special thanks are due to Ekaterina Pek.

http://professor-fish.blogspot.com/2012/03/should-i-declare-defeat-on-research.html

http://www.ece.iastate.edu/seminars-and-events/api-migration-is-a-hard-problem/

19 March 2012

http://softlang.wikidot.com/api

Page 2: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Figure 1: SwingSet2’s Tooltip demo in Swing, SwingWT and evolved SwingWT versions.

The aforementioned method has been implemented in atoolkit called Koloo. The paper’s website1 provides accessto accompanying material such as the wrappers and appli-cations of the study as well as additional data.

Road-map of the paper. §2 presents a motivating exam-ple for wrapper development based on an open-source wrap-per for Java’s Swing API for GUI programming. §3 re-ports on interviews conducted with developers experiencedin API migration, providing requirements for an API migra-tion method. §4 develops the overall notion of compliancetesting for wrapper-based API migration. §5 describes amethod for API migration based on compliance testing. §6describes validation. §7 analyzes related work. §8 concludesthe paper.

2. MOTIVATING EXAMPLEWe will illustrate here compliance issues between original

API and its wrapper-based re-implementation. The aim ofwrapper development is to reduce such issues up to the pointthat the wrapper is ‘good enough’ for use in applications.Original API and replacement API (i.e., the API used inter-nally by the wrapper) may enjoy ‘arbitrary’ di!erences [3,2]; the wrapper must neutralize these di!erences.

Our example concerns two GUI APIs: i) the Swing APIwith the help of the AWT API, which are both part of JDK

(‘the Java platform’), and ii) the SWT API2, which is partof the Eclipse project. There exists the open-source projectSwingWT

3, which is a wrapper-based re-implementation ofSwing in terms of SWT.Using the method of this paper (see §5–§6) and designated

tool support (i.e., the Koloo toolkit), we have developed arevision of SwingWT

4, to which we also refer as ‘evolvedSwingWT’ in the sequel.Fig. 1 illustrates compliance issues of SwingWT versus

Swing; it also illustrates how our systematically evolved wrap-per reduces the compliance issues. In the figure, we exerciseone scenario of the Tooltip demo of SwingSet25, which is aset of Swing demos originally distributed by Sun. The sce-nario is concerned with displaying a specific tooltip, as it istriggered when the user positions the mouse over the cow’smouth. The scenario was chosen because the Tooltip demois clearly concerned with triggering tooltips.

1http://gsd.uwaterloo.ca/issta20122http://eclipse.org/swt3http://swingwt.sf.net4http://swingwt.svn.sf.net/viewvc/swingwt/swingwt/trunk/CHANGELOG?revision=865Source code see Sun JDK folder demo/jfc/SwingSet2/

On the left, the reference behavior is that the cow moos.In the middle, the SwingWT wrapper, prior to our e!orts,is at work. Two problems are noticeable in the image: thebackground color is gray instead of white, and the tooltipshows a di!erent message: ‘cow’ as opposed to ‘Mooooooo’.On the right, the evolved SwingWT wrapper is at work.The background color complies with the original API; thetooltip’s messages complies with the original API as well.We have chosen the example to be visually striking, butwe must emphasize that the method proposed in this pa-per specifically reveals issues that are not easily spotted bylooking at particular screenshots; it also applies to domainsother than GUI programming.

Our method reveals behavioral di!erences in an automatedmanner on the grounds of checked assertions for API con-tracts. Revealed di!erences are also referred to as violations(of compliance). Our method can be exercised with varyinglevels of scrutiny in terms of API contracts and correspond-ing assertions to be imposed on scenario execution. Koloosupports recording and replaying scenarios as well as com-paring results of execution across original API and wrapper.

If we enable API contracts for return values of calls to APImethods exercised by the application, then the violation isdetected that SwingWT does not reproduce the return valuefor the method contains(Point) of class java.awt.Polygon.An investigation reveals that the Tooltip demo uses themethod to map areas of the image to the corresponding mes-sages. The violation is directly linked to the observable factthat a wrong message is displayed in the middle of Fig. 1.

If we also enable API contracts for asynchronous callsfrom the API to the application, then the violationis detected that SwingWT does not execute an asyn-chronous callback to the method contains(int, int) ofclass java.awt.Component. An investigation reveals thatthe Tooltip demo overrides the method to select the appro-priate tooltip to display—specific to the component. Theviolation is also directly linked to the observable fact that awrong message is displayed in the middle of Fig. 1.

If we also enable contracts for state integrity for method-call receivers of API types, then a violation is also flaggedfor the diverging background color, which cannot be revealedthough by looking at return values of method calls becausethe scenario’s design is such that the application does notexercise the (getter of the) background color in any way.It must be noted that additional scrutiny in terms of APIcontracts may also imply noise in reporting violations (see§6) in so far that elimination of these violations does notneed to be reasonably expected by a ‘good enough’ wrapper.

API migration is a hard problem!

2 T. T. Bartolomei and K. Czarnecki and R. Lämmel: Compliance testing for wrapper-based API migration. Submitted.

A Swing demoThe same demo

with the SwingWT1 wrapper

Use of an improved wrapper2

1 http://swingwt.sourceforge.net/

Page 3: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Plan for this talk

1. API = Application Programming Interface

2. The very basics of API migration

3. The challenge of API differences

4. The utility of API (usage) analysis

5. Studies on API migration

6. Compliance testing for API migration

7. Related work on API migration

8. Call to arms on API migration

Page 4: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

API = Application Programming Interface

Page 5: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

These are core and 3rd party APIs for the Java platform.

than 100 commits to the repository. This selection resulted in 60reference projects out of all the 1,476 built projects.

3.5 Size metrics for the corpus

Numbers of projects and their NCLOC sizes, and other metricsare summarized in Table 1 and Table 2. We use the metric MC forthe number of method calls.

Metric Value

Projects 6,286Source files 2,121,688LOC 377,640,164NCLOC 264,536,500Import statements 14,335,066

Table 1. Summary of token-based analysis (with all auto-matically identified Java/SVN projects on SourceForge).

Metric Value

Projects with attempted builds 1,639Built projects 1,476Packages 46,145Classes 198,948Methods 1,397,099Method calls 8,163,083

Table 2. Summary of AST-based analysis.

Figure 1. Size metrics (NCLOC, MC) for built and un-built projects (thinned out). The projects are ordered by thevalues for the NCLOC metric.

Figure 1 presents the distribution of size metrics (NCLOC,MC) for the corpus (y-axis is normalized w.r.t. the maximum ofthe metric in each group). As one can see, both metrics correlatereasonably. The maximum of NCLOC in the whole corpus (incl.unbuilt projects) is 25,515,312, the maximum of NCLOC amongbuilt projects is 1,985,977, which implies a factor 12.85 differ-ence. Hence, we are missing the biggest projects currently. Themaximum of MC among built projects is 228,242.

AP

I

Do

ma

in

Co

re

#P

ro

jects

#C

all

s

#D

isti

nct

meth

od

sca

lled

Java Collections Collections yes 1374 392639 406AWT GUI yes 754 360903 1607Swing GUI yes 716 581363 3369Reflection Other yes 560 15611 154Core XML XML yes 413 90415 537DOM XML yes 324 52593 180SAX XML no 310 13725 156log4j Logging no 254 43533 187JUnit Testing no 233 71481 1011Comm.Logging Logging no 151 21996 88

Table 3. Top 10 of the known APIs (sorted by the numberof projects using an API).

3.6 Provision of API metadata

Both Core Java APIs and manually downloaded JARs wereprocessed by us to assign metadata: name of the API, the nameof a programming domain, one or more package prefixes, and po-tentially a white-list of API types. Table 3 lists the top 10 of allthe 77 manually tagged APIs together with some metadata andmetrics. We used a special reflection-based fact extractor for thevisible types and members of the API JARs. (Alternatively, onecould also attempt to leverage API documentation, but such docu-mentation may not be available for some of the APIs, and it wouldtake extra effort to establish consistency between JARs and docu-mentation.) These facts are also stored in the database, and theyare leveraged by some forms of API-usage analysis.

4. Examples of API-usage analysis

We will introduce a few examples of API-usage analysis. Ineach case, we will provide a motivation related to API migra-tion and language conversion—before we apply the analysis to ourSourceForge corpus.

Admittedly, the statistical analysis of a corpus does not directlyhelp with any specific migration project. However, the reportedanalyses as such are meaningful for single projects, too (perhapssubject to refinements). For instance, we will discuss API cov-erage below, and such information, when obtained for a specificproject, directly helps prioritizing migration efforts. In this paper,which presents early research, we take a statistical view on thecorpus to indicate the de-facto range for some of the measures ofinterest.

4.1 API footprint per project

We begin with a very simple, nevertheless informative API-usage analysis for the footprint of API usage. There are differentdimensions of footprint. Below, we consider the numbers of usedAPIs and used (distinct) API methods. In the online appendix, wealso consider the ratio of API calls to all calls. In extension of thesenumbers, we could also be interested in the ‘reference projects ×API pool’ matrix (showing for each project the combination ofAPIs that it uses).

What’s an API?

Page 6: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Namespace #T

yp

es

#M

eth

od

s

MA

Xsiz

ecla

ss

tree

MA

Xsiz

ein

terfa

ce

tree

#R

efe

ren

ced

nam

esp

aces

#R

efe

rrin

gn

am

esp

aces

%C

lasses

%In

terfa

ces

%V

alu

ety

pes

%D

ele

gate

typ

es

%G

en

eric

typ

es

%C

lass

argu

men

ts

%In

terfa

ce

argu

men

ts

%V

alu

ety

pe

argu

men

ts

%D

ele

gate

argu

men

ts

%S

eale

dcla

sses

%S

pecia

lizab

lecla

sses

%S

pecia

lizab

lety

pes

%O

rp

han

cla

sses

%O

rp

han

inte

rfa

ces

%O

rp

han

typ

es

System.Web.* 2327 29315 • • 43 • • • • • • • • • • • • • • •

System.Windows.* • • • 82 • • • • • • • • • • • • • • • • •

System.ServiceModel.* • • • • 43 • • • • • • • • • • • • • • •System.Windows.Forms.* • • • • • • • • • • • • • • • • • • • •

System.Data.* • • • • • • • • • • • • • • • • • • • • •System.Activities.* • • • • • • • • • • • • • • • • • • • • •System.ComponentModel.* • • • • • • • • • • • • • • • • • • • •System.Workflow.* • • • • • • • • • • • • • • • • • • • •System.Xml.* • • • • • • • • • • • • • • • • • • • •System.Net.* • • • • • • • • • • • • • • • • • • • •System.DirectoryServices.* • • • • • • • • • • • • • • • •System • • • • • 69 • • • • • • • • • • • • • • •

System.Security.Cryptography.* • • • • • • • • • • • • • • •Microsoft.VisualBasic.* • • 38 • • • • • • • • • • • 48 • • • • • •System.Runtime.InteropServices.* • • • • • • • • • • • • • • • • •Microsoft.JScript.* • • • • • • • • • • • • • • • • • •System.Drawing.* • • • • • • • • • • • • • • • • • • • •

System.Runtime.Remoting.* • • • • • • • • • • • 23 • • • • • • • •System.Configuration.* • • • • • • • • • • • • • • • • • • • •System.Diagnostics.* • • • • • • • • • • • • • • • • • • •System.IO.* • • • • • • • • • • • • • • • • • • • •

System.Reflection.* • • • • • • • • • • • • • • • • •

System.EnterpriseServices.* • • • • • • • • • • • • • • • •System.CodeDom.* • • • • • • • • • • • • • • • • •

System.IdentityModel.* • • • • • • • • • • • • • • • • • • •Microsoft.Build.* • • • • • • • • • • • • • • • • • •System.Management.* • • • • • • • • • • • • • • • • • • •

System.Threading.* • • • • • • • • • • • • • • • • • • • •

System.Runtime.Serialization.* • • • • • • • • • • • • • • • • • • •System.Security.AccessControl • • • • • • • • • • • • • • •

System.Security.Permissions • • • • • • • • • • • • 86 • •

System.Runtime.CompilerServices • • • • • • • • • • • • • • • • • • • •

System.Linq.* • • • • • • • • • • • 23 • • • • • • • •System.AddIn.* • • • • • • • • • • • • • • • • •System.Xaml.* • • • • • • • • • • • • • • • • • • •System.Messaging.* • • • • • • • • • • • • • • • • •Microsoft.Win32.* • • • • • • • • • • • • • • • • •System.Security.Policy • • • • • • • • • • • • • • • • •

System.Globalization • • • • • • • • • • • •Microsoft.VisualC.* • • • • • • • 100 • • • • 100 100 • •System.Transactions.* • • • • • • • • • • • • • • • • • 100 •System.Security • • • • • • • • • • • • • • •

System.Collections.Generic • • • • • • • • • • • • • • • • • •System.Runtime.DurableInstancing • • • • • • • • • • • • • • • • •

System.Collections • • • • • • • • • • • • • •System.Text • • • • • • • • • • • • • •System.Deployment.* • • • • • • • • • • • • • • •System.Runtime.Caching.* • • • • • • • • • • • • • • • • • 100 •System.ServiceProcess.* • • • • • • • • • • • •System.Resources.* • • • • • • • • • • • • • • •System.Dynamic • • • • • • • • 91 • • • • • 72 • 70System.Security.Principal • • • • • • • • • • • • • •Microsoft.SqlServer.Server • • • • • • • • • • • • • 100 •System.Security.Authentication.* • • • • • • • • • • • • • • •System.Collections.Specialized • • • • • • • • • • • • • • 100 100System.Device.Location • • • • • • • • • • • • • •System.Text.RegularExpressions • • • • • • • • • • • • 100 100 • •Accessibility • • • • • • 60 • • • • 100 100 • •System.Collections.Concurrent • • • • • • • • • • • • 100 100 • •System.Runtime.Versioning • • • • • • • • • • •

Microsoft.CSharp.* • • • • • • • • • • • •System.Collections.ObjectModel • • • • 100 100 • • • • 100 100 • •System.Runtime.ConstrainedExecution • • • • • • • • •

System.Runtime • • • • • • 100 • • •

System.Timers • • • • • 25 • • • • 100 100System.Media • • • • 100 • • • • • •

System.Runtime.Hosting • • • • 100 • • • • •

System.Runtime.ExceptionServices • • • 100 • • •

System.Numerics • • • • 100 • • •75% 190 1579 8 7 22 26 80 16 23 5 1 66 8 46 5 51 89 90 4 40 10Median 60 543 3 2 17 9 72 6 13 0 0 54 4 34 2 33 67 69 1 0 425% 18 175 0 0 11 3 60 0 6 0 0 37 1 19 0 10 46 50 0 0 0

Table IIInfographics for reuse-related metrics for .NET (See the online version for additional data.)

Each .NET namespace may

be regarded as an API

What’s an API?

Page 7: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

API domains

• Programming domains

‣ XML

‣ GUI

‣ BCE (bytecode engineering)

‣ Testing

‣ ...

• Application domains

‣ Financial exchange

‣ Human resources

‣ ...

W.l.o.g. (?), we have

been mostly addressing

programming domains

in our work.

Page 8: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

The very basics of API migration

Page 9: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

API migration = to eliminate an application's dependencies on a given API (the “original”

API) and to make it depend instead on another API (the “replacement” API).

Why to do it?

How to do it?

Page 10: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Incentives for API migration

• The replacement API is‣ more modern, or‣ more typed, or‣ more compliant, or‣ more efficient, or‣ more usable, etc.

• The original API is‣ no longer supported, or‣ too costly in terms of license costs.

• The application must use less APIs per domain.

• Help with language migration.

API asbestos

pre/post

DOM vs. X/O

MSXML ...

LINQ to XML

Page 11: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Export person list as XML document

import org.jdom.*;

public static Document makeDocument(List<Person> contacts) { Document doc = new Document(); Element root = new Element("contacts"); document.addContent(root); for (Person p: contacts) { Element px = new Element("person"); Element namex = new Element("name"); namex = namex.setText(p.getName()); px = px.addContent(namex); root = root.addContent(px); } return doc;}

XML domain

Page 12: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Export person list as XML document

import org.w3c.dom.*;import ...;

public static Document makeDocument(List<Person> contacts) { DocumentBuilder b = ... Document doc = b....; Element root = doc.getDocumentElement(); for (Person p: contacts) { Element px = doc.createElement("person"); Element namex = doc.createElement("name"); namex.setTextContent(p.getName()); px.appendChild(namex); root.appendChild(px); } return document;}

XML domain

Page 13: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

API migration by rewriting

Application API X

API Y

uses

Application

Transformation

uses

Original API

Replacement API

Page 14: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

API migration by wrapping

Application API X

API X interface

API Y

uses

Applicationuses

Original API

Replacement API

Wrapper

Page 15: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

An application in need of API migrationfrom Vector to ArrayList

(1) public class Wizard { public class Wizard {(2) private static JComboBox selectionBox; private static JComboBox selectionBox;(3) private static Hashtable h1 = new Hashtable(); /* A1 */ private static HashMap h1 = new HashMap();

(4) private static Vector v2; private static List v2;(5) public static void main(String[] args) { public static void main(String[] args) {(6) for (int count = 0; count < 5; count++) for (int count = 0; count < 5; count++)(7) h1.put(new Integer(count), h1.put(new Integer(count),(8) "Item " + Integer.toString(count)); "Item " + Integer.toString(count))(9) Vector v1 = new Vector(); /* A2 */ Vector v1 = new Vector();(10) retrieveData(v1); retrieveData(v1);(11) v2 = runWizard(v1); v2 = runWizard(v1);(12) Object[] dbData = new Object[v2.size()]; Object[] dbData = new Object[v2.size()];(13) v2.copyInto(dbData); Util.copyInto(v2, dbData);

(14) writeToDatabase(dbData); writeToDatabase(dbData);(15) } }(16) private static Vector runWizard(Vector v3) { private static List runWizard(Vector v3) {(17) Vector v4 = new Vector(); /* A3 */ List v4 = Collections.synchronizedList(new ArrayList());

(18) for (int sel = showWizardDialog(v3); for (int sel = showWizardDialog(v3);(19) sel >= 0;) { sel >= 0;) {(20) v4.addElement(v3.elementAt(sel)); v4.add(v3.elementAt(sel));(21) sel = showWizardDialog(v3); sel = showWizardDialog(v3);(22) } }(23) System.out.println("User Selections:"); System.out.println("User Selections:");(24) printElements(v4.elements()); printElements(v4.iterator());(25) return v4; return v4;(26) } }(27) private static void printElements(Enumeration e1) { private static void printElements(Iterator e1) {(28) while (e1.hasMoreElements()) while (e1.hasNext())(29) System.out.println(e1.nextElement()); System.out.println(e1.next());(30) } }(31) private static void retrieveData(Vector v5) { private static void retrieveData(Vector v5) {(32) printElements(h1.elements()); printElements(h1.values().iterator());(33) for (Enumeration e2 = h1.elements(); for (Iterator e2 = h1.values().iterator();(34) e2.hasMoreElements();) e2.hasNext();)(35) v5.addElement(e2.nextElement()); v5.addElement(e2.next());(36) } }(37) private static int showWizardDialog(Vector v6) { private static int showWizardDialog(Vector v6) {(38) selectionBox = new JComboBox(v6); selectionBox = new JComboBox(v6);(39) selectionBox. selectionBox.(40) addActionListener(new MyActionListener(v2)); addActionListener(new MyActionListener(v2));

· · · · · ·(41) } }(42) static void writeToDatabase(Object[] data) { · · · } static void writeToDatabase(Object[] data) { · · · }(43) } }(44) public class MyActionListener implements ActionListener public class MyActionListener implements ActionListener(45) { {(46) private Vector v7; private List v7;(47) public MyActionListener(Vector v6) {v7 = v6;} public MyActionListener(List v6) {v7 = v6;}

· · · · · ·(48) } }

(a) (b)

Figure 1: (a) Example program that uses the legacy classes Vector, Hashtable, and Enumeration. (b) Theexample program after migrating from Vector to ArrayList, Hashtable to HashMap, and Enumeration to Iterator.Underlining indicates changed code fragments. A1, A2, and A3 denote allocation sites in the program.

for a given object o, our approach is to insert a synchro-nization wrapper around o. This is accomplished using aninstance of the Decorator design pattern [15] where a col-lection type is implemented by delegating all operations too, and where thread-safety is added by making all the for-warding methods synchronized. The synchronization wrap-pers that we use are provided by the standard library classjava.util.Collections exactly for this purpose.

Our work was implemented as a refactoring [13, 22] inEclipse (see www.eclipse.org), a widely used open-sourcedevelopment environment for Java, using an existing refac-toring infrastructure [2]. Thus far, our experiments haveconcentrated on migrations involving classes from the JavaStandard Collections Framework, but our techniques canalso be used for migrations involving other library classes,

legacy class Hashtable to its unsynchronized replacementHashMap.

and for migrations between user-defined classes. At present,the framework requires that the migration of method callsdoes not change the exceptions that may be thrown, andmigrations are simply disallowed if this is not the case. Thishas not posed any problems in practice, because we did notencounter situations that required nontrivial migrations ofexception types while conducting our experiments. Section 2presents a simple workaround mechanism that is su!cientto handle most cases of migrations between methods thatdi"er in the thrown exceptions. In the long term, it doesnot seem especially di!cult to support the migration of ex-ception types, as is discussed in Section 7. Furthermore, weexpect that handling other statically typed object-orientedlanguages (e.g., C#) would only require minor modificationsof the framework.

We evaluated our technique on a number of moderate-sized Java applications in which we migrated a number of

266

I. Balaban, F. Tip, and R. Fuhrer: Refactoring Support for Class Library Migration. OOPSLA 2005.

Collections domain

Page 16: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

API migration by rewritingExample: Vector to ArrayList

(1) new Vector(), unsynchronized ! new ArrayList()(2) new Vector(), synchronized ! Collections.synchronizedList(new ArrayList())(3) int Vector:receiver.size() ! int receiver.size()(4) Object Vector:receiver.firstElement() ! Object receiver.get(0)(5) Object Vector:receiver.setElementAt(Object: value, int: index) ! Object receiver.set(index, value)(6) void Vector:receiver.copyInto(Object: array) ! void Util.copyInto(receiver, array)(7) Enumeration Vector:receiver.elements() ! Iterator receiver.iterator()(8) new Hashtable(), unsynchronized ! new HashMap()(9) new Hashtable(), synchronized ! Collections.synchronizedMap(new HashMap())(10) Object Hashtable:receiver.put(Object: key, Object: value) ! Object receiver.put(key, value)(11) Enumeration Hashtable:receiver.elements() ! Iterator (Collection receiver.values()).iterator()

Figure 2: Specification used for migrating the example program of Figure 1.

heavily used legacy classes from the standard collection li-braries. Specifically, we considered migration from Vectorto ArrayList, Hashtable to HashMap, and Enumeration toIterator. In these benchmarks, our tool could automati-cally migrate an average of 90.4% of the declarations thatrefer to legacy classes, 96.6% of the call sites, and 92.0%of the allocation sites. Moreover, for the migrated alloca-tion sites, only a small fraction needed to be surroundedwith synchronization wrappers. Code inspection revealedthat the remaining occurrences of legacy classes could notbe migrated due to interaction with external libraries suchas Swing.

The remainder of this paper is organized as follows. Sec-tion 2 presents a motivating example that illustrates ourapproach. Section 3 presents the model of type constraintsthat will be used to determine where migrations are allowed,and Section 4 describes how type constraints are solved. Sec-tion 5 discusses the evaluation of our approach on a set ofmoderate-sized benchmarks. Related work is discussed inSection 6. Finally, conclusions and directions for future workare presented in Section 7.

2. MOTIVATING EXAMPLEFigure 1(a) shows an example program that implements

a simple “wizard” using a number of components from theSwing class library. The program displays a list of values in adrop-down menu, and prompts the user to select one or moreof these. Then, the selected values are written to a databasevia a call to the method writeToDatabase(Object[]), thedefinition of which is omitted. Our goal is to transformthis program so that it uses ArrayList instead of the legacyclass Vector, HashMap instead of Hashtable, and Iteratorinstead of Enumeration. Figure 1(b) shows the desired refac-tored program.ArrayList was introduced in the standard libraries to re-

place Vector, and is considered preferable because its inter-face is minimal and matches the functionality of the Listinterface. More importantly, it allows for unsynchronizedaccess to a list’s elements whereas all of Vector’s meth-ods are synchronized3 which results in unnecessary over-head when Vectors are used by only one thread. Similarly,HashMap was introduced to replace Hashtable and allowsfor unsynchronized access to a hash-table’s elements. More-over, HashMap allows for null values and it does not containany of Hashtable’s methods that produce Enumerations.The Iterator interface that replaces Enumeration features

3As noted in e.g., [6, 1, 27], synchronization operations inJava programs may incur significant performance overhead.

public class Util {public static void copyInto(ArrayList v,

Object[] target) {if (target == null)

throw new NullPointerException();for (int i = v.size() - 1; i >= 0; i--)

target[i] = v.get(i);}

}

Figure 3: Auxiliary class that contains a methodused for migrating Vector.copyInto().

shorter method names and supports an additional remove()method for the safe removal of elements during iteration.

Figure 2 shows a migration specification, as would be writ-ten by a user, for migrating the example program. This spec-ification consists of a set of rewrite rules that express howallocation sites and method calls on legacy classes4 Vector,Hashtable and Enumeration can be rewritten. For example,rule (3) states that migrating calls to method Vector.size()requires no modification, since ArrayList defines a syn-tactically and semantically identical method. If methodsin the legacy class are not supported by the replacementclass, rewriting method calls becomes more involved. Forexample, Vector supports a method firstElement() notdefined in ArrayList. Rule (4) states that a call receiver.firstElement(), where receiver is an expression of typeVector, should be transformed into receiver.get(0). Incases where it is not possible to express the e!ect of amethod call on the legacy class in terms of calls to methodson the replacement class, calls may be mapped to methodsin user-defined classes. For example, Vector has a methodcopyInto() that copies the contents of a Vector into an ar-ray. Since ArrayList does not provide this functional-ity, rule (6) transforms receiver.copyInto(array) intoa call to method Util.copyInto(receiver, array) of theUtil class shown in Figure 3. This strategy can also beused to migrate between methods that throw di!erent typesof exceptions. Specifically, a user-defined class method canbe used to wrap a method in a replacement class in ordertranslate between exception types. Finally, when an opera-tion in a legacy class that is not supported by a replacementclass cannot be modeled using an auxiliary class, migrationmay become impossible.

Rules (1) and (2) are both concerned with rewritingallocation sites of the form new Vector(). The formerapplies in cases where thread-safety need not be pre-

4In this discussion we use the term class to refer to Javaclasses as well as to interfaces.

267

I. Balaban, F. Tip, and R. Fuhrer: Refactoring Support for Class Library Migration. OOPSLA 2005.

Rewriting rules

Type mapping• Vector → ArrayList• Enumeration → Iterator

Collections domain

Page 17: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

T. T. Bartolomei and K. Czarnecki and R. Lämmel: Swing to SWT and Back: Patterns for API Migration by Wrapping. ICSM 2010.

II. BASICS OF API WRAPPING

A. Terminology

We say that we migrate from the source API to the targetAPI. Here source API refers to the original API that is to bereplaced, and target API refers to the API that is to be re-usedin re-implementing the source API. The term surrogate is usedfor a type that is meant to re-implement a type of the sourceAPI while preserving its interface. A wrapper is a collectionof such surrogates and possibly internal helper types.

B. The tailored ADAPTER pattern

The ADAPTER design pattern is the classic solution tointegrate a client class with types that have different interfacesthan it expects. The GoF [5] book describes two variants ofadapters: class and object. In both cases, an adapter classextends the type expected by the client in order to intercept re-quests and redirect them to the new target (the adaptee). Classadapters integrate the adaptee through inheritance whereasobject adapters use object composition.

In the context of API migration, the adapter is the surrogateobject, and the adaptee is an object of the target API. Surro-gates do not extend the expected types, i.e., the classes of thesource API; instead they replace them. Thereby the constructorcalls in the client application can also be retained. In principle,the surrogates may even populate the same namespace as thesource API. For generality’s sake, and in order to avoid issueswith multiple inheritance, surrogates may prefer the objectadapter over the class adapter.

C. A simple example

Let us migrate applications using Java Vectors toArrayLists—a simple scenario borrowed from [6]. In thesource code of Fig. 1, we re-implement Vector (line 1).Note that we preserve the superclass AbstractList. TheVector surrogate acts as an object adapter, maintaining theadaptee in an ArrayList field (l.2). Both objects haveassociated lifecycles: an ArrayList object is created in theconstructor of a Vector (l.4). Because source and target APIsare very similar in this example, most of the client requestscan be simply delegated to the adaptee (as in l.7) while somesemantic adjustments are encapsulated in the adapter (l.9-12).

As we migrate clients from Vector to ArrayList,we also need to migrate them from Enumerations toIterators. Since Enumeration is a Java interface, wesimply copy it to the wrapper (not shown in the figure). Wealso need to provide a concrete implementation of the interfacethat acts as an adapter to Iterator and can be used byother types of the wrapping layer. For instance, Vector’selements() method (l.13-15) creates a new adapter aroundthe iterator returned by its ArrayList adaptee.

III. STUDY DESIGN

A. Methodology

Our study is designed to answer these research questions:

1 public class Vector extends AbstractList implements ... {2 ArrayList adaptee;3 public Vector() {4 adaptee = new ArrayList();5 }6 public void add(Object o) {7 adaptee.add(o);8 }9 public void setSize(int ns) {

10 while (adaptee.size() < ns) adaptee.add(null);11 while (adaptee.size() > ns) adaptee.remove(ns);12 }13 public Enumeration elements() {14 return new EnumerationImpl(adaptee.iterator());15 }16 ...17 }18 public class EnumerationImpl implements Enumeration {19 Iterator adaptee;20 EnumerationImpl(Iterator i) { this.adaptee = i; }21 public Object nextElement() { return adaptee.next(); }22 ...23 }

Fig. 1. Vector and Enumeration adapters.

What are the design challenges faced by developers

when implementing wrapping layers around OO

APIs? What are the solutions employed in practice?

The study is primarily based on two major open-sourcewrapping projects implementing both directions betweenJava’s GUI APIs Swing and SWT. Furthermore, the study takesinto account our previous work on a wrapper in the XMLdomain [4].

The study was conducted in two steps. The first stepconsisted in uncovering important design challenges facedby developers; see §IV. We initially studied the APIs anddetected some incompatibilities that would potentially impactthe wrapper design. We then talked to the developers aboutthe general structure of their wrappers and the main challengesthey experienced. Finally, we investigated the wrappers’ sourcecode trying to understand the correspondences between typesof the source and target APIs.

The second step comprised understanding the solutions forthe identified challenges and eventually abstracting solutionsas design patterns; see §V. To this end, we performed archi-tectural code queries and manual inspection on source code.Our investigation resulted in 5 design patterns. Finally, wedesigned simple metrics to provide evidence that the patternsare indeed applied in practice; see §VI.1

B. Subjects

Table I presents the wrappers and APIs that are subjectsof our study. For wrappers, we indicate the API they replace(source API) and the one they use instead (target API). Forthe APIs, we show the studied version, used by the wrappers.All sub-packages of top-level packages were included.

The central subjects are the GUI wrappers SwingWTand SWTSwing, but we also leveraged our XML wrapper

1Supporting material, including the source code for every example in thispaper can be found at http://gsd.uwaterloo.ca/icsm2010/. The distribution alsocontains additional wrapper implementations that were built to further validatethe discussion of §IV and §V.

API migration by wrappingExample: Vector to ArrayList

Collections domain

Page 18: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

T. T. Bartolomei, K. Czarnecki, R. Lämmel, and Tijs van der Storm: Study of an API migration for two XML APIs. SLE 2009.

API migration by wrappingExample: XOM’s to JDOM’s Element

ProsClient code does not need any transformation.Existing test harness directly applies.

Cons/LimitationsThe original API’s interface continues to be used.Runtime overhead because of wrapping.Defensive implementation for compliance.

are reused (cloned) by the re-implementation, as is. Foreach class of the source API, we create a new classwhose interface exactly mirrors the interface of the sourceclass — modulo systematic renaming of all class names inmethod signatures. The implementations of all features are“empty”:

// An ”empty” feature implementationpublic int feature(int arg) {

throw new RuntimeException();

}

(In fact, the empty API re-implementation is easily ob-tained by simple IDE techniques.) The empty API imple-mentation is compilable by construction. Also, any clientcode can be redirected to the empty API and will be com-pilable. (Redirection is trivially achieved by replacementof the API’s JAR, by aspect-oriented programming, or bymanually changing package references.)

The next step is to turn the empty types into proper wrap-per types. Here we systematically apply the design patternfor object adapters [10] where we implement the API map-ping (c.f., §4.1) as follows. Each wrapper class (i.e., eachclass of the re-implementation of the source API) is set up,if possible, as an object adapter with an object of the targetAPI as the adaptee (also called the wrappee). For instance,the different Element types of XOM and JDOM would en-gage in the corresponding wrapper class as follows:

package nu.xom;

public class Element {

private org.jdom.Element wrappee;

// implement interface of wrapper in terms of wrappee}

A few special cases should be mentioned in passing.First, abstract wrapper types may not need any wrappeetype. Second, when we implement the wrapper class fora source type with multiple associated target types, then animprecise upper bound such as Object may need to be cho-sen for the wrappee type, and methods may need to performtype dispatch (c.f., instanceof etc.) to invoke methods onthe wrappee.

We speak of a minor wrapping disorder if a singlewrappee object is not sufficient for re-implementation, butadditional state must be provided on top of the wrappee.This could happen if the source API intrinsically assumes aricher state than the target API. For instance, a wrapper-based re-implementation of DOM in terms of XOM orJDOM would need to maintain extra state in order to pro-vide an event system; c.f., §2.6. We note that even a regularsource type may have a disorder because wrapping disordersare implementation-level complications that are not coveredby the domain-level API mapping.

We speak of a major wrapping disorder if feature invo-cations on the source API (handled by the wrapper) can-not be translated immediately to feature invocations on the

wrappee. This happens when a target feature requires fun-damentally more information than available through themethod signature of the source feature. For instance, awrapper-based re-implementation of XOM or JDOM interms of DOM would fail because XOM/JDOM’s construc-tors are not implementable in terms of DOM’s factory meth-ods whose invocation requires the document factory to beavailable; c.f., §2.2.

The XOM/JDOM case study does not involve any wrap-ping disorder.

4.3. Levels of adaptation

Ir-/regularity is solely based on the existence of a coun-terpart target feature for a given source feature. There isa richer scale of adaptation levels that usefully classifiesre-implemented features. In the following, we define thedifferent adaptation levels for a given source feature f :

Adaptation level 1 f is a regular feature with f � as the as-sociated target feature. The re-implementation of f onlyperforms basic delegation of f to f � on the wrappee. Thatis, arguments of f (if any) may be unwrapped before theyare passed to f �; the result of f � (if any) may be wrapped be-fore it is returned through f . Argument positions may alsobe filled in by defaults. Mediation between this-returningand void methods is also covered, c.f., §2.1.

Adaptation level 2 Additional adaptations are involved incomparison to level 1. That is, arguments may be pre-processed (converted or checked); results may be post-processed (converted or checked); exception handling maytranslate exceptions (to replace exception types) and ad-just exception-throwing behavior (to trade error codes forexceptions or v.v.); the delegation may also be conditionalsubject to simple tests of the arguments. See §2.7 and §2.5for API differences that require such adaptations.

Adaptation level 3 f is an irregular feature. Its implemen-tation may invoke any number of target features, but withoutre-implementing any functionality of the target API. In in-formal terms, a level 3 feature is one that is effectively miss-ing in the target API but which can be recomposed fromother features of the target API.

Adaptation level 4 The level 3 condition of “not re-implementing any features of the target API” must be vio-lated. In informal terms, level 4 features violate the “inten-tion of reuse” for re-implementing the source API in termsof the target API.

Table 7 shows the features per type and adaptation levelfor the case study. (We have identified these levels intel-ligently and recorded them through annotations on the re-implementation of XOM.) As the table shows, basic delega-tion (level 1) suffices for more than half of all features; more

XML domain

Page 19: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

The challenge ofAPI differences

Page 20: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Different XML APIs

• Some in-memory APIs

‣ www.w3.org/DOM

‣ www.jdom.org

‣ www.dom4j.org

‣ xom.nu

XML domain

Page 21: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Different kinds of XML APIs

������������ ������

��������� ������

��������������� ����

DOM & Co.

JAXB & Co.

SAX & Co.

XML domain

Page 22: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

API differences

API Y API X

• Different features (capabilities)

• Different contracts (pre/postconditions)

• Different protocols (order/data-flow constraints)

• Different type mappings across type hierarchies

• ...

Any domain

Page 23: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Type mapping Swing : SWT

Component

Container

Window

Frame

JFrameJComponent

JList

AbstractButton

JToggleButton

JCheckBoxJButton

JViewPort JScrollPane

Widget

Control

Scrollable Button

List

Composite

Canvas

Decorations

Shell

Fig. 4. One option for Swing and SWT Type Mappings.

in JFrame, and must be re-implemented by the surrogate.

C. Mapping varying type hierarchies

Fig. 4 shows an excerpt of Swing and SWT’s class hier-archies, with the type mappings conjectured in the previoussection. Further, Component is mapped to Widget sincethey are the super-types of all widgets in their correspondingAPIs. Also, Container is mapped to Composite becausethey represent widgets that can have child widgets in the senseof the COMPOSITE design pattern [5]. For List’s case of aComposite Target we use a directed edge to JScrollPaneto express the uni-directionality of this mapping. Finally, wenote that the mappings of the open-source wrappers SwingWTand SWTSwing are slightly different from Fig. 4.

Wrappers should provide surrogates that completely reflectthe inheritance hierarchy of the source API for two mainreasons. First, client applications can reference and extendany visible type of the source API. Clients using surrogatesthat do not comply with the hierarchy may not even compile.For instance, if Swing2SWT exposed JList without exposingits super-types, clients such as Translator (l.36), whichreferences JList’s super-type Container, would break.

Second, wrappers can take advantage of the source API’sinheritance decomposition to reuse wrapping code. For in-stance, JList and JButton have Component as a com-mon super-type, and could reuse its methods. This kind ofreuse, however, can only be achieved if type mappings respectcovariance. Fig. 4 shows that Component→Widget andContainer→Composite are covariant mappings. Hence,Swing2SWT can implement Component methods using aWidget adaptee, and Container will be able to reusethese methods since its adaptee, Composite, is a Widget.Breaking covariance indicates a potentially serious API mis-match. In Swing, for example, buttons extend Containerand can, therefore, have child widgets. Since SWT’s Buttondoes not extend Composite Swing clients that attach widgetsto buttons cannot be easily migrated to SWT.

Different functional decomposition of types in the inher-itance hierarchy can lead to additional challenges, besidesbroken covariance. In SWT, for example, a widget becomesscrollable through inheritance, by extending Scrollable,whereas Swing uses composition with JScrollPane. Finergrained functionality can also be offered at different lev-els in the hierarchy. SWT’s Control.pack() method,called in the shell object (l. 29), for instance, maps toWindow.pack() in Swing. Since Control maps toComponent, the surrogate must verify that the adaptee isactually a Window before delegating the method call.

D. Varying creation and wiring protocols

Swing provides distinct methods for creating widgets andwiring them to parents. Thus, Swing clients can dynami-cally create and modify their compositions. In contrast, SWTenforces the parent-child relationship on constructors. Thus,clients are more constrained.

Migrating from an API that is relaxed in how it establishesobject relationships to one that enforces relationships at con-struction time presents a challenge to wrappers. Surrogatesmay need to delay adaptee construction. This also means thatoperations cannot be delegated as usual; they may need to bere-implemented. Mapping from a strict to a relaxed API doesnot present major challenges. The surrogate constructors mustcreate the adaptee and wire it to its parent.

For example, when a surrogate for JCheckBox is instanti-ated (l.13), it cannot immediately construct the correspondingSWT Button adaptee because the parent widget is unknownuntil the checkbox is wired to the content pane (l.29).

E. Inversion of control

Wrappers may need to delegate control from the target APIback to the application. A simple and common scenario isthat an application receives control from the wrapper throughcallbacks, by implementing suitable interfaces or extendingsuitable classes of the source API. When there is a correspon-dence between callbacks of both APIs, then one can delegatethe events generated by the target API to callbacks designedfor the source API.

For instance, our Swing application registers anActionListener to a JButton (l.18-23), whichcorresponds to adding a SelectionAdapter to an SWT’sButton. In Swing2SWT, hence, the JButton surrogatemust register a SelectionAdapter to its Buttonadaptee and keep the ActionListener. When SWTgenerates a SelectionEvent, it must be translatedinto an ActionEvent and delegated to the originalActionListener. SWT2Swing, naturally, will perform theopposite mapping.

In general, applications may override any visible instancemethod of the source API—giving rise to a major challenge.A relatively regular case is when the overridden method hasa counterpart in the target API. In this case, wrappers mustintercept calls to the method of the target API by somemeans, e.g., by using instrumented subclasses of the target

GUI domain

Page 24: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Differences regarding XML APIs

• Think of constructing, querying, updating XML.

• Let’s see how this is done with different APIs.

• How hard would it be to do API migration?

<order> <product>4711</product> <customer>1234</customer> ...</order>

Page 25: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Variations on features: listeners

// DOM -- registration of a listener for node insertion((EventTarget)order).addEventListener( "DOMNodeInserted", // mutation type new EventListener() { public void handleEvent(Event evt) { // ... handle event ... } }, false);

JDOM (and others) do not provide an event system.

Page 26: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Variations on construction:void methods versus method chaining

// JDOM -- construction by method chainingElement order = new Element("order"). addContent(new Element("product"). addContent("4711")). addContent(new Element("customer"). addContent("1234"));

// XOM -- construction by void methodsElement order = new Element("order");Element product = new Element("product");product.appendChild("4711");order.appendChild(product);Element customer = new Element("customer");customer.appendChild("1234");order.appendChild(customer);

Page 27: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

More variations on construction:constructors versus factory methods

// XOM -- construction with constructorsElement order = new Element("order");Element product = new Element("product");product.appendChild("4711");order.appendChild(product);Element customer = new Element("customer");customer.appendChild("1234");order.appendChild(customer);

// DOM -- construction with factory methodsElement order = doc.createElement("order");Element product = doc.createElement("product");product.appendChild(doc.createTextNode("4711"));order.appendChild(product);Element customer = doc.createElement("customer");customer.appendChild(doc.createTextNode("1234"));order.appendChild(customer);

Page 28: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Variations on update:id-based versus index-based update

// XOM -- id-based updateorder.replaceChild(oldProduct, newProduct);

// JDOM -- index-based updateint index = order.indexOf(oldProduct);order.setContent(index, newProduct);

Page 29: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Variations on preconditions:liberal versus strict checking

// JDOM -- liberal checking of preconditionsorder.removeContent(product); // properly removes.order.removeContent(product); // quietly completes.

// XOM -- strict checking of preconditionsorder.removeChild(product); // properly removes.order.removeChild(product); // throws!

Page 30: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Variations on query liveness:comatose versus live query results

// XOM -- legal detachment due to comatose query resultsElements es = order.getChildElements();for (int i=0; i<es.size(); i++) es.get(i).detach();

// XOM -- illegal detachment due to life query resultsfor (Object k : order.getChildren()) ((Element)k).detach();

Detach all children of

the order element.

Page 31: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Variations on query liveness:make live query results comatose

Detach all children of

the order element.

// JDOM -- detachment loop with up-front snapshotObject[] es = order.getChildren().toArray();for (Object k : es) ((Element)k).detach();

Page 32: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Variation on semantics:detachment versus cloning of structure

// JDOM -- detach and reparentproduct.detach(); // detach product from order1order2.addContent(product); // order2 is new parent

// DOM -- clone and reparentorder2.addContent( // doc2 is the document of order2 doc2.importNode( // clone product; associate clone with doc2 product,true)); // true is for deep cloning

Move product from

order1 to order2.

Page 33: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

The utility ofAPI (usage) analysis

Page 34: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

A trivial application using the XML API XOM

Page 35: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Complexity of the application

1 package4 classes

Size of inner circle = LOCColor = # Methods(Fields are ignored.)Thanks are due to Victor Winter / SHIFT Lab @ UNO.

Page 36: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Class-to-class dependencies

White = application classesGray = API types

Thanks are due to Victor Winter / SHIFT Lab @ UNO.

Page 37: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

APIs used by the application

$ more .classpath <?xml version="1.0" encoding="UTF-8"?><classpath> ... <classpathentry kind="lib" path="xom/xom-1.2.6.jar"/></classpath>

Page 38: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

A non-trivial application using the XML API XOM

Page 39: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Complexity of the application

Top-level circles = packagesInner circles = classes

Size of inner circle = LOCColor = # Methods(Fields are ignored.)Thanks are due to Victor Winter / SHIFT Lab @ UNO.

Page 40: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

API footprint in the application

Top-level circles = packagesInner circles = classes

Size of inner circle = LOCColor = # API-type references

(Fields are ignored.)Thanks are due to Victor Winter / SHIFT Lab @ UNO.

Page 41: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

One application class with API usage

Orange = application entitiesGray = API entities

API references are relatively localized in one method.

Thanks are due to Victor Winter / SHIFT Lab @ UNO.

Page 42: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

API coverage by the application

Top-level circles = packagesInner circles = classes

Size of inner circle = LOCColor = # API-type references

Thanks are due to Victor Winter / SHIFT Lab @ UNO.

Page 43: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

APIs used by the application

$ pwd

/Users/laemmel/101companies/shiftlabAnalyses/input/src/cdk/jar

$ lsJRI.jar commons-cli-1.0.jar jgrapht-0.6.0.jar xalan.jar xpp3-1.1.4c.jarLICENSE dtd-xercesImpl.jar jniinchi-0.5.jar xercesImpl-2.9.0.jarantlr.jar jama-1.0.2.jar log4j.jar xml-apis.jarcmlxom-2.5-b1.jar javacc.jar sjava-0.68.jar xom-1.1.jarcmlxom-2.5-b1.jar-orig jaxen.jar vecmath1.2-1.14.jar xom-1.2.jar

Page 44: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

API usage analysis of a SourceForge-based Java corpus

(Oct 2008)

• 6K+ projects from SourceForge

• 1.5K projects built (200k classes)

• 60 reference projects

• Pool of 77 known APIs aggregated

• Additional API packages detected automatically

Page 45: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Many APIs per project

The APIs or API methods used in a project provide insight into

the API-related complexity of the project. In fact, such footprint-

like data serves as a proxy for the API dependence or platform de-

pendence of a project. In [16], we mention such API dependence

as a form of software asbestos. In the following, we simply count

the number of APIs used in a project as a proxy for the difficulty of

API migration. Ultimately, a more refined analysis is needed such

that specific (known to be difficult) API combinations are counted,

and attention is payed to the status of whether these API combos

are really exercised in one program scope or only separately.

In this context, we need to define what constitutes usage of

an API. One option would be to count each method call with an

API’s type as static receiver type (in the case of an instance call),

or as the hosting scope (in the case of a static call), or as the con-

structed type (in the case of a constructor call). Another option

is to count any sort of reference to an API’s types (including the

aforementioned positions of API types in method calls, but count-

ing additionally local variable declarations or argument positions

of method declarations and method calls). Yet another option is

to consider simply imports in a project. The latter option has the

advantage that we can measure such imports very easily—even

for unbuilt projects. Indeed, the following numbers were obtained

by counting imports that were obtained with the token-based fact

extractor.

0

5

10

15

20

25

30

1 10 100 1000 10000 1e+05 1e+06 1e+07

Nu

mb

er

of

use

d k

no

wn

AP

Is

Project Size (in NCLOC)

Unbuilt projectsBuilt projects

Reference projects

Projects Min 1st Q Median Mean 3rd Q MaxUnbuilt 1 2 4 4.409 6 27

Built 1 3 4 4.692 6 23

Reference 1 4 6 6.937 8 20

Figure 2. Numbers of known APIs used in the projects;

reference projects are plotted on top of built projects which

in turn are plotted on top of unbuilt projects.

Figure 2 shows the number of known APIs (y-axis) that are

used in the projects ordered by NCLOC-based project size (x-

axis). Unbuilt, built, and reference projects are distinguished. The

listed maxima and quartiles give a sense of the API footprint in

projects in the wild. The set of unbuilt projects exercises a higher

maximum of used APIs than the set of built projects—potentially

because of a correlation between the complexity of projects in

terms of the number of involved APIs and the difficulty to build

those projects.

We also need to clarify how to measure usage of API meth-

ods. That is, how to precisely distinguish distinct methods so that

counting uses is well defined. Particularly, in the case of instance

method calls, the situation is complicated due to inheritance, over-

riding, and polymorphism. As a starting point, we may distinguish

methods by possible receiver type—no matter whether the method

is overridden or inherited at a given subtype. Then, a method call

is counted towards the static receiver type in a call. Addition-

ally, we may also count the call towards subtypes (subject to a

polymorphism-based argument: the runtime receiver type may be

a subtype) and supertypes (subject to an inheritance-based argu-

ment: the inherited implementation may be used, if not inherited).

Such inclusion could also be made more precise by a global pro-

gram analysis.

1

10

100

1000

10000

100000

1 10 100 1000 1e+04 1e+05

Nu

mb

er

of

dis

tinct

AP

I m

eth

od

s

Project size (in MC)

Projects Min 1st Q Median Mean 3rd Q MaxAll 1 94 199.5 370.7 423 10850

Reference 20 305.8 611 866.2 948.8 5351

Figure 3. Numbers of distinct API methods used in the

projects (without distinguishing APIs).

Figure 3 shows the numbers of distinct API methods used in

the built projects of the corpus; reference projects are highlighted.

Methods on sub- and supertypes of static receiver types were not

included. For simplification, we also considered overloaded meth-

ods as basically one method.

There is a trend of increasing API footprint with project size.

Both axes are logarithmic, but project size grows more quickly

than the count of distinct API methods. Most projects, even most

of the largest ones, use less than 1,000 distinct API methods. As

the table with maxima and quartiles shows, there are a few projects

with exceptionally high counts. We have verified for these projects

that they essentially implement or test large frameworks (such as

ofbiz.apache.org). That is, these outliers embody large num-

bers of ‘self-calls’ for a large number of API methods.

4.2 API coverage by the corpusAn important form of API-usage analysis concerns API cover-

age; see, for example, the discussion of coverage in the API mi-

gration project of [3]. That is, coverage information is helpful in

API migration as means to prioritize efforts, and to leave out map-

ping rules for obscure parts of the API. Coverage information is

also helpful in improving API usability [13, 12].

As it is the case with other forms of API-usage analysis, API

coverage may be considered for either a specific project, or, cu-

mulatively, for all projects in a corpus. For instance, for any given

Page 46: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Many distinct API methods

The APIs or API methods used in a project provide insight into

the API-related complexity of the project. In fact, such footprint-

like data serves as a proxy for the API dependence or platform de-

pendence of a project. In [16], we mention such API dependence

as a form of software asbestos. In the following, we simply count

the number of APIs used in a project as a proxy for the difficulty of

API migration. Ultimately, a more refined analysis is needed such

that specific (known to be difficult) API combinations are counted,

and attention is payed to the status of whether these API combos

are really exercised in one program scope or only separately.

In this context, we need to define what constitutes usage of

an API. One option would be to count each method call with an

API’s type as static receiver type (in the case of an instance call),

or as the hosting scope (in the case of a static call), or as the con-

structed type (in the case of a constructor call). Another option

is to count any sort of reference to an API’s types (including the

aforementioned positions of API types in method calls, but count-

ing additionally local variable declarations or argument positions

of method declarations and method calls). Yet another option is

to consider simply imports in a project. The latter option has the

advantage that we can measure such imports very easily—even

for unbuilt projects. Indeed, the following numbers were obtained

by counting imports that were obtained with the token-based fact

extractor.

0

5

10

15

20

25

30

1 10 100 1000 10000 1e+05 1e+06 1e+07

Num

ber

of use

d k

now

n A

PIs

Project Size (in NCLOC)

Unbuilt projectsBuilt projects

Reference projects

Projects Min 1st Q Median Mean 3rd Q MaxUnbuilt 1 2 4 4.409 6 27

Built 1 3 4 4.692 6 23

Reference 1 4 6 6.937 8 20

Figure 2. Numbers of known APIs used in the projects;

reference projects are plotted on top of built projects which

in turn are plotted on top of unbuilt projects.

Figure 2 shows the number of known APIs (y-axis) that are

used in the projects ordered by NCLOC-based project size (x-

axis). Unbuilt, built, and reference projects are distinguished. The

listed maxima and quartiles give a sense of the API footprint in

projects in the wild. The set of unbuilt projects exercises a higher

maximum of used APIs than the set of built projects—potentially

because of a correlation between the complexity of projects in

terms of the number of involved APIs and the difficulty to build

those projects.

We also need to clarify how to measure usage of API meth-

ods. That is, how to precisely distinguish distinct methods so that

counting uses is well defined. Particularly, in the case of instance

method calls, the situation is complicated due to inheritance, over-

riding, and polymorphism. As a starting point, we may distinguish

methods by possible receiver type—no matter whether the method

is overridden or inherited at a given subtype. Then, a method call

is counted towards the static receiver type in a call. Addition-

ally, we may also count the call towards subtypes (subject to a

polymorphism-based argument: the runtime receiver type may be

a subtype) and supertypes (subject to an inheritance-based argu-

ment: the inherited implementation may be used, if not inherited).

Such inclusion could also be made more precise by a global pro-

gram analysis.

1

10

100

1000

10000

100000

1 10 100 1000 1e+04 1e+05

Nu

mb

er

of

dis

tinct

AP

I m

eth

od

s

Project size (in MC)

Projects Min 1st Q Median Mean 3rd Q MaxAll 1 94 199.5 370.7 423 10850

Reference 20 305.8 611 866.2 948.8 5351

Figure 3. Numbers of distinct API methods used in the

projects (without distinguishing APIs).

Figure 3 shows the numbers of distinct API methods used in

the built projects of the corpus; reference projects are highlighted.

Methods on sub- and supertypes of static receiver types were not

included. For simplification, we also considered overloaded meth-

ods as basically one method.

There is a trend of increasing API footprint with project size.

Both axes are logarithmic, but project size grows more quickly

than the count of distinct API methods. Most projects, even most

of the largest ones, use less than 1,000 distinct API methods. As

the table with maxima and quartiles shows, there are a few projects

with exceptionally high counts. We have verified for these projects

that they essentially implement or test large frameworks (such as

ofbiz.apache.org). That is, these outliers embody large num-

bers of ‘self-calls’ for a large number of API methods.

4.2 API coverage by the corpusAn important form of API-usage analysis concerns API cover-

age; see, for example, the discussion of coverage in the API mi-

gration project of [3]. That is, coverage information is helpful in

API migration as means to prioritize efforts, and to leave out map-

ping rules for obscure parts of the API. Coverage information is

also helpful in improving API usability [13, 12].

As it is the case with other forms of API-usage analysis, API

coverage may be considered for either a specific project, or, cu-

mulatively, for all projects in a corpus. For instance, for any given

Page 47: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Low usage of API methods

API, we may be interested in the API types (classes and inter-faces) that are exercised in projects by any means: extension, im-plementation, variable declaration, all kinds of method calls, andother, less obvious idioms (e.g., instance-of tests). At a more fine-grained level, we may be interested in the exercised members foreach given API type. Hence, qualitative measurements focus ontypes and members that are exercised at all, while quantitativemeasurements rank usage by the number of occurrences of a typeor a member or other weights.

Assuming a representative corpus, further assuming appropri-ate criteria for detecting coverage, we may start with the naiveexpectation that a good API should be covered more or less bythe corpus. Otherwise, the API would contain de-facto unneces-sary types and methods—which is clearly not in the interest of theAPI designer. However, it should not come as a surprise that, inpractice, APIs are not covered very well—certainly not by singlemeaningful projects [3], but—as our results show—not even by asubstantial corpus; see below.

We have actually tried to determine two simple coverage met-rics for all 77 known APIs: i) a percentage-based metrics for thetypes; ii) another percentage-based metrics for all methods. How-ever, we do not feel comfortable presenting a chart of those met-rics for all known APIs here. Unless considerable effort is spenton each API, such a chart may be disputable. The challenge liesin the large number of projects and APIs, the different character-istics of the APIs (e.g., in terms of their use of subtyping), aspectsof cloning, and yet other problems. Some of the issues will bedemonstrated for selected APIs below.

Projects Min 1st Q Median Mean 3rd Q MaxAll built 0.1629 1.954 2.769 4.861 3.746 59.93Reference 1.954 2.891 3.664 3.441 3.95 4.56

Figure 4. Usage of JDOM’s distinct methods.

Let us investigate coverage for specific APIs. As our first tar-get, we pick JDOM—a DOM-like (i.e., tree-based, in-memory)API for XML processing. We know that JDOM is a ‘true library’as opposed to a framework. Regular client code should simplyconstruct objects of the JDOM classes and invoke methods di-rectly. We mention these characteristics because library-like APIsmay be expected to show higher API coverage than framework-

like APIs—if we measure coverage in terms of called methods,as we do here. In this paper, in the case of a call to an instancemethod, we only count the method on the immediate static receivertype as covered. We have checked that the inclusion of super- andsubtypes, as discussed earlier, does not change much the chartsdescribed below.

Initially, we measured cumulative coverage for the methods ofthe JDOM API to be 68.89 %. We decided to study the contribu-tion of the different projects. There are 86 projects with JDOMusage among the built projects of the corpus. Figure 4 shows thepercentage-based coverage metrics for the methods of the JDOMAPI for those JDOM-using projects. The table with maxima andquartiles gives a good indication of the relatively low usage of theJDOM API.

Obviously, 3 projects stand out with their coverage. We foundthat these projects should not be counted towards cumulative cov-erage because these projects contain JDOM clones in source form.That is, it the API calls within the API’s implementation imply ar-tificial coverage for more than half of all JDOM methods. Withoutthose outliers, the cumulative coverage is considerably lower, only24.10 %.

Projects Min 1st Q Median Mean 3rd Q MaxAll built 0.3268 0.3268 0.9804 2.22 2.614 27.12Reference 0.3268 0.3268 1.144 2.369 3.023 11.44

Figure 5. Usage of SAX’ distinct methods.

Let us consider another specific API. We pick SAX—a push-based XML parsing API. The push-based characteristics implythat client code typically extends ‘handler’ classes or implementshandler interfaces with handler methods such as startElementand endElement—to which XML-parsing events are pushed.As a result, one should be prepared to find relatively low APIcoverage—if we measure coverage in terms of called methods.

We measured cumulative coverage for the methods of the SAXAPI to be 50.98 %. This relatively high coverage was suprising.There are 310 projects with SAX usage among the built projects ofthe corpus. Figure 5 shows the percentage-based coverage metricsfor the methods of the SAX API for those SAX-using projects. Wefound that three of the projects with the highest coverage were infact the previously discussed projects with JDOM clones in source

Cumulative coverage of API methods is 24.10% (if we ignore JDOM sources).

Page 48: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Few interfaces implementations for APIs

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

1 10 100 1000 10000 1e+05 1e+06

Ra

tio

Project Size (in MC)

Projects Min 1st Q Median Mean 3rd Q MaxAll 0.0205 0.4461 0.5551 0.5567 0.6724 1Reference 0.1138 0.4332 0.5026 0.5015 0.5704 0.9969

Fig. 1 shows the usage of known API methods relative to all methods in a project—both in terms of calls. The smaller the ratio (the closerto zero), the lower the contribution of API calls. The quartiles show that in most projects, about each second method call is an API call.As far as instance-method calls are concerned, the figure distinguishes API vs. project-based method calls solely on the grounds of thestatic receiver type of methods.

Figure 1. Ratio of API method calls to all method calls.

Fig. 2 shows the relative frequency of API-interface implementations for all the known APIs (including Core APIs). The picture isdominated by AWT handler types, the interface for iterators, and a few XML-related types.

Figure 2. Tag cloud of implemented API interfaces.

Only 7 APIs are used in more than 10 projects with

“generalization”.

Page 49: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Few subclass derivations for APIs

Fig. 3 shows the relative frequency of API-class extensions. The picture is entirely dominated by Swing’s types for GUIs. Many other

API types are implemented or extended, but only with a marginal frequency.

Figure 3. Tag cloud of overridden API classes.

1

10

100

1000

10000

100000

1e+06

1 10 100 1000 1e+04 1e+05

Nu

mb

er

of

ca

lls

Method

APINon-API

In [1], library reuse is studied at a level of shared objects in the operating systems Sun OS and Mac OS X. One of the observations is

that reuse seems to be low in the sense of Zipf’s law. Thus the most frequent function will be referenced approximately twice as often as

the second most frequent function, which occurs twice as often as the fourth most frequent function, etc. Fig. 4 shows the distribution of

frequency of method calls in the corpus of this paper. Only 0.03% of methods are called more than 10 000 times, while 98.2% of methods

are called less than 100 times. The plot suggests a Zipf-style distribution.

Figure 4. Frequency of calling API vs. non-API methods.

Only 7 APIs are used in more than 10 projects with

“generalization”.

Page 50: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

API usage analysis of an open-source .NET corpus

(June 2011)

• 17 projects from different repositories

• well known, widely used, mature

• amenable to dynamic analysis

• API = .NET namespace

• 401 in total

• 69 after grouping

Page 51: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Corpus of the .NET study

.NET Project Repository LOC Description

3.5 Castle ActiveRecord GitHub 30,303 Object-relational mapper4.0 Castle Core Library GitHub 36,659 Core library for the Castle framework3.5 Castle MonoRail GitHub 58,121 MVC Web framework4.0 Castle Windsor GitHub 50,032 Inversion of control container4.0 Json.NET Codeplex 43,127 JSON framework2.0 log4net Sourceforge 27,799 Logging framework2.0 Lucene.Net Apache.org 158,519 Search engine4.0 Managed Extensibility Framework Codeplex 149,303 Framework for extensible applications and components4.0 Moq GoogleCode 17,430 Mocking library2.0 NAnt Sourceforge 56,529 Build tool3.5 NHibernate Sourceforge 330,374 Object-relational mapper3.5 NUnit Launchpad 85,439 Unit testing framework4.0 Patterns & Practices - Prism Codeplex 146,778 Library to build flexible WPF and Silverlight applications3.5 RhinoMocks GitHub 23,459 Mocking framework2.0 SharpZipLib Sourceforge 25,691 Compression library2.0 Spring.NET GitHub 183,772 Framework for enterprise applications2.0 xUnit.net Codeplex 23,366 Unit testing framework

Table I.NET projects in study’s corpus (versions as of 19 June 2011)

II. METHODOLOGY

A. Research hypothesisPlatforms such as JSE or .NET, leverage programming

language concepts in a systematic manner to make thoseframeworks reusable (say, extensible, instantiatable, or con-figurable). It is challenging to understand the reuse charac-teristics of frameworks and actual reuse in projects at a highlevel of abstraction. Software metrics on top of simple staticand dynamic program analysis are useful to infer essentialhigh-level reuse characteristics.

B. Research questions1) What are the interesting and helpful high-level char-

acteristics of frameworks with regard to their potentialand actual reuse?

2) To what extend can those characteristics be computedwith simple metrics subject to simple static and dy-namic program analysis?

C. Research methodWe applied an explorative approach such that a larger set

of metrics of mainly structural properties was incrementallyscreened until a smaller set of key metrics and derivedclassifiers emerged. We use infographics (such as Figure 1)to visualize metrics, classifiers, and other characteristicsof frameworks and projects that use them. The resultingclaims are subject to validation by domain experts for theframework under study.

D. Study subjectThe subject of study consists of the Microsoft .NET

Framework and a corpus of open-source .NET projects thattarget different versions of the .NET Framework (2.0, 3.5,4.0). Table I collects metadata about the corpus.

.NET (4.0) has 401 namespaces in total, but we groupthese namespaces reasonably based on the tree-like organiza-tion of their compound names. For instance, all namespaces

in the System.Web branch provide web-related functionalityand can be viewed as a single namespace. (In all tables andfigures to come, we signify grouping by “*” as a postfixfor a namespace; see, e.g., Table II.) In this manner, weobtained the more manageable number of 69 namespaces.Such grouping is also common in other presentations of.NET including presentations by Microsoft itself.4

It follows a summary of the requirements for the corpusand the process of its accumulation; more information isavailable from the paper’s website.

One requirement is that the corpus is made up from well-known, widely-used and mature projects. We assume thatsuch projects make good use of .NET.

Another requirement is that dynamic analysis must befeasible for the projects of corpus. This requirement impliespractically that we need projects with good available test-suites. The need for testsuites, in turn, implies practicallythat the corpus is made up from frameworks or librariesas opposed to, e.g., interactive tools. Admittedly, advancedtest-data generation approaches could be used instead [5].

Yet another requirement is that the corpus is made up fromopen-source projects so that our results are more easily re-producible. Also, the instrumentation for static and dynamicanalysis would be problematic for proprietary projects whichusually commit to signed assemblies.

We searched CodePlex, SourceForge, GitHub, and GoogleCode applying the repository-provided ranking for popu-larity (preferably based on downloads). For the topmostapprox. 30 projects of each repository we checked all therequirements, and in this manner, we identified a diverse setof projects as shown in Table I. These projects all use C#as implementation language. (In principle, our approach isprepared to deal with other .NET languages as well—sincethe analysis uses bytecode engineering.)

4http://msdn.microsoft.com/en-us/library/gg145045.aspx

Page 52: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

A Framework Profile of .NET

Ralf Lammel, Rufus Linke, Ekaterina Pek, and Andrei Varanovich

Software Languages Team & ADAPT LabUniversitat Koblenz-Landau, Germany

Abstract—We develop a basic form of framework com-prehension which is based on simple, reuse-related metricsfor the as-implemented design and usage of frameworks. Tothis end, we provide a framework profile which incorporatespotential reuse characteristics (e.g., specializability of typesin a framework) as well as actual reuse characteristics (e.g.,evidence of specialization of framework types in projects). Weapply framework comprehension in an empirical study of theMicrosoft .NET Framework. The approach is helpful in severalcontexts of software reverse and re-engineering.

Keywords-framework, .NET, framework design, frameworkusage, framework profile, reuse, type specialization, late bind-ing, polymorphism, inheritance, program comprehension, soft-ware metrics, dynamic program analysis.

I. INTRODUCTION

Suppose you need to (better) understand the architecture

of a platform such as the Java Standard Edition1, the Mi-

crosoft .NET Framework2, or another composite framework.

There is no silver bullet for such framework comprehension,

but a range of models may be useful in this context. The

present paper describes the notion of framework profilewhich incorporates characteristics of potential and actual

reuse of frameworks. The approach is applied to the Mi-

crosoft .NET Framework and a corpus of .NET projects in

an empirical study.

Framework comprehension supports reverse and re-

engineering activities. In quality assessment of designs [1],

framework profiles help understanding frameworks in a

manner complementary to architectural smells, patterns, or

anti-patterns. Also, one can compare given projects with

the framework profile. More specifically, in framework re-

modularization, framework profiles help summarizing the

status of modularization and motivating refactorings [2]. In

API migration [3], framework profiles help assessing API

replacement options with regard to, for example, different

extensibility characteristics. Finally, framework profiles help

in teaching OO architecture, design, and implementation [4].

A framework profile is illustrated in Figure 1. Reuse-

related properties are depicted for a number of .NET name-

spaces and open-source projects; the names are elided here.

The infographics is derived from the results of static and

dynamic program analysis. The leftmost column displays

a metric for the percentage of specializable types (i.e.,

1http://www.oracle.com/us/javase

2http://www.microsoft.com/net/

• � � � ∗ ∗ �• – – � – – – –

• – – � – – – –

• � �• ∗ ∗ ∗ ∗ � � ∗• ∗ � � � � � ∗ � � � � � ∗ �• ∗ � ∗ ∗ ∗ ∗ � ∗ ∗ � ∗ ∗• ∗ ∗ ∗ ∗ ∗ ∗• � � � � � � � � � � � � � � � � � �• ∗ � ∗

Rows: top-10 .NET namespaces, in terms of number of types.

Middle block of columns: actual reuse by project of the corpus.

Leftmost column: potential reuse in terms of specializability.

Rightmost column: summary of actual reuse.

Figure 1. Infographics for an excerpt of a framework profile for .NET

non-sealed, non-static classes and all interfaces) by using

bullets of increasing size. (This approach will be defined

more precisely later on.) The bulk of the columns classify

actual reuse of namespaces by projects as follows: ‘–’ – the

namespace was not available in the framework version used

by the project; ‘∗’ – the namespace is referenced; ‘�’ – the

namespace is even specialized by the project (i.e., there is

a project type that extends or implements a type of the

namespace); ‘�’ – late binding involves the namespace (i.e.,

there is a project type that acts as runtime receiver type for

a call with a static receiver type from the namespace). The

rightmost column summarizes actual reuse by means of the

dominating classifier, if any, for each row.

Overall, the figure contrasts potential with actual reuse at

a high-level of abstraction.

Summary of contributions3

• We use metrics to analyze potential and actual framework

reuse for some limited forms of reuse. To this end, we

leverage static and dynamic program analysis.

• The metrics also help classifying composite frameworks

with regard to reuse so that namespaces can be associated

with categories that describe reuse characteristics concisely.

• We describe an empirical study for the Microsoft .NET

framework and a corpus of .NET projects such that reuse

characteristics add up to a framework profile for .NET.

Road-map§ II describes our methodology. § III explores reuse-related met-

rics for frameworks. § IV proposes a classification of frameworks.

§V studies actual framework reuse. §VI identifies threats to

validity. §VII discusses related work. §VIII concludes the paper.

3The paper’s web site, http://softlang.uni-koblenz.de/dotnet, provides

support material for the empirical study.

Projects

Nam

espa

ces

Patterns of API usage

Page 53: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

For each class, the number of subclasses in the corpus are shown.

Figure 9. Top 30 .NET classes inherited in the corpus

For each interfaces, the number of implementing classes in the corpus are shown. Note: the full bar counts allimplementations whereas the black part excludes classes that can be reliably classified as being compiler-generated.

Figure 10. Top 30 .NET interfaces implemented in the corpus

Top-30 inherited .NET classes

Page 54: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

For each class, the number of subclasses in the corpus are shown.

Figure 9. Top 30 .NET classes inherited in the corpus

For each interfaces, the number of implementing classes in the corpus are shown. Note: the full bar counts allimplementations whereas the black part excludes classes that can be reliably classified as being compiler-generated.

Figure 10. Top 30 .NET interfaces implemented in the corpus

Top-30 implemented .NET interfaces

Page 55: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Studies on API migration

• T. T. Bartolomei, K. Czarnecki, R. Lämmel, and Tijs van der Storm: Study of an API migration for two XML APIs. SLE 2009.

• T. T. Bartolomei and K. Czarnecki and R. Lämmel: Swing to SWT and Back: Patterns for API Migration by Wrapping. ICSM 2010.

• T. T. Bartolomei and K. Czarnecki and R. Lämmel: Compliance testing for wrapper-based API migration. Submitted.

Page 56: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

• Wrapper-based studies

‣ XML: Re-implement XOM in terms of JDOM

‣ GUI: Re-implement Swing in terms of SWT and v.v.

‣ GUI: Improve the existing SwingWT wrapper

‣ BCE: Re-implement BCEL in terms of ASM

• Wrapper engineering

‣ Validation based on suitable applications

‣ Tracking of engineering process and metrics

‣ Use of design patterns for wrapper design

‣ Use of compliance testing See the papers for details!

Page 57: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Wrapper-based reimplementation of XOM in terms of JDOM

Page 58: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

XOM versus JDOM metrics

Page 59: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

#Types: XOM versus JDOM

3. Introducing XOM and JDOM

We present some basic figures about the two APIs used

in our running case study.

3.1. API package structure

Our case study is concerned with the XOM and JDOM

APIs. These are two prominent XML APIs for the Java

platform which have been independently developed.2

Our

development recovers the distance between the APIs.

API package #Types

nu.xom 35

nu.xom.canonical 2

nu.xom.converters 2

nu.xom.xinclude 9

nu.xom.xslt 2

50

Table 1. #types per XOM package

Table 1 shows XOM’s packages. The nu.xom package is

XOM’s core package (the core API). All the other packages

cover specialized feature themes: XSLT integration, XIn-

clude support, SAX and DOM interoperability, canonical

XML. We have neglected these themes in the case study.

API package #Types

org.jdom 22

org.jdom.adapters 8

org.jdom.filter 4

org.jdom.input 5

org.jdom.output 7

org.jdom.transform 4

50

Table 2. #types per JDOM package

Table 2 shows all JDOM’s packages. JDOM’s core re-

sides in the org.jdom package, but we also employ some

types of the org.jdom.input and org.jdom.output pack-

ages so that we can match XML reading and writing (as

covered by XOM in its core package). The remaining pack-

ages cover again specialized feature themes and correspond

roughly to some of the themes we listed for XOM above

(i.e., XSLT integration, DOM interoperability, content fil-

ters for query functionality).

3.2. Core API features

Table 33

lists all types of XOM’s core API with num-

bers of features — except for helper types for exceptions

and namespaces. This list gives a good idea of the overall

2We use the current versions of those APIs: XOM 1.2 and JDOM 1.1.

3NCLOC means “LOC (i.e., Lines of Code) w/o comment lines”.

nu.xom NCLOC #Features #Declared

Attribute 452 35 27

Builder 784 25 17

Comment 75 24 14

Document 169 34 28

Element 888 59 52

Elements 16 12 3

Node 144 21 12

Nodes 43 19 10

ParentNode 140 28 8

ProcessingInstruction 105 26 18

Text 232 25 15

3048 308 204

Table 3. Metrics on the core XOM classes

complexity of the core API and the relative contribution of

the different XML types.

org.jdom NCLOC #Features #Declared

Attribute 211 33 27

CDATA 50 23 5

Comment 30 18 5

Content 43 15 8

Document 251 47 41

Element 485 87 75

Namespace 81 13 7

ProcessingInstruction 171 26 14

SAXBuilder 381 44 35

Text 88 23 11

XMLOutputter 804 46 38

2595 375 266

Table 4. Metrics on the core JDOM classes

Table 4 lists the key types of the JDOM classes that we

used in re-implementing the XOM classes. What is imme-

diately obvious is that XOM is more “complex” (in terms

of lines of code) but has less features. These observations

align with two known high-level differences between XOM

and JDOM. That is, XOM makes a considerable effort to

guarantee XML well-formedness by means of heavy check-

ing (“more code”), whereas JDOM provides many conve-

nience methods (“more features” or “clutter” according to

the XOM designer4).

3.3. Feature coverage

The case study covers XOM’s API as shown above, but

we leave out two types: NodeFactory which is an exten-

sion point for builders that is not needed for basic scenar-

ios; Serializer which is XOM’s efficient method of serial-

izing XML to files/streams; our tests use the toXML method

instead. Further, XOM’s query feature, which rests on an

XPath processor, has been left out.

4http://www.xom.nu/faq.xhtml

3. Introducing XOM and JDOM

We present some basic figures about the two APIs used

in our running case study.

3.1. API package structure

Our case study is concerned with the XOM and JDOM

APIs. These are two prominent XML APIs for the Java

platform which have been independently developed.2

Our

development recovers the distance between the APIs.

API package #Types

nu.xom 35

nu.xom.canonical 2

nu.xom.converters 2

nu.xom.xinclude 9

nu.xom.xslt 2

50

Table 1. #types per XOM package

Table 1 shows XOM’s packages. The nu.xom package is

XOM’s core package (the core API). All the other packages

cover specialized feature themes: XSLT integration, XIn-

clude support, SAX and DOM interoperability, canonical

XML. We have neglected these themes in the case study.

API package #Types

org.jdom 22

org.jdom.adapters 8

org.jdom.filter 4

org.jdom.input 5

org.jdom.output 7

org.jdom.transform 4

50

Table 2. #types per JDOM package

Table 2 shows all JDOM’s packages. JDOM’s core re-

sides in the org.jdom package, but we also employ some

types of the org.jdom.input and org.jdom.output pack-

ages so that we can match XML reading and writing (as

covered by XOM in its core package). The remaining pack-

ages cover again specialized feature themes and correspond

roughly to some of the themes we listed for XOM above

(i.e., XSLT integration, DOM interoperability, content fil-

ters for query functionality).

3.2. Core API features

Table 33

lists all types of XOM’s core API with num-

bers of features — except for helper types for exceptions

and namespaces. This list gives a good idea of the overall

2We use the current versions of those APIs: XOM 1.2 and JDOM 1.1.

3NCLOC means “LOC (i.e., Lines of Code) w/o comment lines”.

nu.xom NCLOC #Features #Declared

Attribute 452 35 27

Builder 784 25 17

Comment 75 24 14

Document 169 34 28

Element 888 59 52

Elements 16 12 3

Node 144 21 12

Nodes 43 19 10

ParentNode 140 28 8

ProcessingInstruction 105 26 18

Text 232 25 15

3048 308 204

Table 3. Metrics on the core XOM classes

complexity of the core API and the relative contribution of

the different XML types.

org.jdom NCLOC #Features #Declared

Attribute 211 33 27

CDATA 50 23 5

Comment 30 18 5

Content 43 15 8

Document 251 47 41

Element 485 87 75

Namespace 81 13 7

ProcessingInstruction 171 26 14

SAXBuilder 381 44 35

Text 88 23 11

XMLOutputter 804 46 38

2595 375 266

Table 4. Metrics on the core JDOM classes

Table 4 lists the key types of the JDOM classes that we

used in re-implementing the XOM classes. What is imme-

diately obvious is that XOM is more “complex” (in terms

of lines of code) but has less features. These observations

align with two known high-level differences between XOM

and JDOM. That is, XOM makes a considerable effort to

guarantee XML well-formedness by means of heavy check-

ing (“more code”), whereas JDOM provides many conve-

nience methods (“more features” or “clutter” according to

the XOM designer4).

3.3. Feature coverage

The case study covers XOM’s API as shown above, but

we leave out two types: NodeFactory which is an exten-

sion point for builders that is not needed for basic scenar-

ios; Serializer which is XOM’s efficient method of serial-

izing XML to files/streams; our tests use the toXML method

instead. Further, XOM’s query feature, which rests on an

XPath processor, has been left out.

4http://www.xom.nu/faq.xhtml

The wrapper is mainly concerned with the “core” packages.

Page 60: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

#Features: XOM versus JDOM

3. Introducing XOM and JDOM

We present some basic figures about the two APIs used

in our running case study.

3.1. API package structure

Our case study is concerned with the XOM and JDOM

APIs. These are two prominent XML APIs for the Java

platform which have been independently developed.2

Our

development recovers the distance between the APIs.

API package #Types

nu.xom 35

nu.xom.canonical 2

nu.xom.converters 2

nu.xom.xinclude 9

nu.xom.xslt 2

50

Table 1. #types per XOM package

Table 1 shows XOM’s packages. The nu.xom package is

XOM’s core package (the core API). All the other packages

cover specialized feature themes: XSLT integration, XIn-

clude support, SAX and DOM interoperability, canonical

XML. We have neglected these themes in the case study.

API package #Types

org.jdom 22

org.jdom.adapters 8

org.jdom.filter 4

org.jdom.input 5

org.jdom.output 7

org.jdom.transform 4

50

Table 2. #types per JDOM package

Table 2 shows all JDOM’s packages. JDOM’s core re-

sides in the org.jdom package, but we also employ some

types of the org.jdom.input and org.jdom.output pack-

ages so that we can match XML reading and writing (as

covered by XOM in its core package). The remaining pack-

ages cover again specialized feature themes and correspond

roughly to some of the themes we listed for XOM above

(i.e., XSLT integration, DOM interoperability, content fil-

ters for query functionality).

3.2. Core API features

Table 33

lists all types of XOM’s core API with num-

bers of features — except for helper types for exceptions

and namespaces. This list gives a good idea of the overall

2We use the current versions of those APIs: XOM 1.2 and JDOM 1.1.

3NCLOC means “LOC (i.e., Lines of Code) w/o comment lines”.

nu.xom NCLOC #Features #Declared

Attribute 452 35 27

Builder 784 25 17

Comment 75 24 14

Document 169 34 28

Element 888 59 52

Elements 16 12 3

Node 144 21 12

Nodes 43 19 10

ParentNode 140 28 8

ProcessingInstruction 105 26 18

Text 232 25 15

3048 308 204

Table 3. Metrics on the core XOM classes

complexity of the core API and the relative contribution of

the different XML types.

org.jdom NCLOC #Features #Declared

Attribute 211 33 27

CDATA 50 23 5

Comment 30 18 5

Content 43 15 8

Document 251 47 41

Element 485 87 75

Namespace 81 13 7

ProcessingInstruction 171 26 14

SAXBuilder 381 44 35

Text 88 23 11

XMLOutputter 804 46 38

2595 375 266

Table 4. Metrics on the core JDOM classes

Table 4 lists the key types of the JDOM classes that we

used in re-implementing the XOM classes. What is imme-

diately obvious is that XOM is more “complex” (in terms

of lines of code) but has less features. These observations

align with two known high-level differences between XOM

and JDOM. That is, XOM makes a considerable effort to

guarantee XML well-formedness by means of heavy check-

ing (“more code”), whereas JDOM provides many conve-

nience methods (“more features” or “clutter” according to

the XOM designer4).

3.3. Feature coverage

The case study covers XOM’s API as shown above, but

we leave out two types: NodeFactory which is an exten-

sion point for builders that is not needed for basic scenar-

ios; Serializer which is XOM’s efficient method of serial-

izing XML to files/streams; our tests use the toXML method

instead. Further, XOM’s query feature, which rests on an

XPath processor, has been left out.

4http://www.xom.nu/faq.xhtml

3. Introducing XOM and JDOM

We present some basic figures about the two APIs used

in our running case study.

3.1. API package structure

Our case study is concerned with the XOM and JDOM

APIs. These are two prominent XML APIs for the Java

platform which have been independently developed.2

Our

development recovers the distance between the APIs.

API package #Types

nu.xom 35

nu.xom.canonical 2

nu.xom.converters 2

nu.xom.xinclude 9

nu.xom.xslt 2

50

Table 1. #types per XOM package

Table 1 shows XOM’s packages. The nu.xom package is

XOM’s core package (the core API). All the other packages

cover specialized feature themes: XSLT integration, XIn-

clude support, SAX and DOM interoperability, canonical

XML. We have neglected these themes in the case study.

API package #Types

org.jdom 22

org.jdom.adapters 8

org.jdom.filter 4

org.jdom.input 5

org.jdom.output 7

org.jdom.transform 4

50

Table 2. #types per JDOM package

Table 2 shows all JDOM’s packages. JDOM’s core re-

sides in the org.jdom package, but we also employ some

types of the org.jdom.input and org.jdom.output pack-

ages so that we can match XML reading and writing (as

covered by XOM in its core package). The remaining pack-

ages cover again specialized feature themes and correspond

roughly to some of the themes we listed for XOM above

(i.e., XSLT integration, DOM interoperability, content fil-

ters for query functionality).

3.2. Core API features

Table 33

lists all types of XOM’s core API with num-

bers of features — except for helper types for exceptions

and namespaces. This list gives a good idea of the overall

2We use the current versions of those APIs: XOM 1.2 and JDOM 1.1.

3NCLOC means “LOC (i.e., Lines of Code) w/o comment lines”.

nu.xom NCLOC #Features #Declared

Attribute 452 35 27

Builder 784 25 17

Comment 75 24 14

Document 169 34 28

Element 888 59 52

Elements 16 12 3

Node 144 21 12

Nodes 43 19 10

ParentNode 140 28 8

ProcessingInstruction 105 26 18

Text 232 25 15

3048 308 204

Table 3. Metrics on the core XOM classes

complexity of the core API and the relative contribution of

the different XML types.

org.jdom NCLOC #Features #Declared

Attribute 211 33 27

CDATA 50 23 5

Comment 30 18 5

Content 43 15 8

Document 251 47 41

Element 485 87 75

Namespace 81 13 7

ProcessingInstruction 171 26 14

SAXBuilder 381 44 35

Text 88 23 11

XMLOutputter 804 46 38

2595 375 266

Table 4. Metrics on the core JDOM classes

Table 4 lists the key types of the JDOM classes that we

used in re-implementing the XOM classes. What is imme-

diately obvious is that XOM is more “complex” (in terms

of lines of code) but has less features. These observations

align with two known high-level differences between XOM

and JDOM. That is, XOM makes a considerable effort to

guarantee XML well-formedness by means of heavy check-

ing (“more code”), whereas JDOM provides many conve-

nience methods (“more features” or “clutter” according to

the XOM designer4).

3.3. Feature coverage

The case study covers XOM’s API as shown above, but

we leave out two types: NodeFactory which is an exten-

sion point for builders that is not needed for basic scenar-

ios; Serializer which is XOM’s efficient method of serial-

izing XML to files/streams; our tests use the toXML method

instead. Further, XOM’s query feature, which rests on an

XPath processor, has been left out.

4http://www.xom.nu/faq.xhtml

Features = instance methods, static methods, constructors

Page 61: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

CDK -- the application for validation

The SourceForge-hosted application with the most XOM usage exercised by testing

Page 62: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Establish the API correspondence

3.4. XOM’s test suite

The case study uses test-driven development to push for

compliance of the re-implementation of XOM with the ac-

tual XOM API. We use the excellent XOM test suite to this

end. JDOM’s test suite does not have any role in this effort.

TestCase #Tests #Assertions

AttributeTest 38 137

BaseURITest 76 98

BuilderTest 152 364

CommentTest 17 52

DocumentTest 23 98

ElementTest 68 233

LeafNodeTest 3 2

NamespacesTest 53 110

NodesTest 10 33

ParentNodeTest 15 79

ProcessingInstructionTest 19 85

TextTest 18 50

Subtotal: 492 1341

Ignored: 912 1660

Total: 1404 3001

Table 5. Metrics on XOM’s test suite

Table 5 summarizes the structure of XOM’s test suite.

The TestCases are JUnit test classes with the shown number

of test methods. Each test method tends to involve a small

number of tests as evident from the number of assertions.

The list of test classes maps roughly to the core API classes.

What appears as “Ignored” at the bottom of the table are test

classes for non-core API types and features (which we did

not consider in the case study).

XOM also comes with a separate harness of basic bench-

marks to test the speed and memory footprint of XOM pro-

grams. We have not used these benchmarks in any man-

ner, but it would be interesting to systematically compare

XOM’s performance with the one of a wrapper-based re-

implementation.

4. Wrapper-based API migration

The general goal of API migration is to eliminate usage

of a given API (the source API) in given “client code”, and

to make use of another given API (the target API) instead.

Fundamentally, there are two approaches to achieving the

goal. First, rewriting-based migration transforms the client

code — this may be source code or byte code. Second,

wrapper-based migration leaves the client code alone. In-

stead it re-implements the source API (its classes in par-

ticular) in terms of the target API, i.e., instances of the re-

implementing classes wrap instances of the target API.

We have found that the wrapper-based approach is useful

in studying the distance between APIs. That is, the wrap-

per implementation can be seen as an operational specifica-

tion of the API distance that can be both type-checked by

the compiler (which is a sort of completeness check) and

compliance-checked by regression testing (which is a sort

of correctness check).

4.1. API Mapping

We begin a wrapper-based API migration at the design

level, in fact, at the domain level. That is, we perform API

mapping such that for each source type and feature, we aim

to identify a suitable target type and feature. Such mapping

requires domain knowledge; types and features are com-

pared at a level of domain concepts and their operations.

When mapping source types, we distinguish regular vs.irregular types. We say that a type is regular if it corre-

sponds to a single target type (judged at the domain level)

or it is an abstract type (interface or abstract class) without

any associated target types. Otherwise, the type is irregular.

Indeed, some source types may need to be associated with

multiple target types, yet other (concrete) source types may

fail to have a counterpart.

When mapping source features, we distinguish regularvs. irregular features. We say that a feature is regular if the

associated target type(s) offer a feature that is an immedi-

ate counterpart of the source feature at hand (judged at the

level of domain concepts and their operations); otherwise

the feature is irregular.

nu.xom org.jdom

#re

gula

r

featu

res

#ir

regula

r

featu

res

Attribute Attribute 25 3

Builder input.SAXBuilder 15 1

Comment Comment 12 2

Document Document 26 2

Element Element 51 2

Elements java.util.List 2 0

Node 2 10

Nodes java.util.List 8 1

ProcessingInstruction ProcessingInstruction 14 4

Text Text; CDATA 13 1

168 26

Table 6. Metrics on the XOM/JDOM mapping

Table 6 summarizes the API mapping for the XOM/J-

DOM case study. All but one source type, Text, are regular.

There is one abstract (but still regular) source type: Node.

86% of all source features are regular. (We have recorded

this mapping through annotations on the re-implementation

of XOM.)

4.2. Wrapper implementation

We prepare an “empty” re-implementation of the source

API as follows. The source API’s interfaces, if any,

regular = 1:1

Page 63: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Correspondence graph for core APIs: XOM vs. JDOM

Page 64: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Correspondence graph for core APIs: XOM vs. JDOM

Page 65: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Correspondence graph for core APIs: XOM vs. JDOM

Page 66: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Correspondence graph forthe two Comment types

Page 67: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Correspondence graph forthe two Attribute types

Page 68: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Correspondence graph forthe two Attribute types

Page 69: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Correspondence graph forthe two Attribute types

Page 70: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

4 levels of adaptation

1. Straightforward delegation

2. Argument / result processing, conditional delegation

3. Implementation of a missing feature (be it as a “macro”)

4. Re-implementation of a feature due to semantical mismatch

Page 71: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Adaptations per level

nu.xom 1 2 3 4Attribute 16 7 1 2Builder 0 11 0 1Comment 7 2 3 1Document 15 8 2 2Element 22 16 10 2Elements 2 0 0 0Node 0 0 0 2Nodes 6 0 2 0ProcessingInstruction 15 0 2 0Text 4 7 1 1

87 51 21 11

Table 7. Adaptations per level for XOM/JDOM

than a quarter requires some pre-/post-processing; morethan a tenth needs to be composed together; only a hand-ful of features violates the “intention of reuse”, and thesewere easily implemented from scratch.

The shown numbers depend on a judgement call. Thatis, our wrapper-based re-implementation of XOM does notachieve full compliance in the immediate sense of passingthe entire XOM test suite, as is. This judgement call iscarefully backed up below (c.f., §5.2). Obviously, the moreone pushes for full compliance, the more features would bepushed upwards on the level scale. We will give a corre-sponding estimate below.

5. API compliance

When we introduced the scale of adaptation levels, westarted to talk about pre-/post-processing of arguments.Such efforts are needed because of a contractual API mis-match; source and target APIs do not agree on the specificcontracts (semantics) of their feature sets. Here we use theterm contract in a broad sense. Hence, it definitely includespre-/post-conditions, and invariants. We may also include“contracts” regarding performance, reentracy and synchro-nization (parallelism), but such contracts are not furnishedin this paper.

In the present section, we will classify issues of contrac-tual API mismatch: compliance issues for short. We willalso analyse the case study in this respect.

5.1. Generic compliance issues

We begin with generic compliance issues, which aregeneric in the sense that they are meaningful for APIs ofany domain. We give the following definitions for the caseof two APIs α and α� with identical signature. When weapply these definitions to wrapping, then α is the originalimplementation of the source API, whereas α� is the re-implementation (at a given stage of development).

We say that feature f has a PRE+ issue if its preconditionin α� is stronger than the one in α. If we think of α� as

the intended replacement of α, then indeed a PRE+ issueviolates design-by-contract rules.

Surprisingly perhaps, there is also the opposite situation.That is, we say that f has a PRE− issue if its preconditionin α� is weaker than the one in α. A PRE− issue would notviolate design-by-contract rules, but if we face an API thatis supposed to enforce preconditions, then obviously, α� willbe more (too) permissive than α.

In principle, PRE− issues can be addressed by addingextra checked assertions to the too permissive implemen-tation. In contrast, PRE+ issues generally call for a morecomplex implementation.

In a similar manner, we say that f has a POST− issueif its post-condition in α� is weaker than the one in α. APOST− issue violates design-by-contract rules. We havenot found it useful to even consider the opposite situation(which would be called a POST+ issue). Further, we saythat class c has an INVARIANT issue if the invariant of c inα� does not imply the one in α. Again, an INVARIANT issueviolates design-by-contract rules.

Yet another kind of generic compliance issue concernsexceptions. We say that f has a THROWS issue if for thecase that the implementations α and α� agree on whetheror not to throw, the thrown exception is different (in termsof its type or its observable content). This kind of issuehappens when source and target APIs may use designatedexception types and may differ in the use of general-purposeor domain-specific extensions.

Type #Pre

#Pos

t

#Inv

aria

nt

#Thr

ows

Attribute 3 1 - 4Builder - - - -Comment 2 - - -Document 6 - - 4Element 5 - 1 8Elements - - - -Node - - - -Nodes - - - -ParentNode - - - -ProcessingInstruction - - - -Text - - - 2

16 1 1 18

Table 8. #resolved XOM/JDOM issues

Table 8 summarizes the resolved issues for the casestudy. There is about the same number of unresolved issueswhich are listed separately (as well categorized further andjustified) below. There is just one column of combined PREissues; there are (almost) only PRE− issues which accountfor XOM’s rigorous checks for XML well-formedness. Ta-ble 9 shows a few samples of resolved issues.

Page 72: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Compliance statistics

Page 73: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Compliant and non-compliant test cases

Test suite #Test cases#Compliant test cases

#Non-complianttest cases

API Methodcoverage

API

App

697 417 280 156

752 752 0 35

Page 74: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

#Methodsin compliance with different test suites

at different levels

Test suite always sometimes never unused

API

Application

75 77 4 79

35 0 0 200

Page 75: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Generic compliance issues

• PRE+: wrapper precondition too strong (violates DBC)

• PRE-: wrapper precondition too weak (too little checking)

• POST-: wrapper postcondition too weak (violates DBC)

• INVARIANT: wrapper invariant deviates

• THROWS: exception types or values deviate

Type Feature Issue type Comment

Attribute toXML() Post JDOM’s escaping is different from XOM’s

Attribute(String,String) Pre XOM allows colonized names in the first argument whereas

JDOM does not

DocType setSystemID(String) Invariant XOM requires that Public ID has been nulled before SystemID

can be

Element addAttribute(Attribute) Throws XOM throws MultipleParentException if argument is parented

whereas JDOM throws IllegalAddException

Table 9. Samples of resolved XOM/JDOM issues

5.2. Judgement call for compliance

We make a judgement call in deciding on whether or not

to resolve a given issue. The judgement call is based on the

following line of thought.

We start from the observation that full compliance could

be based on two definitions: (i) verification-based (abso-

lute) observational equivalence for original implementation

of the source API and its wrapper-based re-implementation;

(ii) testing-based (relative) observational equivalence rela-

tive to a test suite (or representative client code). We only

commit to the less ambitious option (ii).

The next element of the judgement call is the number

of involved feature implementations. Most importantly, we

would like to generally avoid feature implementations at the

adaptation level 4 (unless they are easily done as it was the

case for those of Table 8. That is, any substantial violation

of the “intention to reuse” the target API runs fundamen-

tally counter the motivation of API migration. Likewise,

we would like avoid complicated or inefficient feature im-

plementations at the adaptation levels 2–3.

The final element of the judgement call is the relevance

of compliance issues for representative client code and the

difficulty of adapting such code if necessary. For instance,

PRE− issues are arguably safer to ignore for existing code

than PRE+issues. Mainstream scenarios should not dis-

turbed by incompliance and necessary adaptations should

be straightforward. This final element of the judgement call

is presumably the most in-operational one.

5.3. API-specific compliance issues

As a means to better justify unresolved issues (in the

view of the tension outlined above), we suggest to comple-

ment the generic categories for compliance issues by API-

specific ones based on the couple of source and target APIs

at hand. These additional categories are to be qualified by

estimations of relevance and difficulty of adapting client

code if necessary. In the case study, we have identified the

following API-specific issues:

BaseURI XOM’s “base URI” handling goes beyond what some

other XML APIs do here; in particular one can obtain the base

URI from any element, whereas JDOM only supports access at

the document level. A full reproduction of XOM’s semantics (also

taking into account the rules of base URI inheritance) on top of

JDOM would account for complex feature implementation — a

feature that is hardly used at all in XML processing code.

Escaping XOM and JDOM systematically differ in their way

of escaping; this also has to do with the various encodings that are

used in XML. For instance, when a document is serialized, text

nodes, attribute values, comments, document types etc. all obey

their own slightly differing escaping conventions. Doing escap-

ing on an already escaped result would account for complex and

expensive feature implementations whose increased compliance

would matter only for extreme borderline scenarios.

Namespaces XOM and JDOM slighly differ on the semantics

of declaring (additional) namespaces (similar in vein to §2.8). For

instance, while both APIs have a method to get a list of namespace

declarations, the APIs slightly disagree on what is included in that

list. This leads to certain hard-to-resolve mismatches, but we con-

tend that regular client code can hardly run into those mismatches

(whereas a test suite may do).

Serialization XML can be serialized in different but possibly

semantically equivalent ways. In particular, XOM and JDOM may

produce serialization results that are equivalent under XML’s in-

foset semantics, but this may still break test cases which perform

string-based comparison. If client code should be affected by such

a mismatch then it will be easy relax the comparison.

DocType XOM and JDOM significantly disagree on DTD han-

dling. For instance, XOM aggressively checks for invalid in-

put of Public Ids, System Ids and internal subsets (DTD well-

formedness). Other differences concern DocType-specific serial-

ization issues (see above). We contend that XML processing sce-

narios that observe, mutate and construct DTD through DOM-like

code are borderline scenarios. We admit that adaptation for full

compliance may be difficult.

Table 10 shows all unresolved (i.e., “don’t fix”) issues

for the case study. On the left, the issues are first listed on

the grounds of the generic categories. On the right, the API-

specific categories are exercised. (The total on the left does

not equal the total on the right because one specific issue

may cover more than one generic issue.)

5.4. Test suite-based compliance

In the case study, based on the described judgement call,

roughly 1/3 of all test methods are unsuccessful. We have

Page 76: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

API-specific compliance issues

• Escaping of characters serialization

• Formatting details of serialization

• Handling of namespace declarations

• Handling of DocType/embedded DTD

• ...

Page 77: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Compliance testing for API migration

T. T. Bartolomei and K. Czarnecki and R. Lämmel: Compliance testing for wrapper-based API migration.

Submitted.

Page 78: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Figure 1: SwingSet2’s Tooltip demo in Swing, SwingWT and evolved SwingWT versions.

The aforementioned method has been implemented in atoolkit called Koloo. The paper’s website1 provides accessto accompanying material such as the wrappers and appli-cations of the study as well as additional data.

Road-map of the paper. §2 presents a motivating exam-ple for wrapper development based on an open-source wrap-per for Java’s Swing API for GUI programming. §3 re-ports on interviews conducted with developers experiencedin API migration, providing requirements for an API migra-tion method. §4 develops the overall notion of compliancetesting for wrapper-based API migration. §5 describes amethod for API migration based on compliance testing. §6describes validation. §7 analyzes related work. §8 concludesthe paper.

2. MOTIVATING EXAMPLEWe will illustrate here compliance issues between original

API and its wrapper-based re-implementation. The aim ofwrapper development is to reduce such issues up to the pointthat the wrapper is ‘good enough’ for use in applications.Original API and replacement API (i.e., the API used inter-nally by the wrapper) may enjoy ‘arbitrary’ di!erences [3,2]; the wrapper must neutralize these di!erences.

Our example concerns two GUI APIs: i) the Swing APIwith the help of the AWT API, which are both part of JDK

(‘the Java platform’), and ii) the SWT API2, which is partof the Eclipse project. There exists the open-source projectSwingWT

3, which is a wrapper-based re-implementation ofSwing in terms of SWT.Using the method of this paper (see §5–§6) and designated

tool support (i.e., the Koloo toolkit), we have developed arevision of SwingWT

4, to which we also refer as ‘evolvedSwingWT’ in the sequel.Fig. 1 illustrates compliance issues of SwingWT versus

Swing; it also illustrates how our systematically evolved wrap-per reduces the compliance issues. In the figure, we exerciseone scenario of the Tooltip demo of SwingSet25, which is aset of Swing demos originally distributed by Sun. The sce-nario is concerned with displaying a specific tooltip, as it istriggered when the user positions the mouse over the cow’smouth. The scenario was chosen because the Tooltip demois clearly concerned with triggering tooltips.

1http://gsd.uwaterloo.ca/issta20122http://eclipse.org/swt3http://swingwt.sf.net4http://swingwt.svn.sf.net/viewvc/swingwt/swingwt/trunk/CHANGELOG?revision=865Source code see Sun JDK folder demo/jfc/SwingSet2/

On the left, the reference behavior is that the cow moos.In the middle, the SwingWT wrapper, prior to our e!orts,is at work. Two problems are noticeable in the image: thebackground color is gray instead of white, and the tooltipshows a di!erent message: ‘cow’ as opposed to ‘Mooooooo’.On the right, the evolved SwingWT wrapper is at work.The background color complies with the original API; thetooltip’s messages complies with the original API as well.We have chosen the example to be visually striking, butwe must emphasize that the method proposed in this pa-per specifically reveals issues that are not easily spotted bylooking at particular screenshots; it also applies to domainsother than GUI programming.

Our method reveals behavioral di!erences in an automatedmanner on the grounds of checked assertions for API con-tracts. Revealed di!erences are also referred to as violations(of compliance). Our method can be exercised with varyinglevels of scrutiny in terms of API contracts and correspond-ing assertions to be imposed on scenario execution. Koloosupports recording and replaying scenarios as well as com-paring results of execution across original API and wrapper.

If we enable API contracts for return values of calls to APImethods exercised by the application, then the violation isdetected that SwingWT does not reproduce the return valuefor the method contains(Point) of class java.awt.Polygon.An investigation reveals that the Tooltip demo uses themethod to map areas of the image to the corresponding mes-sages. The violation is directly linked to the observable factthat a wrong message is displayed in the middle of Fig. 1.

If we also enable API contracts for asynchronous callsfrom the API to the application, then the violationis detected that SwingWT does not execute an asyn-chronous callback to the method contains(int, int) ofclass java.awt.Component. An investigation reveals thatthe Tooltip demo overrides the method to select the appro-priate tooltip to display—specific to the component. Theviolation is also directly linked to the observable fact that awrong message is displayed in the middle of Fig. 1.

If we also enable contracts for state integrity for method-call receivers of API types, then a violation is also flaggedfor the diverging background color, which cannot be revealedthough by looking at return values of method calls becausethe scenario’s design is such that the application does notexercise the (getter of the) background color in any way.It must be noted that additional scrutiny in terms of APIcontracts may also imply noise in reporting violations (see§6) in so far that elimination of these violations does notneed to be reasonably expected by a ‘good enough’ wrapper.

API migration is a hard problem!

• How to spot such differences?

• What contracts to use this end?

• How to avoid regressions?

GUI domain

Page 79: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Figure 6: An overview of an iteration according to the Koloo method.

5.1 Phase: Trace CollectionScenario Design — Trace collection starts with the de-

sign of scenarios that can be automated as tests eventually.A scenario is an execution of the application using the orig-inal API. It is important to design scenarios that cover thedesired uses of the application since they represent the re-quirements for the migration: the wrapper will be consid-ered compliant if it properly replaces the original API inthese scenarios with respect to the verified contracts.Scenarios usually target the applications at hand (as op-

posed to directly using the API). Tests for an application,if available, can be used as scenarios. Such tests usuallycontain assertions related to application logic, which may ormay not also check (indirectly) some aspects of API usage.The interpreter of the method enriches scenarios—both withand without assertions for application logic—with assertionsfor selected API contracts.Traces that are derived from scenarios must not depend

on external actions, and all results should be reproducible—this is common requirement for automated testing. In thecontext of API migration, this requirement may imply extrachallenges though. Consider, for example, API migrationfor GUI APIs: GUI actions would need to be automatedby using a record&replay tool (such as GUITAR [16]) orby manually writing scripts that automate actions upon theGUI. In the example of Fig. 1 we designed a scenario thatpositions the mouse over various areas of the image, andautomated the scenario with AWT’s Robot class.It is the responsibility of the developer to design scenarios

that explore interesting interactions with the API. Depend-ing on the selected contracts, those interactions are more orless strongly verified. As a baseline, let us assume that thesecontract forms are selected: value equality, reference in-tegrity, exception conformance, and callback conformance—we also refer to these contracts as ‘basic contracts’. Withthese contracts, verification will essentially not look into ob-jects (such as receivers of method calls). Scrutiny of verifica-tion is hence increased when state integrity is also selected,but this may also be problematic, as we will discuss in §6.Tracing — The tracer executes the scenarios and col-

lects the API interactions as traces. (Koloo uses the byte-code engineering framework ASM

7 for tracing.) To this end,the tracer is also configured with a description of the typesthat belong to the application and the API; this metadatais also incorporated into the traces. Trace collection should

7http://asm.ow2.org

abstract away details of the underlying application that arenot important from an API compliance testing point of view.(This also helps with automating the scenario and makingit reproducible.) Such abstraction is similar to other ap-proaches for extracting regressions tests from applicationtraces [7, 20, 24, 23].

Type Rules — Developers need to identify applicationand API types. Such identification boils down to ‘type rules’such as regular expressions over their fully qualified names,e.g., java.awt.*, javax.swing.* and javax.accessibility.* inthe case of Swing as the original API. However, the interac-tion between application and API may be mediated by aux-iliary types that are essential for the scenario even thoughthey should not be migrated. For example, APIs often usecollections, iterators, and files. In order to reproduce the sce-nario in subsequent phases, events for such auxiliary typesmust also be incorporated into the traces. The tracer reportson unbound object references in API calls, and developersneed to add type rules for auxiliary types until traces can beconstructed completely. (In the study of §6, only few suchrules were needed: one scenario needed 8 auxiliary types butmost did not need any).8

Additional type rules are needed for rewriting type namessuch that the interpreter can eventually deal with possiblydivergent namespaces for wrappers, when compared to theoriginal API. For instance, the SwingWT wrapper of §2 re-quires the type rule java.awt ! swingwt.awt.

5.2 Phase: Trace ValidationTraces must be validated before they are even attempted

with the emerging wrapper. By validation we mean thatinterpretation of the trace is checked to reproduce the cap-tured behavior. We also include the e!ort for contract selec-tion, including the corresponding aggregation of state that isneeded by contracts but not readily available from the trace,in particular state for checking state integrity; see §4.2.

Interpretation — The interpreter mocks the applicationby reproducing its role in the interactions recorded in thetrace. The interpreter builds a model of all types and meth-ods from the trace so that mock applications types and meth-ods can be provided for trace execution; also, assertions forthe selected contracts (see ‘Settings’ in Fig. 6) can be in-jected. In particular, each incoming event is associated witha mock application type; each mock application method can8Koloo defines Java’s Thread and Runnable types to be aux-iliary types by default, thereby being able to mock multi-threaded applications.

Page 80: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Figure 6: An overview of an iteration according to the Koloo method.

5.1 Phase: Trace CollectionScenario Design — Trace collection starts with the de-

sign of scenarios that can be automated as tests eventually.A scenario is an execution of the application using the orig-inal API. It is important to design scenarios that cover thedesired uses of the application since they represent the re-quirements for the migration: the wrapper will be consid-ered compliant if it properly replaces the original API inthese scenarios with respect to the verified contracts.Scenarios usually target the applications at hand (as op-

posed to directly using the API). Tests for an application,if available, can be used as scenarios. Such tests usuallycontain assertions related to application logic, which may ormay not also check (indirectly) some aspects of API usage.The interpreter of the method enriches scenarios—both withand without assertions for application logic—with assertionsfor selected API contracts.Traces that are derived from scenarios must not depend

on external actions, and all results should be reproducible—this is common requirement for automated testing. In thecontext of API migration, this requirement may imply extrachallenges though. Consider, for example, API migrationfor GUI APIs: GUI actions would need to be automatedby using a record&replay tool (such as GUITAR [16]) orby manually writing scripts that automate actions upon theGUI. In the example of Fig. 1 we designed a scenario thatpositions the mouse over various areas of the image, andautomated the scenario with AWT’s Robot class.It is the responsibility of the developer to design scenarios

that explore interesting interactions with the API. Depend-ing on the selected contracts, those interactions are more orless strongly verified. As a baseline, let us assume that thesecontract forms are selected: value equality, reference in-tegrity, exception conformance, and callback conformance—we also refer to these contracts as ‘basic contracts’. Withthese contracts, verification will essentially not look into ob-jects (such as receivers of method calls). Scrutiny of verifica-tion is hence increased when state integrity is also selected,but this may also be problematic, as we will discuss in §6.Tracing — The tracer executes the scenarios and col-

lects the API interactions as traces. (Koloo uses the byte-code engineering framework ASM

7 for tracing.) To this end,the tracer is also configured with a description of the typesthat belong to the application and the API; this metadatais also incorporated into the traces. Trace collection should

7http://asm.ow2.org

abstract away details of the underlying application that arenot important from an API compliance testing point of view.(This also helps with automating the scenario and makingit reproducible.) Such abstraction is similar to other ap-proaches for extracting regressions tests from applicationtraces [7, 20, 24, 23].

Type Rules — Developers need to identify applicationand API types. Such identification boils down to ‘type rules’such as regular expressions over their fully qualified names,e.g., java.awt.*, javax.swing.* and javax.accessibility.* inthe case of Swing as the original API. However, the interac-tion between application and API may be mediated by aux-iliary types that are essential for the scenario even thoughthey should not be migrated. For example, APIs often usecollections, iterators, and files. In order to reproduce the sce-nario in subsequent phases, events for such auxiliary typesmust also be incorporated into the traces. The tracer reportson unbound object references in API calls, and developersneed to add type rules for auxiliary types until traces can beconstructed completely. (In the study of §6, only few suchrules were needed: one scenario needed 8 auxiliary types butmost did not need any).8

Additional type rules are needed for rewriting type namessuch that the interpreter can eventually deal with possiblydivergent namespaces for wrappers, when compared to theoriginal API. For instance, the SwingWT wrapper of §2 re-quires the type rule java.awt ! swingwt.awt.

5.2 Phase: Trace ValidationTraces must be validated before they are even attempted

with the emerging wrapper. By validation we mean thatinterpretation of the trace is checked to reproduce the cap-tured behavior. We also include the e!ort for contract selec-tion, including the corresponding aggregation of state that isneeded by contracts but not readily available from the trace,in particular state for checking state integrity; see §4.2.

Interpretation — The interpreter mocks the applicationby reproducing its role in the interactions recorded in thetrace. The interpreter builds a model of all types and meth-ods from the trace so that mock applications types and meth-ods can be provided for trace execution; also, assertions forthe selected contracts (see ‘Settings’ in Fig. 6) can be in-jected. In particular, each incoming event is associated witha mock application type; each mock application method can8Koloo defines Java’s Thread and Runnable types to be aux-iliary types by default, thereby being able to mock multi-threaded applications.

Page 81: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

event E ::= (eid, tid,pid, d, T , sig, P [], R)execution id eid ! N

thread id tid ! N

parent id pid ! {eid}direction d ! " | #!event entity T ,P ,R ::= (type, content)entity content content ::= | value | id | (id, entity[])signature sig ! stringtype name type ! stringobject id id ! N

content value value ! string | primitive

Figure 2: Event structure of captured traces.

e1, t0 " :Robot.<init>():0 ...e16, t0 " :ToolTipDemo.<init>( ):1 ...e22, t0 " :JPanel.<init>():2 ...e29, t0 " :ToolTipDemo$Cow.<init>(1):3 ...e32, t0 " :Polygon.<init>():4 ...e62, t0 " 3:ToolTipDemo$Cow.setToolTipText(“Cow”):V ...e66, t0 " 2:JPanel.add(3):3 ...e68, t0 " :JFrame.<init>(“ToolTip Demo”):5 ...e72, t0 " 5:JFrame.getContentPane():6e73, t0 " 6:Container.add(2, “Center”): ...e77, t0 " 5:JFrame.show():V ...e85, t0 " 0:Robot.mouseMove(’342’, ’312’):Ve86, t0 " :Thread.sleep(’3000’):Ve87, t1 #! 6:Container.contains(’155’, ’112’):’true’e88, t1 " :Point.<init>(’155’, ’112’):7e89, t1 " 4:Polygon.contains(7):’true’e90, t1 " 3:ToolTipDemo$Cow.setToolTipText(

“<html><center><font color=blue size=+2>Mooooooo</font></center></html>”)):V

Figure 3: API trace for the example in Fig. 1.

id, which captures the order in which events were detected.A thread id identifies the active thread. The parent id in-dicates the state of the execution stack–event eid occurredin the control flow of its parent event pid. The event signa-ture identifies the target API element, including the statictarget type. From the viewpoint of the application an eventis either outgoing (!), if it originates from an applicationtype, or otherwise incoming ("!). Incoming events expresscallbacks, when the API calls a method implemented by theapplication. A callback is synchronous, if it is the responseto a call in the same thread. It is asynchronous otherwise,i.e., if it is a top-level event of a thread that is controlled bythe API. (In this case, the trace contains only callbacks atthe top-level, meaning that the API created the thread.)The remaining items correspond to the target object, a

sequence of parameters and a return object, all encoded asevent entities. An entity represents a runtime value or objectas a tuple with a string (the runtime type of the entity) anda content item. The content can be (null or void), a value(primitive or string), an object reference id or the referenceto an array (which has an id and points to a sequence ofentities). Exceptions thrown by events are encoded by thereturn value being a reference with the exception type.Fig. 3 shows an excerpt for the actual trace underlying

the motivating cow example of §2. No parent id is includedas it is clear from indentation. This is the trace as capturedwith the original Swing API. (The trace is edited for read-ability. For instance, type names are abbreviated.) Thetrace contains calls to the two di!erent contains methodsthat we encountered during the discussion of behavioral dif-ferences in §2: one for the area check (e89) and one for themissing callback (e87). The last event assigns the specifictooltip’s text using HTML format. As an aside, Swing iscapable of handling HTML, whereas SWT does not supportHTML here. This API di!erence eventually led to an ob-

servable GUI di!erence, which was resolved during wrapperdevelopment such that HTML was stripped o!.

4.2 API ContractsTrace re-execution is subjected to API contracts, which

are checked by assertions. Di!erent forms of API contractscan be selected by the developer. (This is similar to con-trol available in other forms of testing, e.g., the di!erentcontrols for test-data generation in randomized testing [14]).The generation of assertions is also used in other forms oftesting [31]. Technically, Koloo provides a correspondingDSL to select contracts.

Value Equality — when a method call or field accessreturns a value (or a string) it must equal the value observedat capture time.

Reference Integrity — when a method call or field ac-cess returns a reference (i.e., an object) its identity mustequal the identity observed at capture time modulo consis-tent renaming of identities.

Value equality and reference integrity are also applied toarrays and their contents. That is, the array must preservereference integrity and its contents must all preserve valueor reference integrity, depending on the element type.

Exception Conformance — if a method call throws anexception during trace capture, it must also throw an ex-ception of the same type during re-execution. Furthermore,re-execution must not throw any additional exceptions.

Callback Conformance — if a callback was triggeredduring trace capture, it must also be triggered during re-execution. To this end, the event for triggering a callbackmust be verified by associating it with an actual callback.Invocation of callbacks is tracked globally so that they canbe matched with triggering events.

In the case of synchronous callbacks, the association be-tween callbacks and the (preceding) triggering event is un-ambiguous. In the case of asynchronous callbacks, the trig-gering event could date back more arbitrarily. In fact, thereis no guarantee that (multithreaded) callbacks re-executedeterministically in the order of trace capture [8] and devi-ations from the captured order do not imply incorrect be-havior. Assertions are checked hence conservatively suchthat a check for re-execution of the callback is issued at thepreceding event (which is not necessarily the trigger) andthe check does not need to succeed immediately, but it isattempted repeatedly and in parallel to re-execution (in aseparate, auxiliary thread), subject to a large timeout.

Set/get Integrity — subject to name-based patterns,certain pairs of API methods for object modification andobservation are subjected to integrity assertions. For in-stance, a set followed by a get should retrieve the argumentof the set. More precisely, if a method for assumed objectmodification is executed and there is an associated objectobserver, then the observation must retrieve the argumentof the modification. These are the assumed pairs of modi-fiers and observers that can be selected for such contracts;we use X for variable postfixes of method names and T fortype names:

Modifier Observer CommentsetX(... v) getX() getter returns vsetX(boolean v) isX() predicate returns vaddX(... v) getXs() result contains vadd(T v) getTs() result contains vput(... k, ... v) get(k) getter returns v

GUI domain

A trace for a Swing scenario

Page 82: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Figure 6: An overview of an iteration according to the Koloo method.

5.1 Phase: Trace CollectionScenario Design — Trace collection starts with the de-

sign of scenarios that can be automated as tests eventually.A scenario is an execution of the application using the orig-inal API. It is important to design scenarios that cover thedesired uses of the application since they represent the re-quirements for the migration: the wrapper will be consid-ered compliant if it properly replaces the original API inthese scenarios with respect to the verified contracts.Scenarios usually target the applications at hand (as op-

posed to directly using the API). Tests for an application,if available, can be used as scenarios. Such tests usuallycontain assertions related to application logic, which may ormay not also check (indirectly) some aspects of API usage.The interpreter of the method enriches scenarios—both withand without assertions for application logic—with assertionsfor selected API contracts.Traces that are derived from scenarios must not depend

on external actions, and all results should be reproducible—this is common requirement for automated testing. In thecontext of API migration, this requirement may imply extrachallenges though. Consider, for example, API migrationfor GUI APIs: GUI actions would need to be automatedby using a record&replay tool (such as GUITAR [16]) orby manually writing scripts that automate actions upon theGUI. In the example of Fig. 1 we designed a scenario thatpositions the mouse over various areas of the image, andautomated the scenario with AWT’s Robot class.It is the responsibility of the developer to design scenarios

that explore interesting interactions with the API. Depend-ing on the selected contracts, those interactions are more orless strongly verified. As a baseline, let us assume that thesecontract forms are selected: value equality, reference in-tegrity, exception conformance, and callback conformance—we also refer to these contracts as ‘basic contracts’. Withthese contracts, verification will essentially not look into ob-jects (such as receivers of method calls). Scrutiny of verifica-tion is hence increased when state integrity is also selected,but this may also be problematic, as we will discuss in §6.Tracing — The tracer executes the scenarios and col-

lects the API interactions as traces. (Koloo uses the byte-code engineering framework ASM

7 for tracing.) To this end,the tracer is also configured with a description of the typesthat belong to the application and the API; this metadatais also incorporated into the traces. Trace collection should

7http://asm.ow2.org

abstract away details of the underlying application that arenot important from an API compliance testing point of view.(This also helps with automating the scenario and makingit reproducible.) Such abstraction is similar to other ap-proaches for extracting regressions tests from applicationtraces [7, 20, 24, 23].

Type Rules — Developers need to identify applicationand API types. Such identification boils down to ‘type rules’such as regular expressions over their fully qualified names,e.g., java.awt.*, javax.swing.* and javax.accessibility.* inthe case of Swing as the original API. However, the interac-tion between application and API may be mediated by aux-iliary types that are essential for the scenario even thoughthey should not be migrated. For example, APIs often usecollections, iterators, and files. In order to reproduce the sce-nario in subsequent phases, events for such auxiliary typesmust also be incorporated into the traces. The tracer reportson unbound object references in API calls, and developersneed to add type rules for auxiliary types until traces can beconstructed completely. (In the study of §6, only few suchrules were needed: one scenario needed 8 auxiliary types butmost did not need any).8

Additional type rules are needed for rewriting type namessuch that the interpreter can eventually deal with possiblydivergent namespaces for wrappers, when compared to theoriginal API. For instance, the SwingWT wrapper of §2 re-quires the type rule java.awt ! swingwt.awt.

5.2 Phase: Trace ValidationTraces must be validated before they are even attempted

with the emerging wrapper. By validation we mean thatinterpretation of the trace is checked to reproduce the cap-tured behavior. We also include the e!ort for contract selec-tion, including the corresponding aggregation of state that isneeded by contracts but not readily available from the trace,in particular state for checking state integrity; see §4.2.

Interpretation — The interpreter mocks the applicationby reproducing its role in the interactions recorded in thetrace. The interpreter builds a model of all types and meth-ods from the trace so that mock applications types and meth-ods can be provided for trace execution; also, assertions forthe selected contracts (see ‘Settings’ in Fig. 6) can be in-jected. In particular, each incoming event is associated witha mock application type; each mock application method can8Koloo defines Java’s Thread and Runnable types to be aux-iliary types by default, thereby being able to mock multi-threaded applications.

Page 83: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Contracts

• Return values

• Return ids (modulo isomorphism)

• Callbacks

• Set/get contracts

• Apply “tunings”

event E ::= (eid, tid,pid, d, T , sig, P [], R)execution id eid ! N

thread id tid ! N

parent id pid ! {eid}direction d ! " | #!event entity T ,P ,R ::= (type, content)entity content content ::= | value | id | (id, entity[])signature sig ! stringtype name type ! stringobject id id ! N

content value value ! string | primitive

Figure 2: Event structure of captured traces.

e1, t0 " :Robot.<init>():0 ...e16, t0 " :ToolTipDemo.<init>( ):1 ...e22, t0 " :JPanel.<init>():2 ...e29, t0 " :ToolTipDemo$Cow.<init>(1):3 ...e32, t0 " :Polygon.<init>():4 ...e62, t0 " 3:ToolTipDemo$Cow.setToolTipText(“Cow”):V ...e66, t0 " 2:JPanel.add(3):3 ...e68, t0 " :JFrame.<init>(“ToolTip Demo”):5 ...e72, t0 " 5:JFrame.getContentPane():6e73, t0 " 6:Container.add(2, “Center”): ...e77, t0 " 5:JFrame.show():V ...e85, t0 " 0:Robot.mouseMove(’342’, ’312’):Ve86, t0 " :Thread.sleep(’3000’):Ve87, t1 #! 6:Container.contains(’155’, ’112’):’true’e88, t1 " :Point.<init>(’155’, ’112’):7e89, t1 " 4:Polygon.contains(7):’true’e90, t1 " 3:ToolTipDemo$Cow.setToolTipText(

“<html><center><font color=blue size=+2>Mooooooo</font></center></html>”)):V

Figure 3: API trace for the example in Fig. 1.

id, which captures the order in which events were detected.A thread id identifies the active thread. The parent id in-dicates the state of the execution stack–event eid occurredin the control flow of its parent event pid. The event signa-ture identifies the target API element, including the statictarget type. From the viewpoint of the application an eventis either outgoing (!), if it originates from an applicationtype, or otherwise incoming ("!). Incoming events expresscallbacks, when the API calls a method implemented by theapplication. A callback is synchronous, if it is the responseto a call in the same thread. It is asynchronous otherwise,i.e., if it is a top-level event of a thread that is controlled bythe API. (In this case, the trace contains only callbacks atthe top-level, meaning that the API created the thread.)The remaining items correspond to the target object, a

sequence of parameters and a return object, all encoded asevent entities. An entity represents a runtime value or objectas a tuple with a string (the runtime type of the entity) anda content item. The content can be (null or void), a value(primitive or string), an object reference id or the referenceto an array (which has an id and points to a sequence ofentities). Exceptions thrown by events are encoded by thereturn value being a reference with the exception type.Fig. 3 shows an excerpt for the actual trace underlying

the motivating cow example of §2. No parent id is includedas it is clear from indentation. This is the trace as capturedwith the original Swing API. (The trace is edited for read-ability. For instance, type names are abbreviated.) Thetrace contains calls to the two di!erent contains methodsthat we encountered during the discussion of behavioral dif-ferences in §2: one for the area check (e89) and one for themissing callback (e87). The last event assigns the specifictooltip’s text using HTML format. As an aside, Swing iscapable of handling HTML, whereas SWT does not supportHTML here. This API di!erence eventually led to an ob-

servable GUI di!erence, which was resolved during wrapperdevelopment such that HTML was stripped o!.

4.2 API ContractsTrace re-execution is subjected to API contracts, which

are checked by assertions. Di!erent forms of API contractscan be selected by the developer. (This is similar to con-trol available in other forms of testing, e.g., the di!erentcontrols for test-data generation in randomized testing [14]).The generation of assertions is also used in other forms oftesting [31]. Technically, Koloo provides a correspondingDSL to select contracts.

Value Equality — when a method call or field accessreturns a value (or a string) it must equal the value observedat capture time.

Reference Integrity — when a method call or field ac-cess returns a reference (i.e., an object) its identity mustequal the identity observed at capture time modulo consis-tent renaming of identities.

Value equality and reference integrity are also applied toarrays and their contents. That is, the array must preservereference integrity and its contents must all preserve valueor reference integrity, depending on the element type.

Exception Conformance — if a method call throws anexception during trace capture, it must also throw an ex-ception of the same type during re-execution. Furthermore,re-execution must not throw any additional exceptions.

Callback Conformance — if a callback was triggeredduring trace capture, it must also be triggered during re-execution. To this end, the event for triggering a callbackmust be verified by associating it with an actual callback.Invocation of callbacks is tracked globally so that they canbe matched with triggering events.

In the case of synchronous callbacks, the association be-tween callbacks and the (preceding) triggering event is un-ambiguous. In the case of asynchronous callbacks, the trig-gering event could date back more arbitrarily. In fact, thereis no guarantee that (multithreaded) callbacks re-executedeterministically in the order of trace capture [8] and devi-ations from the captured order do not imply incorrect be-havior. Assertions are checked hence conservatively suchthat a check for re-execution of the callback is issued at thepreceding event (which is not necessarily the trigger) andthe check does not need to succeed immediately, but it isattempted repeatedly and in parallel to re-execution (in aseparate, auxiliary thread), subject to a large timeout.

Set/get Integrity — subject to name-based patterns,certain pairs of API methods for object modification andobservation are subjected to integrity assertions. For in-stance, a set followed by a get should retrieve the argumentof the set. More precisely, if a method for assumed objectmodification is executed and there is an associated objectobserver, then the observation must retrieve the argumentof the modification. These are the assumed pairs of modi-fiers and observers that can be selected for such contracts;we use X for variable postfixes of method names and T fortype names:

Modifier Observer CommentsetX(... v) getX() getter returns vsetX(boolean v) isX() predicate returns vaddX(... v) getXs() result contains vadd(T v) getTs() result contains vput(... k, ... v) get(k) getter returns v

Page 84: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Such state integrity is only required for the re-execution ofthe trace if it was valid during trace capture. It may be rea-sonable that APIs do not satisfy ‘laws’ as this. For instance,a setter may handle nulls or perform normalization.State Integrity — if an object on a receiver position

during trace capture admits observations based on namepatterns, then value equality and reference integrity musthold for the corresponding observations during re-execution.Method names for observations are assumed to start with“get”, “has”, “is”, or “count”. Such observation can be con-trolled by opt-in or opt-out for types. Such observation mayalso be controlled to be iterated in depth such that results ofobservers are observed again etc. (We assume ‘shallow’ ob-servation by default.) Finally, such observation may also bedeclared in a ‘point-wise’ manner such that specific receivertypes are associated with specific observations including ar-bitrary closures over the receiver object.

DSL-based declaration of the tuning

org/apache/bcel/classfile/JavaClass.getBytes : BytecodeTuning

Match operation for the tuning

public class BytecodeTuning implements ... {public boolean matches(byte[] expected,

byte[] actual) {return serialize(expected).equals(

serialize(actual));}private static String serialize(byte[] bytecode)StringWriter w = new StringWriter();new ClassReader(bytecode).accept(new TraceClassVisitor(new PrintWriter(w)), 0);

return w.toString();}

}

Figure 4: BytecodeTuning implementation.

4.3 Assertion TuningThe major source of control in compliance testing is selec-

tion of contracts for which assertions are to be checked, as wediscussed above. A further source of control is tuning of thesemantics of assertions, as we will discuss now. For instance,value equality can be relaxed. Also, non-deterministic, non-repeatable behavior can be accounted for. Technically, Koloo’sDSL, which we mentioned above in the context of API con-tracts, also serves the declaration of tunings: declarationsare of the form Package/Type.Method : NameOfTuning.Tunings can be applied to results of a certain reference

type or in a more point-wise manner to results of specificmethods. Conceptually, a tuning consists of a match opera-tion for expected and actual result (i.e., the results capturedoriginally and during re-execution).Fig. 4 demonstrates an example in more detail. The ‘Byte-

codeTuning’ occurred in a wrapper study in the domain ofbyte-code engineering (see §6). The example is concernedwith the comparison of the result returned by the centralgetBytes method of the API, which is supposed to returnthe byte-code of a given Java class; see the upper part of thefigure for the DSL phrase that applies a tuning to getBytes.

org/apache/bcel/generic/InstructionHandle.getTargeters : ArrayIsSet

Compliance of byte-code engineering APIs: a specificmethod’s result of an array type is declared to have setsemantics.

java/awt/Component.getHeight : Percentage 10

Compliance of GUI APIs: a percentage-based divergencebetween expected and actual height of a component isadmitted.

java/awt/Point.x : Margin 15

Compliance of GUI APIs: a margin-based divergence be-tween expected and actual value of a point’s coordinateis admitted.

javax/swing/JFrame.pack : SkipPackCallbacks

Compliance of GUI APIs: an API-specific tuning thatturns o! checking of certain callbacks (not listed here)that are triggered by calling pack().

Figure 5: Diverse tuning examples.

The challenge is that the original API and the replacementAPI may use di!erent orders for the constant pool that ispart of the returned byte code sequence. When serializingthe bytecode into strings, then constants are essentially in-lined and hence this di!erence is neutralized. The replace-ment API readily provides a visitor for serialization, whichis accordingly used by the match operation in the figure.

5. METHODWe propose a method for wrapper development, which is

motivated by our prior experiences with API migration [3, 2]and the results of the interviews with experienced developers;see §3. Accordingly, the input of the corresponding process isan application (or several applications) subject to wrapper-based migration with regard to a specific API couple; theoutput is a wrapper, which is supposed to be ‘good enough’.Also, the method is iterative to improve the wrapper in astepwise manner. Overall, the method provides a level ofautomated testing that improves state of the art in APImigration; see, again, §3.

Each iteration is composed of three phases (see Fig. 6)inspired by capture and replay approaches [20, 24, 12]. De-velopers first extract a dynamic trace of the interaction be-tween the application and the API; see §5.1. (Of course, aniteration could also deal with multiple traces at once.) Then,an interpreter is used to re-execute the trace so that it canbe validated in the sense that the trace mocks the behav-ior of the original application—still using the original API;see §5.2. In this phase, settings are aggregated to decide onAPI contracts to be asserted. Ultimately, the interpreter isused with the wrapper under development and activities ofwrapper evolution and configuration alternate; see §5.3. Inthis phase, the final decisions on applicable API contractsand corresponding tunings are made. The violations of con-tracts guide the evolution of the wrapper. The method issupported by the Koloo toolkit.

Tuning

Bytecode engineeringdomain

Page 85: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Related work on API migration

Page 86: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Related work -- selection

• [HenkelD05] J. Henkel and A. Diwan: CatchUp!: capturing and replaying refactorings to support API evolution. ICSE 2005.

• [BalabanTF05] I. Balaban, F. Tip, and R. Fuhrer: Refactoring Support for Class Library Migration. OOPSLA 2005.

• [PadioleauLHM08] Y. Padioleau, J.L. Lawall, René Rydhof Hansen, Gilles Muller: Documenting and automating collateral evolutions in linux device drivers. EuroSys 2008.

• [NitaN10] M. Nita and D. Notkin: Using Twinning to Adapt Programs to Alternative APIs. ICSE 2010.

Page 87: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

[HenkelD05] Record and reply library refactorings

• “Representative” of much work on API upgrade

• Orthogonal aspects:‣ Inference of refactoring‣ Wrapper generation

• General API migration ≠ refactoring

Page 88: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

[BalabanTF05] I. Balaban, F. Tip, and R. Fuhrer: Refactoring Support for Class Library Migration. OOPSLA 2005.

• 1:1 type mapping‣ Type constraints for applicability‣ Escape analysis for synchronization‣ No mapping for exception types‣ No mapping for application subtypes

• Method mapping‣ Unidirectional mapping‣ Single call scope for input

(1) new Vector(), unsynchronized ! new ArrayList()(2) new Vector(), synchronized ! Collections.synchronizedList(new ArrayList())(3) int Vector:receiver.size() ! int receiver.size()(4) Object Vector:receiver.firstElement() ! Object receiver.get(0)(5) Object Vector:receiver.setElementAt(Object: value, int: index) ! Object receiver.set(index, value)(6) void Vector:receiver.copyInto(Object: array) ! void Util.copyInto(receiver, array)(7) Enumeration Vector:receiver.elements() ! Iterator receiver.iterator()(8) new Hashtable(), unsynchronized ! new HashMap()(9) new Hashtable(), synchronized ! Collections.synchronizedMap(new HashMap())(10) Object Hashtable:receiver.put(Object: key, Object: value) ! Object receiver.put(key, value)(11) Enumeration Hashtable:receiver.elements() ! Iterator (Collection receiver.values()).iterator()

Figure 2: Specification used for migrating the example program of Figure 1.

heavily used legacy classes from the standard collection li-braries. Specifically, we considered migration from Vectorto ArrayList, Hashtable to HashMap, and Enumeration toIterator. In these benchmarks, our tool could automati-cally migrate an average of 90.4% of the declarations thatrefer to legacy classes, 96.6% of the call sites, and 92.0%of the allocation sites. Moreover, for the migrated alloca-tion sites, only a small fraction needed to be surroundedwith synchronization wrappers. Code inspection revealedthat the remaining occurrences of legacy classes could notbe migrated due to interaction with external libraries suchas Swing.

The remainder of this paper is organized as follows. Sec-tion 2 presents a motivating example that illustrates ourapproach. Section 3 presents the model of type constraintsthat will be used to determine where migrations are allowed,and Section 4 describes how type constraints are solved. Sec-tion 5 discusses the evaluation of our approach on a set ofmoderate-sized benchmarks. Related work is discussed inSection 6. Finally, conclusions and directions for future workare presented in Section 7.

2. MOTIVATING EXAMPLEFigure 1(a) shows an example program that implements

a simple “wizard” using a number of components from theSwing class library. The program displays a list of values in adrop-down menu, and prompts the user to select one or moreof these. Then, the selected values are written to a databasevia a call to the method writeToDatabase(Object[]), thedefinition of which is omitted. Our goal is to transformthis program so that it uses ArrayList instead of the legacyclass Vector, HashMap instead of Hashtable, and Iteratorinstead of Enumeration. Figure 1(b) shows the desired refac-tored program.ArrayList was introduced in the standard libraries to re-

place Vector, and is considered preferable because its inter-face is minimal and matches the functionality of the Listinterface. More importantly, it allows for unsynchronizedaccess to a list’s elements whereas all of Vector’s meth-ods are synchronized3 which results in unnecessary over-head when Vectors are used by only one thread. Similarly,HashMap was introduced to replace Hashtable and allowsfor unsynchronized access to a hash-table’s elements. More-over, HashMap allows for null values and it does not containany of Hashtable’s methods that produce Enumerations.The Iterator interface that replaces Enumeration features

3As noted in e.g., [6, 1, 27], synchronization operations inJava programs may incur significant performance overhead.

public class Util {public static void copyInto(ArrayList v,

Object[] target) {if (target == null)

throw new NullPointerException();for (int i = v.size() - 1; i >= 0; i--)

target[i] = v.get(i);}

}

Figure 3: Auxiliary class that contains a methodused for migrating Vector.copyInto().

shorter method names and supports an additional remove()method for the safe removal of elements during iteration.

Figure 2 shows a migration specification, as would be writ-ten by a user, for migrating the example program. This spec-ification consists of a set of rewrite rules that express howallocation sites and method calls on legacy classes4 Vector,Hashtable and Enumeration can be rewritten. For example,rule (3) states that migrating calls to method Vector.size()requires no modification, since ArrayList defines a syn-tactically and semantically identical method. If methodsin the legacy class are not supported by the replacementclass, rewriting method calls becomes more involved. Forexample, Vector supports a method firstElement() notdefined in ArrayList. Rule (4) states that a call receiver.firstElement(), where receiver is an expression of typeVector, should be transformed into receiver.get(0). Incases where it is not possible to express the e!ect of amethod call on the legacy class in terms of calls to methodson the replacement class, calls may be mapped to methodsin user-defined classes. For example, Vector has a methodcopyInto() that copies the contents of a Vector into an ar-ray. Since ArrayList does not provide this functional-ity, rule (6) transforms receiver.copyInto(array) intoa call to method Util.copyInto(receiver, array) of theUtil class shown in Figure 3. This strategy can also beused to migrate between methods that throw di!erent typesof exceptions. Specifically, a user-defined class method canbe used to wrap a method in a replacement class in ordertranslate between exception types. Finally, when an opera-tion in a legacy class that is not supported by a replacementclass cannot be modeled using an auxiliary class, migrationmay become impossible.

Rules (1) and (2) are both concerned with rewritingallocation sites of the form new Vector(). The formerapplies in cases where thread-safety need not be pre-

4In this discussion we use the term class to refer to Javaclasses as well as to interfaces.

267

Collections domain

Page 89: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Type constraints for applicability

• Consider a constructor call to JComboBox:

Vector v = ...;JComboBox b = new JComboBox(v);

• Here, JComboBox belongs to the Swing library.

• There, is no constructor that takes an ArrayList.

• Hence, Vector cannot be migrated to ArrayList.

In principle, one may use coercions in these contexts.

Collections domain

Page 90: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

On the issue of exception types

XML domain

• nu.xom:

‣ 28 distinguished exception types

‣ Uses standard exception types

• org.jdom:

‣ 8 distinguished exception types

‣ Uses standard exception types

Without exception type mapping, applications do not even compile.

Page 91: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

On the issue of unidirectional mappings

(1) new Vector(), unsynchronized ! new ArrayList()(2) new Vector(), synchronized ! Collections.synchronizedList(new ArrayList())(3) int Vector:receiver.size() ! int receiver.size()(4) Object Vector:receiver.firstElement() ! Object receiver.get(0)(5) Object Vector:receiver.setElementAt(Object: value, int: index) ! Object receiver.set(index, value)(6) void Vector:receiver.copyInto(Object: array) ! void Util.copyInto(receiver, array)(7) Enumeration Vector:receiver.elements() ! Iterator receiver.iterator()(8) new Hashtable(), unsynchronized ! new HashMap()(9) new Hashtable(), synchronized ! Collections.synchronizedMap(new HashMap())(10) Object Hashtable:receiver.put(Object: key, Object: value) ! Object receiver.put(key, value)(11) Enumeration Hashtable:receiver.elements() ! Iterator (Collection receiver.values()).iterator()

Figure 2: Specification used for migrating the example program of Figure 1.

heavily used legacy classes from the standard collection li-braries. Specifically, we considered migration from Vectorto ArrayList, Hashtable to HashMap, and Enumeration toIterator. In these benchmarks, our tool could automati-cally migrate an average of 90.4% of the declarations thatrefer to legacy classes, 96.6% of the call sites, and 92.0%of the allocation sites. Moreover, for the migrated alloca-tion sites, only a small fraction needed to be surroundedwith synchronization wrappers. Code inspection revealedthat the remaining occurrences of legacy classes could notbe migrated due to interaction with external libraries suchas Swing.

The remainder of this paper is organized as follows. Sec-tion 2 presents a motivating example that illustrates ourapproach. Section 3 presents the model of type constraintsthat will be used to determine where migrations are allowed,and Section 4 describes how type constraints are solved. Sec-tion 5 discusses the evaluation of our approach on a set ofmoderate-sized benchmarks. Related work is discussed inSection 6. Finally, conclusions and directions for future workare presented in Section 7.

2. MOTIVATING EXAMPLEFigure 1(a) shows an example program that implements

a simple “wizard” using a number of components from theSwing class library. The program displays a list of values in adrop-down menu, and prompts the user to select one or moreof these. Then, the selected values are written to a databasevia a call to the method writeToDatabase(Object[]), thedefinition of which is omitted. Our goal is to transformthis program so that it uses ArrayList instead of the legacyclass Vector, HashMap instead of Hashtable, and Iteratorinstead of Enumeration. Figure 1(b) shows the desired refac-tored program.ArrayList was introduced in the standard libraries to re-

place Vector, and is considered preferable because its inter-face is minimal and matches the functionality of the Listinterface. More importantly, it allows for unsynchronizedaccess to a list’s elements whereas all of Vector’s meth-ods are synchronized3 which results in unnecessary over-head when Vectors are used by only one thread. Similarly,HashMap was introduced to replace Hashtable and allowsfor unsynchronized access to a hash-table’s elements. More-over, HashMap allows for null values and it does not containany of Hashtable’s methods that produce Enumerations.The Iterator interface that replaces Enumeration features

3As noted in e.g., [6, 1, 27], synchronization operations inJava programs may incur significant performance overhead.

public class Util {public static void copyInto(ArrayList v,

Object[] target) {if (target == null)

throw new NullPointerException();for (int i = v.size() - 1; i >= 0; i--)

target[i] = v.get(i);}

}

Figure 3: Auxiliary class that contains a methodused for migrating Vector.copyInto().

shorter method names and supports an additional remove()method for the safe removal of elements during iteration.

Figure 2 shows a migration specification, as would be writ-ten by a user, for migrating the example program. This spec-ification consists of a set of rewrite rules that express howallocation sites and method calls on legacy classes4 Vector,Hashtable and Enumeration can be rewritten. For example,rule (3) states that migrating calls to method Vector.size()requires no modification, since ArrayList defines a syn-tactically and semantically identical method. If methodsin the legacy class are not supported by the replacementclass, rewriting method calls becomes more involved. Forexample, Vector supports a method firstElement() notdefined in ArrayList. Rule (4) states that a call receiver.firstElement(), where receiver is an expression of typeVector, should be transformed into receiver.get(0). Incases where it is not possible to express the e!ect of amethod call on the legacy class in terms of calls to methodson the replacement class, calls may be mapped to methodsin user-defined classes. For example, Vector has a methodcopyInto() that copies the contents of a Vector into an ar-ray. Since ArrayList does not provide this functional-ity, rule (6) transforms receiver.copyInto(array) intoa call to method Util.copyInto(receiver, array) of theUtil class shown in Figure 3. This strategy can also beused to migrate between methods that throw di!erent typesof exceptions. Specifically, a user-defined class method canbe used to wrap a method in a replacement class in ordertranslate between exception types. Finally, when an opera-tion in a legacy class that is not supported by a replacementclass cannot be modeled using an auxiliary class, migrationmay become impossible.

Rules (1) and (2) are both concerned with rewritingallocation sites of the form new Vector(). The formerapplies in cases where thread-safety need not be pre-

4In this discussion we use the term class to refer to Javaclasses as well as to interfaces.

267

(1) new Vector(), unsynchronized ! new ArrayList()(2) new Vector(), synchronized ! Collections.synchronizedList(new ArrayList())(3) int Vector:receiver.size() ! int receiver.size()(4) Object Vector:receiver.firstElement() ! Object receiver.get(0)(5) Object Vector:receiver.setElementAt(Object: value, int: index) ! Object receiver.set(index, value)(6) void Vector:receiver.copyInto(Object: array) ! void Util.copyInto(receiver, array)(7) Enumeration Vector:receiver.elements() ! Iterator receiver.iterator()(8) new Hashtable(), unsynchronized ! new HashMap()(9) new Hashtable(), synchronized ! Collections.synchronizedMap(new HashMap())(10) Object Hashtable:receiver.put(Object: key, Object: value) ! Object receiver.put(key, value)(11) Enumeration Hashtable:receiver.elements() ! Iterator (Collection receiver.values()).iterator()

Figure 2: Specification used for migrating the example program of Figure 1.

heavily used legacy classes from the standard collection li-braries. Specifically, we considered migration from Vectorto ArrayList, Hashtable to HashMap, and Enumeration toIterator. In these benchmarks, our tool could automati-cally migrate an average of 90.4% of the declarations thatrefer to legacy classes, 96.6% of the call sites, and 92.0%of the allocation sites. Moreover, for the migrated alloca-tion sites, only a small fraction needed to be surroundedwith synchronization wrappers. Code inspection revealedthat the remaining occurrences of legacy classes could notbe migrated due to interaction with external libraries suchas Swing.

The remainder of this paper is organized as follows. Sec-tion 2 presents a motivating example that illustrates ourapproach. Section 3 presents the model of type constraintsthat will be used to determine where migrations are allowed,and Section 4 describes how type constraints are solved. Sec-tion 5 discusses the evaluation of our approach on a set ofmoderate-sized benchmarks. Related work is discussed inSection 6. Finally, conclusions and directions for future workare presented in Section 7.

2. MOTIVATING EXAMPLEFigure 1(a) shows an example program that implements

a simple “wizard” using a number of components from theSwing class library. The program displays a list of values in adrop-down menu, and prompts the user to select one or moreof these. Then, the selected values are written to a databasevia a call to the method writeToDatabase(Object[]), thedefinition of which is omitted. Our goal is to transformthis program so that it uses ArrayList instead of the legacyclass Vector, HashMap instead of Hashtable, and Iteratorinstead of Enumeration. Figure 1(b) shows the desired refac-tored program.ArrayList was introduced in the standard libraries to re-

place Vector, and is considered preferable because its inter-face is minimal and matches the functionality of the Listinterface. More importantly, it allows for unsynchronizedaccess to a list’s elements whereas all of Vector’s meth-ods are synchronized3 which results in unnecessary over-head when Vectors are used by only one thread. Similarly,HashMap was introduced to replace Hashtable and allowsfor unsynchronized access to a hash-table’s elements. More-over, HashMap allows for null values and it does not containany of Hashtable’s methods that produce Enumerations.The Iterator interface that replaces Enumeration features

3As noted in e.g., [6, 1, 27], synchronization operations inJava programs may incur significant performance overhead.

public class Util {public static void copyInto(ArrayList v,

Object[] target) {if (target == null)

throw new NullPointerException();for (int i = v.size() - 1; i >= 0; i--)

target[i] = v.get(i);}

}

Figure 3: Auxiliary class that contains a methodused for migrating Vector.copyInto().

shorter method names and supports an additional remove()method for the safe removal of elements during iteration.

Figure 2 shows a migration specification, as would be writ-ten by a user, for migrating the example program. This spec-ification consists of a set of rewrite rules that express howallocation sites and method calls on legacy classes4 Vector,Hashtable and Enumeration can be rewritten. For example,rule (3) states that migrating calls to method Vector.size()requires no modification, since ArrayList defines a syn-tactically and semantically identical method. If methodsin the legacy class are not supported by the replacementclass, rewriting method calls becomes more involved. Forexample, Vector supports a method firstElement() notdefined in ArrayList. Rule (4) states that a call receiver.firstElement(), where receiver is an expression of typeVector, should be transformed into receiver.get(0). Incases where it is not possible to express the e!ect of amethod call on the legacy class in terms of calls to methodson the replacement class, calls may be mapped to methodsin user-defined classes. For example, Vector has a methodcopyInto() that copies the contents of a Vector into an ar-ray. Since ArrayList does not provide this functional-ity, rule (6) transforms receiver.copyInto(array) intoa call to method Util.copyInto(receiver, array) of theUtil class shown in Figure 3. This strategy can also beused to migrate between methods that throw di!erent typesof exceptions. Specifically, a user-defined class method canbe used to wrap a method in a replacement class in ordertranslate between exception types. Finally, when an opera-tion in a legacy class that is not supported by a replacementclass cannot be modeled using an auxiliary class, migrationmay become impossible.

Rules (1) and (2) are both concerned with rewritingallocation sites of the form new Vector(). The formerapplies in cases where thread-safety need not be pre-

4In this discussion we use the term class to refer to Javaclasses as well as to interfaces.

267

What would it take to execute the rule backwards?

Collections domain

Page 92: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

On the issue of single call scope

public static Document makeDocument(List<Person> contacts) { Document doc = new Document(); Element root = new Element("contacts"); document.addContent(root); for (Person p: contacts) { Element px = new Element("person"); Element namex = new Element("name"); namex = namex.setText(p.getName()); px = px.addContent(namex); root = root.addContent(px); }

XML domain

Suppose that this code needs to rewritten to factory-based style. A flow analysis must be employed to

determine doc as the factory.

Page 93: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

On the issue of single call scope

public static Document loadDocument(String filename) {

Document doc = null;

try {

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

DocumentBuilder builder = factory.newDocumentBuilder();

doc = builder.parse(new File(filename));

} catch (SAXException e) {

} catch (ParserConfigurationException e) {

} catch (IOException e) { ... }

return doc;

}

XML domain

The shown code is “typical” idiomatic DOM legacy, which needs to be

folded into just a few method calls, when using other XML APIs.

Page 94: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

[PadioleauLHM08] Y. Padioleau, J.L. Lawall, René Rydhof Hansen, Gilles Muller: Documenting and automating

collateral evolutions in linux device drivers. EuroSys 2008.

1 static int usb_storage_proc_info (

2 char *buffer, char **start, off_t offset,

3 int length, int hostno, int inout)

4 {

5 struct us_data *us;

6 struct Scsi Host *hostptr;

7

8 hostptr = scsi host hn get(hostno);

9 if (!hostptr) { return -ESRCH; }

10

11 us = (struct us_data*)hostptr->hostdata[0];

12 if (!us) {

13 scsi host put(hostptr);

14 return -ESRCH;

15 }

16

17 SPRINTF(" Vendor: %s\n", us->vendor);

18 scsi host put(hostptr);

19 return length;

20 }

(a) Simplified Linux 2.5.70 code

1 static int usb_storage_proc_info (struct Scsi Host *hostptr,

2 char *buffer, char **start, off_t offset,

3 int length, int hostno, int inout)

4 {

5 struct us_data *us;

6

7

8

9

10

11 us = (struct us_data*)hostptr->hostdata[0];

12 if (!us) {

13

14 return -ESRCH;

15 }

16

17 SPRINTF(" Vendor: %s\n", us->vendor);

18

19 return length;

20 }

(b) Transformed code

Figure 1: An example of collateral evolution, based on code in drivers/usb/storage/scsiglue.c

writes a semantic patch that describes the entailed collat-

eral evolutions. As presented in the next section, Coccinelle

uses a C-like notation for semantic patches, which means

that the developer can often start with a typical driver, per-

form the transformation by hand, apply diff to the old and

new versions to create a standard patch and then edit this

patch to abstract away from inessential details. To test his

intuitions and bring the affected drivers up to date, the de-

veloper applies the semantic patch across the kernel source

tree. If this initial semantic patch does not apply completely

to all of the affected drivers, e.g., due to unanticipated vari-

ations in coding style, then he must iteratively refine and

test it. To aid in this process, Coccinelle provides an in-

teractive mode based on Emacs’ ediff where the developer

can approve each change before it is applied to the source

code, and gives information about cases where the semantic

patch only partially matches the source code. Eventually,

the library developer is confident that the semantic patch

addresses all relevant cases. He can then apply it one final

time to the drivers in the Linux kernel source tree, with the

assurance that changes will be made uniformly in all rele-

vant files, and then publish it for use by driver maintainers

in a repository analogous to the current repository for stan-

dard Linux patches,1

providing a complete formal record of

the acquired expertise.

Coccinelle and the driver maintainer or motivated user.When faced with a driver that is not compatible with a re-

cent version of the Linux kernel, a programmer can search

the repository of semantic patches for those corresponding to

collateral evolutions between the Linux version supported by

his driver and the desired one, and apply them to his driver.

This raises the issue, however, of whether a semantic patch

that was created without knowledge of his code will apply

correctly. Due to the copy-paste model of development and

the constraints on the use of the API, the semantic patch

is likely to be compatible with the driver code. Still, the

programmer can first read the semantic patch, which iden-

tifies in terms of C code fragments the complete set of code

patterns that are affected by the collateral evolution, and

compare them to the structure of his driver. Next, he can

1http://git.kernel.org/

apply the semantic patch using the Coccinelle interactive

mode. If Coccinelle indicates that the semantic patch only

partially matches the driver code, he can refine the semantic

patch, following the same iterative process as the library de-

veloper, or simply modify the affected code by hand, using

the semantic patch as a formal guideline. Finally, he may

contribute a refined semantic patch to the repository.

Summary. The benefits of Coccinelle for both a library de-

veloper and a driver maintainer are the closeness of the se-

mantic patch language to C code, the automatic transforma-

tion tool that makes it possible to easily and reliably apply

transformations to driver code, and the feedback provided

by the interactive usage mode and the partial matches. We

describe and assess these features in the rest of this paper.

3. SMPL IN A NUTSHELL

Coccinelle provides the language SmPL (Semantic Patch

Language) for writing semantic patches. The design of SmPL

is guided by the goal of providing a declarative, WYSIWYG

approach that is close to the C language and builds on the

existing patch notation. To motivate the features of SmPL,

we first consider a moderately complex collateral evolution

that raises many typical issues. We then present SmPL in

terms of this example.

The proc_info collateral evolution. The proc info collat-

eral evolution concerns the use of the SCSI API functions

scsi_host_hn_get and scsi_host_put, which access and

release, respectively, a structure of type Scsi_Host, and ad-

ditionally manage a reference count. In Linux 2.5.71, it was

decided that, due to the criticality of the reference count,

driver code could not be trusted to use these functions cor-

rectly and they were removed from the SCSI API [13]. This

evolution had collateral effects on the “proc info” callback

functions2

defined by SCSI drivers, which call these API

functions. To compensate for the removal of scsi_host_-

hn_get and scsi_host_put, the SCSI library began in Linux

2.5.71 to pass a Scsi_Host-typed structure as an argument

2A proc info callback function makes accessible at the user

level various information about the device.

249

• Interpret “semantic patches” as program transformations

‣ C programs as processed as control-flow grammar

‣ Semantic patch is interpreted as CTL formula

‣ Code isomorphisms and “...” enabled

Linux drivers

Page 95: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

[PadioleauLHM08] Semantic patches

1 @ rule1 @

2 struct SHT ops;3 identifier proc info func;4 @@

5 ops.proc_info = proc info func;6

7 @ rule2 @

8 identifier buffer, start, offset, length, hostno, inout;9 identifier hostptr, rule1.proc info func;10 @@

11 proc info func (

12 + struct Scsi_Host *hostptr,13 char *buffer, char **start, off_t offset,14 int length, int hostno, int inout) {

15 ...

16 - struct Scsi_Host *hostptr;17 ...

18 - hostptr = scsi_host_hn_get(hostno);19 ...

20 - if (!hostptr) { ... return ...; }

21 ...

22 - scsi_host_put(hostptr);23 ...

24 }

Figure 2: A semantic patch for performing theproc info collateral evolutions. Metavariables areshown in italics.

to these callback functions. Collateral evolutions were then

needed in the proc info functions to remove the calls to

scsi_host_hn_get and scsi_host_put, and to add the new

argument.

Figure 1 shows a simplified version of the proc info func-

tion of drivers/usb/storage/scsiglue.c based on the ver-

sion just prior to the evolution, from Linux 2.5.70, and the

result of performing the above collateral evolutions in this

function. Similar collateral evolutions were performed in

Linux 2.5.71 in 19 SCSI driver files inside the kernel source

tree. The affected code, shown in italics, is:

The declaration of the variable hostptr, which is moved

from the function body (line 6) to the parameter list (line

1), to receive the new Scsi_Host-typed argument.

The call to scsi_host_hn_get, which is removed (line 8),

entailing the removal of the assignment of its return value

to hostptr. The subsequent null test on hostptr is dropped,

as the SCSI library is assumed to provide a non-null value.

The calls to scsi_host_put, which are removed. Because

the proc info function should call scsi_host_put whenever

scsi_host_hn_get has been called successfully (i.e., returns

a non-null value), there may be many such calls, one per

possible control-flow path. In this example, there are two:

one on line 13 just before an error return and one on line 18

in the normal exit path.

The proc_info collateral evolution using SmPL. Fig-

ure 2 shows a SmPL semantic patch describing these col-

lateral evolutions. Overall, the semantic patch has the form

of a traditional patch [14], consisting of a sequence of ruleseach of which begins with some context information delim-

ited by a pair of @@s and then specifies a transformation to

be applied in this context. In the case of a semantic patch,

the context information declares a set of metavariables. A

metavariable can match any term of the kind specified in its

declaration (identifier, integer expression, etc.), such that

all references to a given metavariable match the same term.

The transformation is specified as in a traditional patch file,

as a term having the form of the code to be transformed.

This term is annotated with the modifiers - and + to in-

dicate code that is to be removed and added, respectively.

The semantic patch in Figure 2 consists of two rules: rule1

(lines 1-5) finds the name of the proc info function in the

given SCSI driver file, and rule2 (lines 7-24) specifies the

transformation that should be applied to the definition of

that function.

Rule1 declares two metavariables (lines 1-4): ops and

proc info func. The metavariable ops is declared as an ar-

bitrary expression of type struct SHT, and represents the

structure that a SCSI driver passes to the SCSI library

to identify the driver’s callback functions. The metavari-

able proc info func is declared as an identifier and repre-

sents the name of the proc info function. The rest of this

rule (line 5) simply identifies the proc info function as the

function that is stored in the proc_info field of the struc-

ture ops. For example, the scsiglue driver stores the func-

tion usb_storage_proc_info in this field (code not shown).

Rule1 can match any number of times in a given driver file.

The rest of the semantic patch will be applied once for each

possible distinct set of bindings of the metavariables.

Rule2 declares metavariables to represent the parameters

of the proc info function and the Scsi_Host-typed local vari-

able (lines 7-10). The metavariable proc info func is speci-

fied to be inherited from rule1. The rest of the rule (lines

11 to 24) specifies the removal of the various code fragments

outlined above from the function body. Because the code

to remove is not necessarily contiguous, these fragments are

separated by the SmPL operator “...”, which matches any

sequence of instructions. The semantic patch also specifies

that a line should be added: the declaration specified in line

16 to be removed from the function body is specified to be

added to the parameter list in line 12 by a repeated refer-

ence to the hostptr metavariable. These transformations

only apply if the rule matches completely.

Abstracting away from syntactic variations. A seman-

tic patch applies independent of spacing, line breaks, and

the presence of comments. For example, the semantic patch

of Figure 2 matches the code of Figure 1(a), even though

the opening brace of the function definition is on the same

line as the function header in the semantic patch (line 14 of

Figure 2) and on a different line in the code (line 4 of of Fig-

ure 1(a)). Moreover, the Coccinelle transformation engine

is parametrized by a collection of isomorphisms specifying

sets of equivalences that are taken into account when ap-

plying the transformation rule. Isomorphisms can be spec-

ified in an auxiliary file by the SmPL programmer, using a

variant of the SmPL syntax. Among the default set of iso-

morphisms is the property that for any x that has pointer

type, !x, x == NULL, and NULL == x are equivalent. This

isomorphism is specified as follows:

@@ expression *X; @@

X == NULL <=> !X <=> NULL == X

Given this specification, the pattern on line 20 of Figure 2

matches a conditional that tests hostptr using any of the

listed variants. In rule1, isomorphisms account for the var-

ious ways of initializing a structure field (line 5), i.e., via the

structure itself, via a pointer to the structure, or using the

C99 initializer syntax. In rule2, isomorphisms also allow

250

Context

Addition

Removal

Any fragment of CFG Linux drivers

Page 96: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

[PadioleauLHM08] Semantic patches

the local variable hostptr to be initialized to a constant at

the point of its declaration (line 16) and allow the braces

around the then branch of the test of the result of calling

scsi_host_hn_get to be omitted (line 20). All of these iso-

morphisms are exploited in transforming the relevant drivers

in the Linux kernel source tree. Our default set of isomor-

phisms contains 39 equivalences commonly relevant to driver

code.

.

.

.��

hostptr = scsi host hn get(hostno);

��if (!hostptr)

��������������������������

��{

��

us = hostptr→...;

����return -ESRCH;

��

if(!us)

�� ��}

��

{

��

SPRINTF(...);

��scsi host put(hostptr);

��

scsi host put(hostptr);

��return -ESRCH;

��

return length;

��

}

��}

Figure 3: CFG for Figure 1, lines 9-20 (a)

Abstracting away from control-flow variations. A de-

vice driver implements an automaton, testing various con-

ditions depending on the input received from the kernel and

the current state of the device, in order to determine an ap-

propriate action. Thus, a driver function is typically struc-

tured as a tree, with many exit points. An example is the

code in Figure 1a, which contains branches on lines 9 and

12. As the structure of this tree depends on device-specific

properties, it cannot be explicitly represented in a semantic

patch. Instead, the semantic patch describes the pattern of

runtime operations required by the library, which should be

the same across all drivers.

As an approximation of the pattern of operations per-

formed at run time by a driver, we use paths in the driver’s

control-flow graph (CFG). Thus, a sequence of terms in a

SmPL semantic patch matches a sequence of C-language

terms along such a path. In particular, the SmPL construct

“...” represents an arbitrary CFG path fragment. The need

for this approach is illustrated by the scsiglue proc info func-

tion of Figure 1a, for which the CFG is shown in Figure 3.

This function contains two calls to scsi_host_put, while

the semantic patch of Figure 2 contains only one. The right

side of the CFG of Figure 3 shows that there are two paths

that remain within the function after the test of hostptr,one represented by a solid line and one represented by a dot-

ted line. Considering the entire execution of the function,

each path consists of the function header, the declaration of

hostptr, the call to scsi_host_hn_get, the null test, and

its own call to scsi_host_put and close brace, as specified

by rule2 of the semantic patch. Thus, the semantic patch

matches this code. The semantic patch furthermore matches

parse C file

��

parse a SmPL rule

��expand isomorphisms

��translate to CFG

�������� translate to CTL

���������

match the CTL against the CFG

using a model-checking algorithm

��modify matched code

��unparse

��more rules

�� ��

��more rules

����

done��

Figure 4: The Coccinelle engine

proc info functions having any other number of branches and

a corresponding number of calls to scsi_host_put.

Other features. SmPL contains a number of other features

for matching other kinds of code patterns. These include

the ability to describe a disjunction of possible patterns to

be tried in order, the ability to specify code that should not

be present, and the ability to declare some parts of a pat-

tern to be optional. Semantic patches may furthermore use

almost all of C and many cpp constructs (#define, etc.),

and use metavariables abstracting over all kind of terms, in-

cluding identifiers, constants, expressions of various types,

statements, and parameter lists. All of these features are

illustrated in the examples in the appendix. These features

allow a semantic patch to match over and transform almost

any kinds of constructs, which is essential because of the

wide range of transformations required for collateral evolu-

tions.

4. THE COCCINELLE ENGINEIn our analysis of the collateral evolution problem, we

identified the need for an approach that fits with the habits

of Linux programmers, that can preserve coding style, that

can accommodate code variations, and that is efficient. The

language SmPL satisfies the first requirement, as it builds on

the Linux patch format. The challenge is then to implement

this language so as to satisfy the remaining requirements.

In this section, we present the Coccinelle transformation en-

gine, that takes as input a semantic patch and a driver, and

transforms the driver according to the semantic patch. Fig-

ure 4 shows the main steps performed by the engine. The

key points in its design are the strategies for (i) coping with

cpp, to be able to preserve the original coding style, (ii) ab-

stracting away from syntactic and control-flow variations,

and (iii) efficiently applying code transformations. The im-

plementation of Coccinelle amounts to over 60,000 lines of

OCaml code.

Coping with cpp. Because collateral evolutions are just one

step in the ongoing maintenance of a Linux device driver, the

Coccinelle engine must generate code that is readable and

in the style of the original source code, including the use

251

Challenge: applying [PadioleauLHM08] to an OO language and to an OO API.

Page 97: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

[NitaN10] M. Nita and D. Notkin: Using Twinning to Adapt Programs to Alternative APIs. ICSE 2010.

!!""#!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!$!"#$%&'#(()$*%!!!!&!'!"!$!#!%!(!$!)!!!!!!!!!!!!!!!!!!!$+,("#-,',$.%!!!!!!!!"!$!#!%!(!$!)!*!!!!&!!!+!!%%%%%%!!!!!!!!!!!!!!$")/.&01&+,("#-,',$./%

#!""#!!!&!#,!"!&!!!!!!!!!!!!!!!!!!!!!!!!!!!!$10+'#"/%$!"!2#3#&',.405&!05),/"!"!2#3#&.6(,/&!"!2#3#&)5,$.)1),+/

Figure 2: Abstract syntax of the mapping.

ure 1(a) to Figure 1(c) — when one declaration is changed

from A to C, the program may break arbitrarily badly. The

program is likely to not compile until the programmer com-

pletes the whole translation.

3. THE MAPPINGUnderlying our approach is the observation that in the

process of manual adaptation — either shallow or deep —

the programmer implicitly specifies code-level correspondences

between the APIs. For example, the programmer replaces

of type Enumeration with type Iterator and of Enumera-tion.nextElement() with Iterator. next().

With twinning, the developer writes these differences sep-

arately, in the form of a mapping whose entries specify when

two different code snippets correspond to one another. From

the mapping, we can automate key aspects of both shallow

and deep adaptation. The mapping that helps us generate

shallow and deep adapters for the code in Figure 1(a) is

shown in Figure 3.

When designing the mapping representation, we balanced

three inter-related principles:

1. Ease of use. The mapping language should resemble

the programming language itself as closely as possible.

2. Generality. The mapping representation should allow

a reasonable set of API-to-API mappings to be ex-

pressed.

3. Tool utility. Mappings should be easy to leverage by

matching algorithms and other analyses.

As these principles stand in tension, our initial design rep-

resents a sweet spot that addressed each of them to a rea-

sonable extent.

Figure 2 shows the abstract syntax of the mapping, which

we restricted, for now, to the specification of single replace-

ments (e.g., relating only two APIs). A mapping M is a

(possibly empty) list of replacements. A replacement

[ T1 ( F1 ) { B1 }T2 ( F2 ) { B2 } ]

consists of two nameless procedures, each of which has a

return type T, a list of formals F, and a body B. The replace-

ment specifies that T1 is related to T2, that the list of formals

F1 is point-wise related to the list of formals F2, and that

B1 is related to B2. Replacements are akin to the entries of

a logical relation, which relates programs that take related

inputs to related outputs. In the rest of the paper, we refer

to the domain and range of the mapping in the intuitive

sense: above, the snippet B1 is in the domain and B2 in the

range of the mapping.

!"!"#$%&""""""#$""""""""""""""%"&"$'&("(")"&'()*+#$,"-""*&&+,-./$"""#$""""""""""""""%"&"$'&("(")".++/0123)#$,"-"""4

!"0('1"&+$.%("#!"#$%&"5$""""""%"&"$'&("56'7'8'9)3#$,"-""2$"&+$%&""""#*&&+,-./$"/$"""%"&"$'&("/62)'+/)*+#$,"-""""""4

!"3%%4"+("""""#0('1"&+$.%("'$"%"&"$'&("'6:/3;*+'<7'8'9)3#$,"-""3%%4"+("""""#2$"&+$%&"2$""""%"&"$'&("26:/3='>)#$,"-"""""""4

!"536"#$""""""#0('1"&+$.%("'$"%"&"$'&("'69'>)<7'8'9)#$,"-""536"#$""""""#2$"&+$%&"2$""""%"&"$'&("269'>)#$,"-""""""""""4

Figure 3: Mapping for Figure 1.

When applying the mapping to a program (see Section 4),

occurrences of B1 are replaced by occurrences of B2. The

purpose of the return type and the list of formals is two-

fold. First, they are used in type-checking the mapping.

Therefore, replacements must be closed programs in which

all free variables in B are closed by F. Second, they can be

used by mapping-based transformations. When applying a

mapping to a program, for example, type matching allows

identification of code to translate. Otherwise, x.foo() and

y.foo() would match despite the types of x and y being

different.

A well-formed replacement is type-correct and its argu-

ment lists are in pointwise correspondence. That is, the

argument lists of the replacee and the replacer are equal in

length and the ith argument in one list corresponds to the

ith argument in the other. For API alternatives, this re-

striction means that we do not generally handle APIs where

the functionality of one type in one API is spread out over

the functionality of two or more types in the other. This re-

striction simplifies our algorithms significantly, and we have

yet to find real and useful examples where removing it is

absolutely necessary.

Figure 3 shows the mapping that handles the example

in Figure 1. The mapping syntax does not mention the

names used by the analysis that generates the adapters in

Figure 1(d-f): Seq, Iter, getIter, hasMore, and getNext.Names can easily be incorporated by (a) attaching a name to

each replacement and (b) adding a syntactic form that maps

two related type names to a new name, e.g. name Seq {Ar-rayList, Vector}. Alternatively, names could potentially

be automatically-generated using a heuristic algorithm that

yields readable names for corresponding blocks and corre-

sponding types.

We believe that this representation approximates our de-

sign principles. It is natural to use: programmers specify

correspondences as pairs of Java methods. In particular,

they can copy and paste snippets from the program and

then specify their replacements. It is general : it allows code

correspondences to be defined as pairs of method bodies. It

is useful : we defined shallow and deep adaptation transfor-

mations based on it.

4. SHALLOW ADAPTATIONA program transformation

ShallowAdapt(P,M) = P’

generates a variant program P’ from a program P and the

mapping M. The transformation ShallowAdapt walks the pro-

gram’s abstract syntax tree (AST) and attempts to apply

each replacement to each node. When encountering a block,

!!""#!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!$!"#$%&'#(()$*%!!!!&!'!"!$!#!%!(!$!)!!!!!!!!!!!!!!!!!!!$+,("#-,',$.%!!!!!!!!"!$!#!%!(!$!)!*!!!!&!!!+!!%%%%%%!!!!!!!!!!!!!!$")/.&01&+,("#-,',$./%

#!""#!!!&!#,!"!&!!!!!!!!!!!!!!!!!!!!!!!!!!!!$10+'#"/%$!"!2#3#&',.405&!05),/"!"!2#3#&.6(,/&!"!2#3#&)5,$.)1),+/

Figure 2: Abstract syntax of the mapping.

ure 1(a) to Figure 1(c) — when one declaration is changed

from A to C, the program may break arbitrarily badly. The

program is likely to not compile until the programmer com-

pletes the whole translation.

3. THE MAPPINGUnderlying our approach is the observation that in the

process of manual adaptation — either shallow or deep —

the programmer implicitly specifies code-level correspondences

between the APIs. For example, the programmer replaces

of type Enumeration with type Iterator and of Enumera-tion.nextElement() with Iterator. next().

With twinning, the developer writes these differences sep-

arately, in the form of a mapping whose entries specify when

two different code snippets correspond to one another. From

the mapping, we can automate key aspects of both shallow

and deep adaptation. The mapping that helps us generate

shallow and deep adapters for the code in Figure 1(a) is

shown in Figure 3.

When designing the mapping representation, we balanced

three inter-related principles:

1. Ease of use. The mapping language should resemble

the programming language itself as closely as possible.

2. Generality. The mapping representation should allow

a reasonable set of API-to-API mappings to be ex-

pressed.

3. Tool utility. Mappings should be easy to leverage by

matching algorithms and other analyses.

As these principles stand in tension, our initial design rep-

resents a sweet spot that addressed each of them to a rea-

sonable extent.

Figure 2 shows the abstract syntax of the mapping, which

we restricted, for now, to the specification of single replace-

ments (e.g., relating only two APIs). A mapping M is a

(possibly empty) list of replacements. A replacement

[ T1 ( F1 ) { B1 }T2 ( F2 ) { B2 } ]

consists of two nameless procedures, each of which has a

return type T, a list of formals F, and a body B. The replace-

ment specifies that T1 is related to T2, that the list of formals

F1 is point-wise related to the list of formals F2, and that

B1 is related to B2. Replacements are akin to the entries of

a logical relation, which relates programs that take related

inputs to related outputs. In the rest of the paper, we refer

to the domain and range of the mapping in the intuitive

sense: above, the snippet B1 is in the domain and B2 in the

range of the mapping.

!"!"#$%&""""""#$""""""""""""""%"&"$'&("(")"&'()*+#$,"-""*&&+,-./$"""#$""""""""""""""%"&"$'&("(")".++/0123)#$,"-"""4

!"0('1"&+$.%("#!"#$%&"5$""""""%"&"$'&("56'7'8'9)3#$,"-""2$"&+$%&""""#*&&+,-./$"/$"""%"&"$'&("/62)'+/)*+#$,"-""""""4

!"3%%4"+("""""#0('1"&+$.%("'$"%"&"$'&("'6:/3;*+'<7'8'9)3#$,"-""3%%4"+("""""#2$"&+$%&"2$""""%"&"$'&("26:/3='>)#$,"-"""""""4

!"536"#$""""""#0('1"&+$.%("'$"%"&"$'&("'69'>)<7'8'9)#$,"-""536"#$""""""#2$"&+$%&"2$""""%"&"$'&("269'>)#$,"-""""""""""4

Figure 3: Mapping for Figure 1.

When applying the mapping to a program (see Section 4),

occurrences of B1 are replaced by occurrences of B2. The

purpose of the return type and the list of formals is two-

fold. First, they are used in type-checking the mapping.

Therefore, replacements must be closed programs in which

all free variables in B are closed by F. Second, they can be

used by mapping-based transformations. When applying a

mapping to a program, for example, type matching allows

identification of code to translate. Otherwise, x.foo() and

y.foo() would match despite the types of x and y being

different.

A well-formed replacement is type-correct and its argu-

ment lists are in pointwise correspondence. That is, the

argument lists of the replacee and the replacer are equal in

length and the ith argument in one list corresponds to the

ith argument in the other. For API alternatives, this re-

striction means that we do not generally handle APIs where

the functionality of one type in one API is spread out over

the functionality of two or more types in the other. This re-

striction simplifies our algorithms significantly, and we have

yet to find real and useful examples where removing it is

absolutely necessary.

Figure 3 shows the mapping that handles the example

in Figure 1. The mapping syntax does not mention the

names used by the analysis that generates the adapters in

Figure 1(d-f): Seq, Iter, getIter, hasMore, and getNext.Names can easily be incorporated by (a) attaching a name to

each replacement and (b) adding a syntactic form that maps

two related type names to a new name, e.g. name Seq {Ar-rayList, Vector}. Alternatively, names could potentially

be automatically-generated using a heuristic algorithm that

yields readable names for corresponding blocks and corre-

sponding types.

We believe that this representation approximates our de-

sign principles. It is natural to use: programmers specify

correspondences as pairs of Java methods. In particular,

they can copy and paste snippets from the program and

then specify their replacements. It is general : it allows code

correspondences to be defined as pairs of method bodies. It

is useful : we defined shallow and deep adaptation transfor-

mations based on it.

4. SHALLOW ADAPTATIONA program transformation

ShallowAdapt(P,M) = P’

generates a variant program P’ from a program P and the

mapping M. The transformation ShallowAdapt walks the pro-

gram’s abstract syntax tree (AST) and attempts to apply

each replacement to each node. When encountering a block,

Mapping

Syntax

Page 98: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

• Two forms of adaptation

‣ Shallow adaptation: apply mapping directly

‣ Deep adaptation: go through mapping-defined API

• Limitations

‣ No formal definition available

‣ No use of control-flow/data-flow analysis

‣ Validation for some subset of Twitter/Facebook APIs

[NitaN10] M. Nita and D. Notkin: Using Twinning to Adapt Programs to Alternative APIs. ICSE 2010.

Social network domain

Page 99: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Call to arms on API migration

Page 100: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

Call to arms

• Name- and type-based API matching

• Flow-aware patterns

• Sound use of non-invariant type mappings

• IDE support

• Incremental development

• Test-driven development

• End-to-end validation

• Generalization beyond API migration

• Uniformity across wrapping and rewriting

Page 101: API migration is a hard problem! - Uni Koblenz-Landaulaemmel/1203-ames.pdf · 2]; the wrapper must neutralize these differences. Our example concerns two GUI APIs: i) the Swing API

© 2009-2012 Ralf Lämmel, Software Languages Team and collaborators; see 1st page of slide deck

That’s it.Thanks!

http://professor-fish.blogspot.com/2012/03/should-i-declare-defeat-on-research.html

http://www.ece.iastate.edu/seminars-and-events/api-migration-is-a-hard-problem/

http://softlang.wikidot.com/api