multi-dimensional exploration of api usage - icpc13 - 21-05-13

Multi-dimensional Exploration of API Usage. Coen De Roover¹, Ralf Lämmel², Ekaterina Pek³. ¹ Software Languages Lab, Vrije Universiteit Brussel, Belgium; ² Software Languages Team, University of Koblenz-Landau, Germany; ³ ADAPT Lab, University of Koblenz-Landau, Germany

Upload: coen-de-roover

Post on 03-Jul-2015


DESCRIPTION

Presented at the 21st IEEE International Conference on Program Comprehension (ICPC 2013), San Francisco (USA). Website of the paper: http://softlang.uni-koblenz.de/explore-API-usage/

TRANSCRIPT

Page 1: Multi-dimensional exploration of API usage - ICPC13 - 21-05-13

Multi-dimensional Exploration of API Usage

Coen De Roover¹, Ralf Lämmel², Ekaterina Pek³

¹ Software Languages Lab, Vrije Universiteit Brussel, Belgium

² Software Languages Team, University of Koblenz-Landau, Germany

³ ADAPT Lab, University of Koblenz-Landau, Germany

Page 2: Multi-dimensional exploration of API usage - ICPC13 - 21-05-13

Exploration Story: JHotDraw

➜ relatively few references to SAX and DOM

what XML APIs are used and how extensively?

[Fig. 8. The API Cocktail of JHotDraw (cloud of API tags). Tags for the whole project: AWT, Swing, java.io, java.lang, java.util, JavaBeans, java.text, java.lang.reflect, DOM, java.net, java.util.regex, Java Print Service, java.util.zip, java.lang.annotation, java.math, java.lang.ref, java.util.concurrent, Java security, javax.imageio, SAX. Tags for package org.jhotdraw.undo: Swing, java.lang, JavaBeans, java.io, AWT, java.util.]

View – A list as in the case of the API Footprint insight, except that it is narrowed down to a sub-API of interest.

Illustration – Fig. 7 illustrates ‘Non-trivial API’ usage for JDOM’s core package. The selection is concerned with a project type which extends the API type DefaultJDOMFactory to introduce a project-specific factory for XML elements. Basic IDE functionality could be used from here on to check where the API-derived type is used.

Intelligence – In the example, we explored non-trivial API usage, such as type derivation at the boundary of project and API, knowing that it challenges API evolution and migration [7]. More generally, developers are interested in specific sub-APIs when they require detailed analysis for understanding. API developers (more likely than project developers) may be more aware of sub-APIs; they may, in fact, capture them as part of the exploration. (This is what we did during this research.) Such sub-API tagging, which is supported by the Sub-API Footprint insight, may ultimately improve API documentation in ways that are complementary to existing approaches [4], [5].

F. The API Cocktail Insight

Intent – Understand what APIs are used together in larger project scopes.

Stakeholder – Project developer.

API Usage – All APIs.

View – The listing of all APIs exercised in the project or a project package with API-usage metrics applied to the APIs.

Illustration – Remember the tree-based representation of the API cocktail for JHotDraw as shown in Fig. 1 in §II. The same cocktail of 20 APIs is shown as a tag cloud in Fig. 8. Scaling is based on the #ref metric.

Intelligence – The cocktail lists and ranks APIs that are used in the corresponding project scope. Thus, the cocktail proxies as a measurement for system complexity, required developer skills, and foreseeable design and implementation challenges. API usage is part of the software architecture, in the sense of “what makes it hard to change the software” and chances are that API usage may cause some “software or API asbestos” [29]. While a large cocktail may be acceptable and unavoidable for a complex project, the cocktail should be smaller for individual packages in the interest of a modularized, evolvable system.

G. APIs Versus Domains

We can always use API domains in place of APIs to raise the level of abstraction. Thus, any insight that compares APIs may as well be applied to API domains. APIs are concrete technologies while API domains are more abstract software concepts. Consider Fig. 9 for illustration. It shows API domains for all of JHotDraw and also for its undo package. Thus, it presents the API cocktails of Fig. 8 in a more abstract manner.

[Fig. 9. Cocktail of domains for JHotDraw. Domains for project jhotdraw: GUI, Data, Basics, IO, Format, Component, Meta, XML, Distribution, Parsing, Control, Math, Output, Security, Concurrency. Domains for package org.jhotdraw.undo: GUI, Basics, Component, IO.]

[Fig. 10. API Coupling for JHotDraw’s interface org.jhotdraw.app.View. APIs: java.lang, java.net, Swing, JavaBeans, java.io. API domains: Basics, Distribution, GUI, IO, Component.]

H. The API Coupling Insight

Intent – Understand what APIs or API domains are used together in smaller project scopes.

Stakeholder – Project developer.

API Usage – All APIs.

View – See §VI-F except APIs or domains are listed for smaller project scopes.

Illustration – Fig. 10 shows API Coupling for the interface org.jhotdraw.app.View from JHotDraw’s app package⁴. According to the documentation, the package “defines a framework for document-oriented applications and provides default implementations”. The View type “paints a document on a JComponent within an Application”. (Application is the main type from the package which “handles the lifecycle of views and provides windows to present them on screen”.) The coupled use of APIs can be dissected in terms of the involved types as follows:

java.lang: trivial usage of strings.
java.net: types for the location to save the view.
JavaBeans: de-/registration of PropertyChangeListeners.
java.io: exception handling for reading/writing views.
Swing: usage of JComponent on which to paint a document; usage of ActionMap for actions on the GUI component.

Intelligence – Simultaneous presence of several domains or APIs in a relatively small project scope may indicate accidental complexity and poor separation of concerns. Thus, such exploration may reveal a code smell [30], [31] that is worth addressing. Alternatively, a dissection, as performed for the illustrative example, may help in understanding the design and reasonable API dependencies.
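The coupling check described above can be sketched in a few lines of Python. This is only an illustration, not the tool's implementation: the API-to-domain pairing is inferred from the labels of Fig. 10, and the function names and the `max_domains` threshold are assumptions made for this example.

```python
from collections import defaultdict

# API-to-domain tagging inferred from the labels of Fig. 10 (an assumption).
API_DOMAIN = {
    "java.lang": "Basics",
    "java.net": "Distribution",
    "Swing": "GUI",
    "java.io": "IO",
    "JavaBeans": "Component",
}


def domains_per_scope(references):
    """Map each project scope to the set of API domains it references.

    `references` is an iterable of (scope, api) pairs.
    """
    coupled = defaultdict(set)
    for scope, api in references:
        coupled[scope].add(API_DOMAIN.get(api, "Untagged"))
    return dict(coupled)


def flag_coupled_scopes(references, max_domains=3):
    """Return the scopes that reference more than `max_domains` domains.

    The threshold is hypothetical; a flagged scope is only a candidate
    for closer inspection, not a confirmed code smell.
    """
    per_scope = domains_per_scope(references)
    return sorted(s for s, doms in per_scope.items() if len(doms) > max_domains)
```

A flagged scope such as org.jhotdraw.app.View would still need the kind of manual dissection shown in the illustration before anyone could call its dependencies unreasonable.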

I. The API Profile Insight

Intent – Understand what API facets are used in varying project scopes.

Stakeholder – Project developer and, possibly, API developer.

⁴ The lifecycle of the interface as explained by its documentation: http://www.randelshofer.ch/oop/jhotdraw/JavaDoc/org/jhotdraw/app/View.html

API cloud

API domain cloud


Let me start by making the concept of exploring API usage more concrete. Imagine you are a developer tasked with migrating JHotDraw (JH) from XML to JSON for persistence. The first thing you would like to know is which APIs for manipulating XML are used, and how extensively. You could gain these insights through the two tag clouds shown on the slide. The top one contains the domains of the APIs used by JH, the bottom one the actual APIs. The size of a tag corresponds to the number of references to the API or the domain. So we can conclude that XML APIs are used by JH, more concretely DOM and SAX, but not extensively. There are a lot more references to the AWT and Swing APIs from the GUI programming domain, for instance.

Page 3: Multi-dimensional exploration of API usage - ICPC13 - 21-05-13

Exploration Story: JHotDraw

➜ footprint of DOM in JHotDraw is but 94 refs to 19 distinct elements

what elements of DOM are actually used?

table of referenced API elements (i.e., DOM slice)


The next insight to gain is whether the project uses the complete DOM API, or just a small subset. Given a table of referenced API elements, the latter seems to be the case. There are only 94 references to 19 distinct types and methods. Even better news, no exotic API elements are used.

Page 4: Multi-dimensional exploration of API usage - ICPC13 - 21-05-13

Exploration Story: JHotDraw

Slice of JHotDraw with DOM usage

➜ local to 1/13 top-level packages

How is DOM usage distributed across JHotDraw?

table of referencing project elements (i.e., JHotDraw slice)

public void applyStylesTo(Element elem) {
    for (CSSRule rule : rules) {
        if (rule.matches(elem)) {
            rule.apply(elem);
        }
    }
}

All good news so far, but it could still be the case that the API is used all over the project. Luckily, given a table of referencing project elements, the use of DOM is local to 4 classes in the org.jhotdraw.xml package. Our exploration therefore shows that migrating from XML to JSON is feasible.

Page 5: Multi-dimensional exploration of API usage - ICPC13 - 21-05-13

Exploring API Usage: Quaatlas API Atlas

API metadata

API: named collection of elements (98)

API domain: named collection of APIs addressing the same domain (27)

API facet: named collection of API elements addressing a particular concern

3) Understanding API Usage: Robillard and DeLine discovered in a field study on API learning obstacles that API users prefer to learn from patterns of related calls rather than illustrations of individual methods [21]. Hou and Li report similar obstacles based on an exploratory study of newsgroup discussions [22]. Generally, information about API usage may be used in helping developers. Nasehi and Maurer show that API unit tests can be used as usage examples [23]. Zhong et al. cluster API calls and mine patterns to recommend useful code snippets to API users [3]. Bruch et al. develop intelligent code completion that narrows down the possible suggestions to those API elements that are actually relevant [24]. Hou et al. [25] compare different filtering, sorting and grouping strategies to this end. The latter would, for instance, group all methods related to managing the components of an AWT container. Mandelin et al. present an approach for synthesizing a snippet to fill in a gap in the code using an API, given certain contextual information [26]. Our effort differs in that it enables navigating both projects and APIs in the familiar IDE-like manner with API usage in focus. We also identify a catalogue of exploration activities to perform.

IV. BASIC CONCEPTS

We set up the basic concepts underlying this paper: APIs, API usage, and API-usage metrics. We also augment the basic notion of API with extra concepts for API domains and API facets to raise the level of abstraction in exploration.

APIs: We use the term API to refer to the actual interface but also to the underlying implementation. We do not pay attention to any distinction between libraries and frameworks. We simply view an API as a set of types (classes, interfaces, etc.) referable by name and distributed together for use in software projects. Without loss of generality, this paper invokes Java for most illustrations and intuitions.

Indeed, we assume that package names, package prefixes, and types within packages can be used to describe APIs. For instance, the package prefix javax.swing (and possibly others) could be associated with the Swing API for GUI programming. It is important that we view javax.swing as a package prefix because Swing is indeed organized in a package tree. In contrast, the java.util API corresponds to all the types in the package of ditto name. There are various sub-packages of java.util, but they are preferably considered separate APIs. In fact, the java.util API deserves further breakdown because the package serves de facto distinct purposes, notably Java’s collections and Java’s event system. (This is not an uncommon situation.) We use the term sub-API to refer to declared subsets of the types in a given API.
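To make the distinction between a package prefix and an exact package concrete, here is a minimal Python sketch. The two lookup tables and the function name `api_of` are assumptions made for this example and do not reflect the format of the actual API atlas; the sketch also handles top-level types only.

```python
# Hypothetical API descriptions. A package *prefix* covers a whole package
# tree; an exact *package* covers only the types declared directly in it.
API_BY_PREFIX = {"javax.swing": "Swing"}
API_BY_PACKAGE = {
    "java.util": "java.util",
    "java.util.regex": "java.util.regex",
}


def api_of(qualified_type_name):
    """Return the API name for a fully qualified top-level type, or None."""
    package, _, _simple_name = qualified_type_name.rpartition(".")
    # Exact packages first: java.util must not swallow its sub-packages.
    if package in API_BY_PACKAGE:
        return API_BY_PACKAGE[package]
    # Prefixes match the package itself and everything below it.
    for prefix, api in API_BY_PREFIX.items():
        if package == prefix or package.startswith(prefix + "."):
            return api
    return None
```

Under these tables, javax.swing.text.Document resolves to Swing because of the prefix rule, whereas java.util.concurrent.Future resolves to nothing until a separate API is declared for that sub-package.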

Clearly, an API may exist in different versions, in which case it needs to be decided whether or not the versions should be treated like different APIs, as far as API-usage analysis is concerned.

API Usage: We are concerned with API usage in given software projects. API usage is evidenced from any sort of reference from projects to APIs. References are directly associated with syntactical patterns in the code of the projects, e.g., a method call in a class of a project that invokes a method of an API type, or a class declaration in a project that explicitly extends a class of an API. The resulting patterns can hence be used to classify API references and to control exploration with regard to the kinds of references to present to users.

A reasonably precise analysis of API usage requires that the underlying projects are ‘resolved’ in that each API reference in a project can be followed to the corresponding declaration in the API. Further, since exploration of API usage relies on the developer’s view on source code of projects, we effectively need compilable source code of all projects.

API-usage Metrics: For quantifying API usage, metrics are needed that can be used in exploration views in different ways, e.g., for ordering (elements or scopes of APIs or projects) or for scaling in the visualization of API usage. For the purpose of this paper, the following metrics suffice:

#proj: Number of projects referencing APIs.
#api: Number of APIs being referenced.
#ref: Number of references from projects to APIs.
#elem: Number of API elements being referenced.
#derive: Number of project types derived from API types.
#super: Number of API types serving as supertype for derivations.
#sub: Number of project types serving as subtype for derivations.

These metrics can be applied, of course, to different selections of projects or APIs as well as specific packages, types, or methods thereof. For instance, we may be interested in #api for a specific project. Also, we may be interested in #ref for some part of an API.

Further, these metrics can be configured to count only specific patterns. It is easy to see now that the given metrics are not even orthogonal because, for example, #derive can be obtained from #ref by only counting patterns for ‘extends’ and ‘implements’ relationships.
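The metrics can be read as simple counts over a list of reference facts. The Python sketch below is one possible reading, not the authors' implementation: the `Ref` record, its field names, and the pattern labels are assumptions, and #derive is computed exactly as the text suggests, as #ref restricted to ‘extends’ and ‘implements’ patterns.

```python
from typing import NamedTuple


class Ref(NamedTuple):
    """One reference from a project to an API (hypothetical record layout)."""
    project: str   # referencing project
    scope: str     # referencing project type
    api: str       # referenced API
    element: str   # referenced API element
    pattern: str   # syntactic pattern, e.g. "method call" or "extends"


DERIVATION_PATTERNS = {"extends", "implements"}


def usage_metrics(refs, patterns=None):
    """Compute the API-usage metrics, optionally counting only some patterns."""
    selected = [r for r in refs if patterns is None or r.pattern in patterns]
    derivations = [r for r in selected if r.pattern in DERIVATION_PATTERNS]
    return {
        "#proj": len({r.project for r in selected}),
        "#api": len({r.api for r in selected}),
        "#ref": len(selected),
        "#elem": len({(r.api, r.element) for r in selected}),
        # #derive is #ref restricted to derivation patterns.
        "#derive": len(derivations),
        "#super": len({(r.api, r.element) for r in derivations}),
        "#sub": len({(r.project, r.scope) for r in derivations}),
    }
```

Calling `usage_metrics(refs, DERIVATION_PATTERNS)` makes the non-orthogonality visible: in that configuration #ref and #derive coincide.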

API Domains: We assume that each API addresses some programming domain such as XML processing or GUI programming. We are not aware of any general, widely adopted attempt to associate APIs with domains, but the idea appears to merit further research. We have begun collecting programming domains (or in fact, API domains) and tagging APIs appropriately. Let us list a few API domains and associate them with well-known Java APIs:

GUI: GUI programming, e.g., Swing and AWT.
XML: XML processing, e.g., DOM, JDOM, and SAX.
Data: Data structures incl. containers, e.g., java.util.
IO: File- and stream-based I/O, e.g., java.io and java.nio.
Component: Component-oriented programming, e.g., JavaBeans.
Meta: Meta-programming incl. reflection, e.g., java.lang.reflect.
Basics: Basic language support, e.g., java.lang.String.

API domains are helpful in reporting API usage and quantifying API usage of interest in more abstract terms than the names of individual APIs, as will be illustrated in §VI.
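Reporting usage per domain rather than per API amounts to a simple aggregation. The following Python sketch assumes a dictionary of per-API reference counts as input; the tagging table restates the examples listed above, while the function name and the "Untagged" bucket are assumptions for illustration.

```python
from collections import Counter

# Domain tags taken from the examples in the list above.
API_DOMAIN = {
    "Swing": "GUI", "AWT": "GUI",
    "DOM": "XML", "JDOM": "XML", "SAX": "XML",
    "java.util": "Data",
    "java.io": "IO", "java.nio": "IO",
    "JavaBeans": "Component",
    "java.lang.reflect": "Meta",
    "java.lang": "Basics",
}


def refs_per_domain(refs_per_api):
    """Aggregate per-API reference counts (#ref) into per-domain counts."""
    totals = Counter()
    for api, count in refs_per_api.items():
        # APIs without a tag are kept visible instead of being dropped.
        totals[API_DOMAIN.get(api, "Untagged")] += count
    return dict(totals)
```

The resulting counts are what a domain tag cloud would be scaled by; any numbers fed into the function here would be made up, since the transcript does not give per-API reference counts.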

API Facets: An API may contain dozens or hundreds of types each of which has many method members in turn. Some APIs use sub-packages to organize such API complexity, but those sub-packages are typically concerned with advanced API usage whereas the core facets of API usage are not distinguished in any operational manner. This makes it hard to understand API usage at a somewhat abstract level.

 1. input : corpus, candidateList
 2. output : corpus
 3. for each name in candidateList :
 4.     (p_src, p_bin) = obtainProject(name);
 5.     patches = exploratoryBuild(p_src, p_bin);
 6.     timestamp = build(p_src, patches);
 7.     (java, classes, jars) = collectStats(p_src);
 8.     java' = filter(java);
 9.     (jars_built, jars_lib) = detectJars(timestamp, java', jars);
10.     java'_compiled = detectJava(timestamp, java', classes, jars_built);
11.     p'_src = (java'_compiled, jars_lib);
12.     p'_bin = jars_built;
13.     p' = (p'_src, p'_bin);
14.     if validate(p') : corpus = corpus + p';

Fig. 4. Pseudocode describing the corpus (re)-engineering method.

Accordingly, we propose leveraging a notion of API facets in the sense of aspects or concerns supported by the API. In this paper, we assume that facets are represented as named collections of specific API types or methods. As an illustration, we name a few API facets of the typical DOM-like API such as DOM itself, JDOM, or dom4j:

Input / Output: De-/serialization for DOM trees.
Observation: Getter-like access and other ‘read only’ forms.
Addition: Addition of nodes et al. as part also of construction.
Removal: Removal of nodes et al. as a form of mutation.
Namespaces: XML namespace manipulation.
Nontrivial XML: Use of CDATA, PI, and other XML idiosyncrasies.
Nontrivial API: Usage of types and methods that are beyond normal API usage. For instance, XML APIs may provide some framework for node factories or adapters for API integration.

API facets are helpful in communicating API usage to the user at a more abstract level than the level of individual types and methods, as will be illustrated in §VI. We leverage knowledge of the APIs to identify (to name) API facets and to tag APIs appropriately. The idea of grouping API members, e.g., by their functional roles, has also been studied in related work on code completion; see §III.
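Since facets are named collections of API types or methods, a facet profile is a count of references per collection. The Python sketch below illustrates this for a handful of org.w3c.dom methods. The assignment of these methods to facets is an assumption made for the example, not the tagging used in the paper, and only some of the facets are included.

```python
# Hypothetical facet tagging for a few org.w3c.dom methods.
FACETS = {
    "Observation": {
        "org.w3c.dom.Element.getAttribute",
        "org.w3c.dom.Node.getChildNodes",
    },
    "Addition": {
        "org.w3c.dom.Node.appendChild",
        "org.w3c.dom.Document.createElement",
    },
    "Removal": {"org.w3c.dom.Node.removeChild"},
    "Nontrivial XML": {"org.w3c.dom.Document.createCDATASection"},
}


def facet_profile(referenced_elements):
    """Count references per facet; elements without a facet are ignored."""
    profile = {name: 0 for name in FACETS}
    for element in referenced_elements:
        for name, members in FACETS.items():
            if element in members:
                profile[name] += 1
    return profile
```

A profile with zero counts for ‘Nontrivial XML’ would correspond to the kind of observation made in the JHotDraw story, namely that no exotic API elements are used.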

V. THE QUAATLAS CORPUS FOR API-USAGE ANALYSIS

Our study requires a suitable corpus of mature, well-developed projects coming from different application domains. Arguably, such projects show sufficient and advanced API usage. We decided to restrict ourselves to open-source Java projects; in order to increase quality and reproducibility of our research, we decided to use an existing, established and curated collection of Java projects: the QUALITAS corpus [27], release 20101126r. As we discuss in §IV, API usage entails the ability to resolve types. However, QUALITAS does not guarantee the availability of a project’s library types. The collection consists of source and binary forms as they are provided by the project developers.

In the interest of similar research tasks that require a dependency-resolved corpus, we detail our method for corpus (re-)engineering. The resulting dependency-resolved QUALITAS variant is available on the paper’s website.

A. Method

The pseudocode depicted in Fig. 4 describes our corpus (re)-engineering method. The input is a (possibly empty) corpus to be extended and a list of candidate projects, candidateList, to be added to it. The output is the corpus populated with refined projects.

Line 4 assumes that a project can be obtained both in its source and binary forms (e.g., downloading them from the project website). During an exploratory build (line 5), the nature of the project is manually investigated by an expert. The expert investigates how the project is built, what errors occur during the build (if any), and how to patch them. At this stage, we also compare the set of built JARs with the JARs in the binary distribution form of the project. If the former set is smaller than the latter (e.g., because default targets in build scripts may be insufficient and a series of target calls or invocation of several build scripts is needed), we attempt to push the build of the project for completeness. Once the exploratory build is successful, we are able to automatically build the project (line 6), if necessary after applying patches.

After the build, we collect the full path, creation and modification times of each file in the project (line 7). For Java files we extract qualified names of contained top-level types, for class files we detect their qualified names. For JARs we explore their contents and collect information about the contained class files.

On line 8, we apply a filter, keeping only the source code that we consider to be both system and core (see §V-C). On line 9, we use the known start time of the build together with information about Java types computed on lines 7 and 8 to classify the JARs found after the build either as library JARs or as built JARs. On line 10, we use the identified built JARs and the compiled class files to identify Java types that were compiled during the build. On line 11, we refine the project’s source code form p'_src to include only the compiled Java types together with the necessary library JARs. On line 12, the binary form p'_bin is refined to consist of the built JARs. The refined project p' (line 13) is validated (line 14) by rebuilding the project in a sandbox, outside its specific setup, making sure to use only those files that have been identified by the method.² A successful sandbox build indicates that source and library files have been discerned correctly. In that case, we add the refined project to the corpus (line 14).

This pseudocode is, of course, an idealized description of the process. In practice, we would execute line 4 only once per project; line 5 could be repeated several times, if the build coverage is found unsatisfactory in terms of compiled types, something that becomes clear only on line 9. We treat lines 6–10 as an atomic action, call it a “corpus build,” and perform it on regular basis.
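The text does not spell out how line 9 separates built JARs from library JARs. The Python sketch below shows one plausible heuristic under stated assumptions: a JAR counts as built if it was modified at or after the build start and contains at least one class whose name matches a filtered source type. The function name, the input layout, and the rule itself are hypothetical and are not taken from the paper.

```python
def detect_jars(build_start, source_types, jars):
    """Split JARs into (built, library) lists.

    build_start:  start time of the build (seconds since the epoch)
    source_types: set of qualified type names kept by the filter (line 8)
    jars:         dict mapping JAR path -> (modification time,
                  set of qualified names of the contained classes)

    Assumed rule: a JAR is 'built' if it was written during the build
    and packages at least one type from the project's own sources.
    """
    built, library = [], []
    for path, (mtime, classes) in sorted(jars.items()):
        if mtime >= build_start and classes & source_types:
            built.append(path)
        else:
            library.append(path)
    return built, library
```

The second condition matters because build scripts often copy third-party JARs into an output directory, which refreshes their timestamps without making them build products.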

B. Exploratory Builds

The QUALITAS corpus contains 106 projects. We were able to build 86 projects, of which 54 required a patch to build. We limit our exploratory build efforts to approximately 2-3 hours per project.

² In practice, we use an Eclipse workspace with automatically generated configuration files (i.e., .system and .classpath).

gathered by studying API usage in a corpus of projects

re-engineered Qualitas corpus to Eclipse projects that compile (79)

dependencies resolved and separated from project files

In the paper, we present a similar exploration-based approach for understanding API usage. This approach relies on a lot of meta-data about APIs that we have made available in an API atlas. For 98 APIs, this atlas describes the individual packages, types, and methods the API consists of. A fine-grained description is necessary as libraries such as Google Guava or even java.util group different APIs together. We also associated a domain with each API. This resulted in 27 API domains. Finally, we have started describing groups of elements within an API that address a particular concern. We gathered this meta-data by studying the APIs used in a corpus of 79 mature projects. We re-engineered the projects from the Qualitas corpus such that all their dependencies are resolved and separated from project files. This enables extracting precise API usage facts.

Page 6: Multi-dimensional exploration of API usage - ICPC13 - 21-05-13

linked to 101

Note that the entire API atlas is available on the paper’s website. There, we also present the meta-data in a human-readable format. One nice feature is that each API is linked to its description on the 101companies wiki, where you can also browse through small example programs that use the API.

Page 7: Multi-dimensional exploration of API usage - ICPC13 - 21-05-13

Exploring API Usage: Exapus Platform

gathers API usage facts for a given corpus

referenced element, referencing scope, syntactic pattern (e.g., super call)

computes exploration views on usage facts

selection of API references

by referenced elements: API name, element, meta-data, ...

by referencing elements: project name, element, syntactic pattern, ...

organized as project or API slice

project members + outgoing refs within their scope

API members + incoming refs within their scope

rendered as graph, table or cloud

scaled and ordered by usage metrics: #ref, #elem, #derive, #proj, #api, ...

The actual exploration-based approach to understanding API usage is supported by a tool that extracts references to API elements from a single project or a corpus of projects. During an AST visit, the tool records for each reference it discovers the referenced element, the project scope in which this reference resides, and the syntactic form of the reference. This could be a method return type, a super call, a type parameter, and so on. The tool presents exploration views on the extracted facts, which can be configured along several dimensions. First of all, you can configure what API references to include in a view using conditions on the referenced element and the referencing element. For instance, only the exceptions defined by an API from the XML domain that are caught in the JH project. Next, you can choose to organize these references as a slice of project members with outgoing refs or as a slice of API members with incoming refs. Finally, you can have these slices rendered as a graph, table, or cloud scaled by a usage metric. For instance, a tag cloud scaled by the amount of subclassing along the border between a project and an API.
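The three configuration dimensions (selection, organization, rendering) can be mimicked on a plain list of facts. The Python sketch below covers the first two; it is not the tool's code, and the `Fact` record, its fields, and the function names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Fact(NamedTuple):
    """One extracted API reference (hypothetical record layout)."""
    project: str   # referencing project
    scope: str     # referencing project element
    api: str       # referenced API
    domain: str    # domain of the referenced API
    element: str   # referenced API element
    pattern: str   # syntactic form of the reference, e.g. "catch"


def select(facts, referenced=lambda f: True, referencing=lambda f: True):
    """Keep the facts that satisfy both conditions."""
    return [f for f in facts if referenced(f) and referencing(f)]


def project_slice(facts):
    """Project-centric view: project elements with their outgoing refs."""
    outgoing = defaultdict(list)
    for f in facts:
        outgoing[f.scope].append(f.element)
    return dict(outgoing)


def api_slice(facts):
    """API-centric view: API elements with their incoming refs."""
    incoming = defaultdict(list)
    for f in facts:
        incoming[f.element].append(f.scope)
    return dict(incoming)
```

The example from the talk, exceptions from the XML domain that are caught in JHotDraw, would then be a call such as `select(facts, referenced=lambda f: f.domain == "XML", referencing=lambda f: f.project == "jhotdraw" and f.pattern == "catch")`, followed by either slice function.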

Page 8: Multi-dimensional exploration of API usage - ICPC13 - 21-05-13

What follows are some screenshots of the tool in action. At the far left, there is a list of predefined views. Their configuration can be edited in the top-right corner. Shown here is the configuration of a view that results in the tag cloud we saw earlier. At the top, you can select what referenced elements to include. Here, we include all of them using a wildcard pattern. At the bottom, you can select what referencing elements to include in a view. Here, we only include references from the JH project.

Note that even though the tool has a dynamic IDE-like feel, it is actually completely web-based. We hope this will encourage others to explore and augment our API meta-data.

Page 9: Multi-dimensional exploration of API usage - ICPC13 - 21-05-13

Here you see a project-centric table of outgoing references from JH to the Java collections API and DOM. We see for instance that the method add of StyleManager invokes method add of java.util.List. At the bottom-left, you see a tag cloud for the currently selected project element. We see that there are more references to data APIs than to XML APIs in the StyleManager class. The source code for this class is shown at the bottom-right. API references are highlighted within the source code.

Page 10: Multi-dimensional exploration of API usage - ICPC13 - 21-05-13

Finally, here you see an API-centric graph of references from JH to the APIs known to us. Nodes are APIs. Borders of the nodes are scaled by the relative amount of referenced elements. So this is basically another rendering of the tag cloud you saw earlier. You could also choose to scale the borders of the nodes using a different metric, such as the amount of derivation that happens.

Page 11: Multi-dimensional exploration of API usage - ICPC13 - 21-05-13

And of course, we also made this tool publicly available.

Page 12: Multi-dimensional exploration of API usage - ICPC13 - 21-05-13

Insight: API Dispersion

intent: understand and compare dispersion of an API across the corpus

stakeholder: API developer

view: project-centric table; usage metrics for quantitative comparison; API facets for qualitative comparison

intelligence

Fig. 5. JDOM’s API Dispersion in QUAATLAS (project-centric table).

B. The API Dispersion Insight

Intent – Understand an API’s dispersion in a corpus by comparing API usage across the projects in the corpus.

Stakeholder – API developer.

API Usage – One API.

View – The listing of projects with associated API-usage metrics for quantitative comparison and API facets for qualitative comparison.

Illustration – Fig. 5 summarizes JDOM’s dispersion quantitatively in QUAATLAS. 6 projects in the corpus exercise JDOM. The projects are ordered by the #ref metric with the other metrics not aligning. Only 2 projects (jspwiki and velocity) exercise type derivation at the boundary of API and project.

Intelligence – The insight is about the significance of API usage across corpus. In the figure, arguably, project jspwiki shows the most significant API usage because it references the most API elements. Project jmeter shows the least significant API usage. Observation of significance helps an API developer in picking hard and easy projects for compliance testing along API evolution: an easy one to get started; a hard one for a solid proof of concept. For instance, development of a wrapper-based API re-implementation for API migration relies on suitable ‘test projects’ just like that [6], [7].

C. The API Distribution Insight

Intent – Understand API distribution across project scopes.

Stakeholder – Project developer.

API Usage – One API.

View – The hierarchical breakdown of the project scopes with associated API-usage metrics for quantitative comparison and API facets for qualitative comparison.

Illustration – Remember JHotDraw’s slice of DOM usage in Fig. 2 in §II. This view was suitable for efficient exploration of project scopes that directly depend on DOM.

Intelligence – The insight may help a developer to decide on the feasibility of an API migration, as we discussed in §II.

D. The API Footprint Insight

Intent – Understand what API elements are used in a corpus or varying project scopes.

Stakeholder – Project developer and API developer.

API Usage – One API.

View – The listing of used API packages, types, and methods.

Fig. 6. JDOM’s API Footprint in QUAATLAS (API-centric table).

Fig. 7. ‘Non-trivial API’ usage for package org.jdom in QUAATLAS (table columns: Scope, Tags incl. facets, #proj). Annotation: nontrivial JDOM API usage in velocity, org.apache.velocity.anakia.AnakiaJDOMFactory.

Illustration – Remember the tree-based representation of the API footprint for JHotDraw as shown in Fig. 3 in §II. In a similar manner, while using a table-based representation, Fig. 6 summarizes JDOM usage across QUAATLAS. All JDOM packages are listed. The core package is heavily used and thus the listing is further refined to show details per API type. Ordering relies on the #ref metric. Clearly, there is little usage of API elements outside the core package.

Intelligence – Overall, the footprint describes the (smaller) ‘actual’ API that needs to be understood as opposed to the full (‘official’) API. For instance, many APIs enable nontrivial, framework-like usage [1], [28], but in the absence of actual framework-like usage, the project developer may entertain a much simpler view on the API. In the context of API evolution, an API developer consults an API’s footprint to minimize changes that break actual usage or to make an impact analysis for changes. In the context of wrapper-based API re-implementation for API migration, an API developer or a project developer (who develops a project-specific wrapper) uses the footprint to limit the effort [6], [7].

E. The Sub-API Footprint Insight

Intent – Understand usage of a sub-API in a corpus or project.

Stakeholder – API developer and, possibly, project developer.

API Usage – One API.

intelligence: choose compliance tests for API evolution

So, what insights about API usage can one hope to gain through such a tool? And how should you configure the tool such that it produces the right view for each insight? In the paper, we discuss this in a structured manner for several API usage insights.

The one shown here is concerned with how dispersed or widespread an API is across a corpus of projects. It can be gained by configuring the tool to produce a table of referencing project elements, together with some usage metrics. Here, we see JDOM’s dispersion in the corpus. The table is sorted by the number of references each project contains. We see that the informa project has the most references, but that jspwiki references the most distinct API elements. We also see that jspwiki is one of the few projects that contain subtypes of API elements. So who could benefit from this insight? This would be the developer of an API who needs to choose easy and difficult projects for compliance testing after an API evolution.
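The dispersion table can be approximated with a small aggregation. This is a sketch under assumptions: references are given as (project, API element) pairs, only two metrics are computed (number of references and number of distinct referenced elements), and the project names and data are invented rather than taken from the corpus.

```python
from collections import defaultdict


def dispersion(references):
    """Return rows (project, #ref, #distinct elements), sorted by #ref, largest first."""
    ref_count = defaultdict(int)
    elements = defaultdict(set)
    for project, api_element in references:
        ref_count[project] += 1
        elements[project].add(api_element)
    rows = [(project, ref_count[project], len(elements[project])) for project in ref_count]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Invented data: project_a references few elements often,
# project_b references more distinct elements, but less often.
SAMPLE = (
    [("project_a", "org.jdom.Element")] * 4
    + [("project_a", "org.jdom.Document")]
    + [("project_b", "org.jdom.Element"),
       ("project_b", "org.jdom.Document"),
       ("project_b", "org.jdom.Namespace")]
)

table = dispersion(SAMPLE)
```

As in the JDOM example, the two metrics need not agree: the project with the most references is not necessarily the one that touches the most distinct API elements.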

Page 13: Multi-dimensional exploration of API usage - ICPC13 - 21-05-13

Insight: API Footprint

intent: understand what API elements are actually used in a corpus or in specific project scopes

stakeholder: API or project developer

view: API-centric table or tree, ordered or scaled by #ref


intelligence: API migration by project developer (target effort); API evolution by API developer (minimize breaking changes)

The API footprint insight is dual to the API dispersion insight in the sense that it is gained through a slice of referenced API elements rather than through a table of referencing project elements. API developers might want to gain this insight for an entire corpus of projects to minimize the impact of breaking API changes. A project developer might want to gain this insight for a single project to decide whether a wrapper-based migration, where a wrapper of the new API has to be produced for each referenced element, is feasible.
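The duality can be illustrated by grouping the same kind of reference data by referenced API element instead of by project. This is a sketch only: the function names `footprint` and `wrapper_effort`, the element names `ExampleParser` and `ExampleWriter`, and the idea of estimating wrapper effort as the number of distinct referenced elements are assumptions for illustration.

```python
from collections import Counter


def footprint(references):
    """Count references per API element, most referenced first."""
    return Counter(target for _source, target in references).most_common()


def wrapper_effort(references):
    """Rough effort estimate: one wrapper per distinct referenced API element."""
    return len({target for _source, target in references})


# Invented (referencing element, referenced API element) pairs for one project.
PROJECT_REFERENCES = [
    ("ExampleParser.parse", "org.jdom.Element"),
    ("ExampleParser.parse", "org.jdom.Document"),
    ("ExampleWriter.write", "org.jdom.Element"),
    ("ExampleWriter.close", "org.jdom.Element"),
]

used_elements = footprint(PROJECT_REFERENCES)
effort = wrapper_effort(PROJECT_REFERENCES)
```

An API developer would run the same aggregation over a whole corpus to see which elements are risky to change.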

Page 14: Multi-dimensional exploration of API usage - ICPC13 - 21-05-13

Insight: API Coupling

intent: understand what APIs or API domains are used in smaller project scopes

stakeholder: project developer

view: API-centric cloud, usage metrics applied

intelligence: reveals potential code smell (too many APIs in small scope)

on org.jhotdraw.app.AbstractView:

API domains: Basics, Distribution, GUI, IO, Component

APIs: java.lang (string manipulation), java.net (view saving), Swing (view painting), JavaBeans (change notification), java.io (exceptions during saving)

Coupling in JHotDraw for the interface org.jhotdraw.app.View

helps understand design and motivation for API dependencies

Shown here is an insight that is targeted more towards project developers who would like to understand what APIs are used together in a small project scope. This insight can be gained by configuring the tool to produce an API tag cloud for the currently selected project scope. The one on the slide is for the AbstractView class of JHotDraw, which seems to be referencing quite a lot of different APIs. For small project scopes, such as a method, this could be the sign of a code smell. For larger scopes, API tag clouds can also help understand the motivation behind API dependencies. Here, for instance, java.lang is referenced for string manipulation, java.net for saving a view to a URI, Swing for painting views, JavaBeans for change notifications, and java.io for handling exceptions during the saving of a view.
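A rough sketch of how such a coupling check might be computed follows. The API set for AbstractView is the one shown on the slide. The second scope (`example_method`), the function names, and the threshold of three APIs are hypothetical; neither the slides nor the speaker notes fix a threshold.

```python
from collections import defaultdict


def coupling(references):
    """Collect the set of APIs referenced from each project scope."""
    apis_by_scope = defaultdict(set)
    for scope, api in references:
        apis_by_scope[scope].add(api)
    return dict(apis_by_scope)


def smell_candidates(apis_by_scope, max_apis):
    """Scopes that reference more than max_apis distinct APIs, in sorted order."""
    return sorted(
        scope for scope, apis in apis_by_scope.items() if len(apis) > max_apis
    )


SCOPE_REFERENCES = [
    ("org.jhotdraw.app.AbstractView", api)
    for api in ("java.lang", "java.net", "Swing", "JavaBeans", "java.io")
] + [("example_method", "java.lang")]  # hypothetical small scope

apis_by_scope = coupling(SCOPE_REFERENCES)
candidates = smell_candidates(apis_by_scope, max_apis=3)  # threshold is an assumption
```

A flagged scope is only a candidate: for a class-sized scope such as AbstractView, the tag cloud may simply document legitimate dependencies.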

Page 15: Multi-dimensional exploration of API usage - ICPC13 - 21-05-13

Insight: API Profile

intent: understand what API facets are used in varying project scopes

stakeholder: project developer

view: API-centric cloud of API facets, usage metrics applied

intelligence: project scope: reveals API asbestos; smaller scope: API usage scenarios

JDOM’s API Profile for informa

Package de.nava.informa.parsers: Observation, Input, Exception

Project informa: Observation, Input, Nontrivial XML, Manipulation, Exception, Renaming, Addition, Namespaces, Nontrivial API, Output

e.g., JDOM’s profile in informa

The API profile insight is similar, but is gained through a cloud of the facets of a single API used within a project scope rather than a cloud of complete APIs. At the top, we see the JDOM facets used within the entire informa project. Here, seldom-used non-trivial parts of an API reveal that the project might be difficult to change. At the bottom, we see the JDOM facets used within a smaller scope of the project. Here, the displayed facets correspond to API usage scenarios: the parsers package reads XML files and observes XML nodes.

Page 16: Multi-dimensional exploration of API usage - ICPC13 - 21-05-13

Conclusion

described several insights to be gained about API usage: cocktail, dispersion, distribution, footprint, coupling, profile

provided Quaatlas API atlas: re-engineered Qualitas projects for precise extraction of API usage; added meta-data concerning APIs, API domains, API facets

presented multi-dimensional exploration model

supported by IDE-like web-based platform Exapus: configurable views on API usage

future work: empirical research on understanding API usage through exploration; support flow analyses in views

http://softlang.uni-koblenz.de/explore-API-usage