
Towards improved Adoption: Effectiveness of Research Tools in the Real World

Richa Awasthy
Australian National University
Canberra, Australia
Email: [email protected]

Shayne Flint
Research School of Computer Science
Australian National University
Canberra, Australia
[email protected]

Ramesh Sankaranarayana
Research School of Computer Science
Australian National University
Canberra, Australia
[email protected]

Abstract—One of the challenges in the area of software engineering research has been the low rate of adoption by industry of the tools and methods produced by university researchers. We present a model to improve the situation by providing tangible evidence that demonstrates the real-world effectiveness of such tools and methods. A survey of practising software engineers indicates that the approach in the model is valid and applicable. We apply and test the model for providing such evidence and demonstrate its effectiveness in the context of static analysis using FindBugs. This model can be used to analyse the effectiveness of academic research contributions to industry and contribute towards improving their adoption.

I. INTRODUCTION

The success of software engineering research in universities can be measured in terms of the industrial adoption of methods and tools developed by researchers. Current adoption rates are low [1] and this contributes to a widening gap between software engineering research and practice. Consider, for example, code inspections, which, according to Fagan's law, are effective in reducing errors in software, increasing productivity and achieving project stability [2]. Static analysis tools developed by university researchers help automate the code inspection process. However, such tools have not achieved widespread adoption in industry. One reason for this limited adoption is that researchers often fail to provide real-world evidence that the methods and tools they develop are of potential value to industry practitioners [1], [3].

One approach to providing such evidence is to conduct experiments that demonstrate the effectiveness of research tools in a real-world context. We apply this approach to analyse the effectiveness of a static analysis tool. In doing so, we demonstrate that such experimentation can contribute to closing the gap between research and practice.

The structure of this paper is as follows: Section II provides the background of our work leading to the proposed model; Section III presents a survey which shows that real-world evidence can positively influence the decision of software engineers to use research tools; Section IV explains our experimental method, which uses FindBugs [4] to analyse real-world bugs in Eclipse [5] code; Section V discusses how simple experiments like ours can encourage more developers to use tools developed by researchers and thus contribute to closing the gap between research and practice; Section VI provides an overview of related research; Section VII presents conclusions and discusses future research.

II. BACKGROUND

Since the 1970s there have been ongoing efforts to increase the adoption of research outcomes outside of universities [6]. As a result of the United States Bayh-Dole Act in 1980 [7], universities began to establish Technology Transfer Offices (TTOs) to facilitate the transfer of knowledge from universities to industry [8]. However, the effectiveness of TTOs has been questioned in recent years [9], [10] and there is a need to look beyond TTOs to improve the adoption of academic research in industry.

Researchers in universities are working towards addressing significant problems. The outcome of their work can be a tool or method which may or may not achieve widespread industry adoption. A key factor limiting the readiness of these research outcomes for adoption in industry is a lack of tangible evidence that they would be effective in practice [1]. This suggests that demonstrating the effectiveness of a tool in practice can contribute to improved adoption.

Figure 1 depicts our model for demonstrating the real-world effectiveness of research tools and methods. The model involves four steps with intermediate activities. The first step is to identify a problem to address. Step 2 is to develop a tool or a method to address the problem. The intermediate iterative activity between these two steps is the process of solution formulation, which involves adding new ideas to the available state of the art. These steps are followed by iterative testing for validation in Step 3, which confirms the readiness of the research outcome for adoption. An idea should be validated in a practical setting [11] to improve its adoption. Our model respects this viewpoint and emphasises the importance of tangible evidence from a practical setting in Step 4. To increase the relevance of the evidence for industry, researchers should test their research outcomes in a scenario that involves real-world users, who are an important stakeholder for industry. Demonstrating the effectiveness of research outcomes in a real-world scenario will lead to a change in industry perception and improved adoption of the research outcomes.


We test the applicability of this model in the static analysis context by identifying a static analysis tool created by university researchers and analysing its effectiveness in a real-world scenario.

Fig. 1. Proposed model for improving adoption of university research by industry. (Within the context of University Research: (1) Problem; (2) Develop a tool/method, reached through Solution Formulation; (3) Test in the lab, once Ready for testing, with Modify loops; (4) Test in the real environment, once Ready for adoption; demonstrated Effectiveness leading to Improved Industrial Adoption of the tool.)

III. THE IMPACT OF EVIDENCE FROM A REAL-WORLD SCENARIO

In order to understand the impact of evidence from a user scenario on real-world decisions to use a research tool, we conducted an on-line survey of software developers.

The survey uses static analysis as an example and was prepared and delivered using our university's online polling system [12]. Participants were invited by email which included a link to the on-line survey and a participant information statement. On completion of the survey, we manually analysed the results.

Fig. 2. Flowchart for the survey questionnaire design (branching to the questions described in Sections III-B1 to III-B3)

A. Participants

We invited 20 software industry practitioners and around 10 computer science researchers with industry experience in software development.

B. Survey Questions

The survey data consisted of responses to the sequence of questions depicted in Figure 2 and described below.

1) Software engineering experience: We gathered information about the level of software development expertise of each participant so that we could understand any relationship between experience and the use of static analysis tools.

2) Static analysis knowledge: We asked the following questions to determine each participant's level of understanding and use of static analysis tools.

a) ‘Static analysis is the analysis of computer software to find potential bugs without actually executing the software. Have you heard of static analysis before?’

b) ‘Have you used any automated static analysis tools during software development (e.g. FindBugs, Coverity)?’

Answers to these questions were used to determine the final question we asked, as indicated in Figure 2.

3) The impact of tangible evidence: At the end of the survey, each participant who had not used static analysis tools was asked a question to determine the impact that tangible evidence (that the tool can identify real bugs early in the software development life-cycle) might have on their approach to static analysis. The exact question asked depended on their answers to the questions described in Section III-B2. Specifically:

1) Participants who had no knowledge of static analysis were asked ‘Would our research results interest you in gaining knowledge of static analysis and adoption of automated static analysis tools?’.

2) Participants who knew about static analysis but had not used any static analysis tools (our primary group of interest) were asked to rate the impact that the following factors would have on their decision to adopt static analysis tools. A Likert scale was used with five options (No influence, Maybe, Likely, Highly Likely, and Definitely).

a) Effectiveness of the tool in finding bugs.
b) Ease of use.
c) Integration of the tool into the development environment.
d) License type.
e) The availability of tangible evidence that the tool can identify real bugs early in the software development life-cycle, before they are reported by users.

C. Analysis of Survey Results

The response rate for our survey was high, with 27 responses out of 30 invitations. Responses to the survey indicate that tangible evidence of the real-world effectiveness of a tool has a positive impact on decisions to adopt static analysis tools.

Analysis of the survey results provides the following specific findings:


1) Software Engineering Experience - As expected, participants had varied levels of experience in software development. However, we did not find any direct relation between experience and the usage of tools.

2) Static Analysis Knowledge - Survey results show that 9 participants (33%) had no prior knowledge of static analysis. It is noteworthy that while the remaining 18 participants knew about static analysis, only 4 of them had used static analysis tools.

3) Impact of the tangible evidence - Our survey results show that tangible evidence has a positive impact on decisions to adopt static analysis tools. Of the 9 respondents who had no prior knowledge of static analysis, 8 said that tangible evidence would interest them in gaining knowledge of static analysis and adopting automated static analysis tools. This is valuable information, indicating that providing evidence of effectiveness could contribute to improved adoption of research tools in industry. Of the 14 participants who had knowledge of static analysis but who had not used any tools, 7 participants (50%) indicated that tangible evidence would be Highly Likely to or would Definitely influence their decision to adopt static analysis tools (Figure 3). Another four participants indicated that such evidence would be Likely to influence their decision. The remaining three participants answered Maybe; if they were to respond positively, the proportion of participants agreeing that tangible evidence influences the decision would be even higher.
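The counts above (and the bars in Figure 3) come from a straightforward tally of the Likert responses. As an illustration, a minimal Python sketch of that tally follows; the response list is hypothetical placeholder data for the 14 participants of interest (the split between Highly Likely and Definitely is assumed, since only their combined count is reported).

from collections import Counter

# Sketch: tally Likert responses into per-option counts and percentages.
# `responses` is hypothetical placeholder data for the 14 participants who
# knew about static analysis but had not used any tools.
SCALE = ["No influence", "Maybe", "Likely", "Highly Likely", "Definitely"]
responses = ["Definitely", "Highly Likely", "Maybe", "Likely", "Highly Likely",
             "Likely", "Maybe", "Definitely", "Likely", "Highly Likely",
             "Maybe", "Likely", "Definitely", "Highly Likely"]
counts = Counter(responses)
total = len(responses)
for option in SCALE:
    print(f"{option:>13}: {counts[option]:2d} ({100 * counts[option] / total:.0f}%)")
positive = counts["Highly Likely"] + counts["Definitely"]
print(f"Highly Likely or Definitely: {positive}/{total} ({100 * positive / total:.0f}%)")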

Fig. 3. The impact of tangible evidence on decisions to adopt static analysis tools (number of participants selecting each response: None, Maybe, Likely, Highly Likely, Definitely)

As shown in Figure 4, our results also indicate that other factors such as Ease of use, IDE integration and License type have a positive impact on decisions to adopt static analysis tools. It is interesting to note that under the Maybe and Definitely categories, the top two influencing factors are License and Tangible evidence.

Our results clearly show that tangible evidence is an important factor in influencing decisions to adopt research tools in industry.

Fig. 4. Other factors influencing decisions to adopt static analysis tools (number of participants selecting each response, None to Definitely, for each factor: Effectiveness of tool, Ease of use, Integration to IDE, License, Tangible evidence)

IV. APPLICABILITY OF THE MODEL

In order to test the applicability of our proposed model, we first had to identify an appropriate tool developed by researchers and a scenario in which to test its effectiveness. FindBugs version 3.0 was chosen because, according to the tool's website, there are few organizations using FindBugs [4], which indicates low adoption of the tool in the software industry. It is also an open-source static analysis tool carrying a university's trademark. We conducted an experiment with the FindBugs static analysis tool to analyse its effectiveness in the real world. Specifically, we wanted to determine whether FindBugs is capable of finding bugs reported by real users of Eclipse. To do so, we established a connection between warnings generated by the FindBugs static analysis tool and field bugs reported by Eclipse users on Bugzilla [13], using the following steps:

1) Use FindBugs to identify potential bugs in Eclipse class files.

2) Search the Eclipse bug-tracking system Bugzilla to identify bug reports that include stack traces.

3) Match the code in the Java classes associated with the FindBugs warnings identified in step 1 with the code referenced by the stack traces associated with the bugs identified in step 2.

A. FindBugs

FindBugs analyses Java class files to report warnings and potential errors in the associated source code. The tool performs its analysis on Java class files without needing access to the source code. It is stable, easy to use, and, as mentioned in [14], has higher precision than other tools such as CodePro Analytix, PMD and UCDetector. It has a low rate of false positives [15], and it classifies more than 400 types of bugs, along with categorization based on severity level. The analysis is based on bug patterns which are classified into nine categories: Bad Practice, Correctness, Experimental, Internationalization, Malicious Code Vulnerability, Multithreaded Correctness, Performance, Security, and Dodgy Code. The warnings reported by FindBugs can be further categorised within the tool as Scariest (Ranks 1-4), Scary (Ranks 1-9) and Troubling (Ranks 1-14). Each category includes all the bugs within the stated rank range; for example, the 'Scary' category also lists the bugs included under the 'Scariest' category.

B. Eclipse

We identified Eclipse as the object of analysis as it is a large, widely used open-source project with good records of user-reported bugs over many years and versions of the software. We focused on Eclipse version 4.4 (Luna) because it was the current version at the time of our experimentation.

C. Identification of potential bugs

Java JARs associated with Eclipse version 4.4 were analysed using FindBugs version 3.0. FindBugs generated a list of warnings pertaining to code considered faulty.
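To make this step concrete, the sketch below runs FindBugs over a set of JARs and pulls the Scariest warnings (rank 1-4) out of its XML report. It is a sketch rather than the exact script we used: the JAR path is hypothetical, and the command-line flags (-textui, -xml, -output) and XML attribute names (rank, category, Class, SourceLine) are assumptions based on the FindBugs 3.0 documentation.

import subprocess
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical input: one of the Eclipse 4.4 plug-in JARs.
ECLIPSE_JARS = ["eclipse/plugins/org.eclipse.ui.workbench.jar"]
REPORT = "findbugs.xml"

# Run the FindBugs text UI and write an XML report (flags as documented for FindBugs 3.0).
subprocess.run(["findbugs", "-textui", "-xml:withMessages", "-output", REPORT] + ECLIPSE_JARS,
               check=True)

scariest = []                 # (class name, bug type, category, line)
category_counts = Counter()
for bug in ET.parse(REPORT).getroot().iter("BugInstance"):
    category_counts[bug.get("category")] += 1
    if int(bug.get("rank", "20")) <= 4:        # Scariest = ranks 1-4
        cls = bug.find("Class")                # primary class reported for the warning
        line = bug.find("SourceLine")
        scariest.append((cls.get("classname"), bug.get("type"), bug.get("category"),
                         line.get("start") if line is not None else None))

print("Warnings per category:", dict(category_counts))
print("Scariest warnings:", len(scariest))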

D. Search for user reported bugs

The Eclipse project uses Bugzilla to track bugs reported by users. In order to identify bugs that could be associated with FindBugs warnings, we needed to identify bug reports that included a documented stack-trace. This was achieved by performing an advanced Bugzilla search for bugs that satisfied the following criteria:

• Version: 4.4
• Classification: Eclipse
• Status: NEW, ASSIGNED, VERIFIED¹

We then inspected the query results to identify those bug reports that included a documented stack-trace.
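The search itself was performed through Bugzilla's advanced search interface; the snippet below expresses roughly the same query against the Bugzilla REST API, purely to make the criteria concrete. The endpoint URL and field names (classification, version, bug_status) are assumptions based on the standard Bugzilla 5.x REST interface, and the stack-trace inspection itself remains a manual step.

import requests

BUGZILLA = "https://bugs.eclipse.org/bugs/rest/bug"   # assumed REST endpoint
params = {
    "classification": "Eclipse",
    "version": "4.4",
    "bug_status": ["NEW", "ASSIGNED", "VERIFIED"],
    "include_fields": "id,summary,status",
    "limit": 0,   # 0 commonly means "no limit", subject to server-side caps
}
bugs = requests.get(BUGZILLA, params=params, timeout=60).json().get("bugs", [])
print(f"{len(bugs)} candidate bug reports")
# Stack traces live in the bug comments, so each candidate still has to be
# inspected for a documented stack trace before matching.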

E. Match FindBugs warnings with user reported Bugs

Our last step was to match warnings generated by FindBugs (Section IV-C) with user-reported bugs (Section IV-D). This was achieved using the following steps:

1) For each of the bugs identified using the procedure described in Section IV-D, we identified the Java class that was the likely source of the reported bug. This class was usually the one appearing at the top of the stack-trace. In some cases, we had to traverse through lower levels in the stack-trace to find matching classes.

2) We then searched for the above classes in the warnings generated by FindBugs (Section IV-C) and analysed the code associated with the warning. We did this by using the FindBugs class name filter feature to show warnings related to the class of interest.

3) Finding a matching line of code in the FindBugs warnings establishes a connection between the warnings generated by FindBugs and the bugs reported by users (a sketch of this matching logic follows the list).
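A minimal sketch of this matching logic is shown below, assuming the standard JVM stack-trace format ("at package.Class.method(File.java:NN)") and a warned_classes mapping built from the warnings extracted earlier. In practice we performed the comparison by hand using the FindBugs class-name filter; the code only illustrates the idea.

import re

# One regex group per piece of a standard JVM stack frame:
# fully qualified class, source file, line number.
FRAME = re.compile(r"at\s+([\w.$]+)\.[\w$<>]+\(([\w$]+\.java):(\d+)\)")

def frames(stack_trace):
    """Yield (fully qualified class, line number) for each frame, topmost first."""
    for match in FRAME.finditer(stack_trace):
        yield match.group(1), int(match.group(3))

def match_warning(stack_trace, warned_classes):
    """Return the first frame whose class also carries a FindBugs warning.

    warned_classes maps a fully qualified class name to its warnings,
    e.g. built from the `scariest` list of the earlier sketch."""
    for cls, line in frames(stack_trace):
        if cls in warned_classes:
            return cls, line, warned_classes[cls]
    return None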

¹ Because our experiment looks at the ability of FindBugs to find bugs that have not been fixed, we ignore bugs with CLOSED status. In addition, we do not consider the possibility of bugs that have been closed incorrectly.

F. Results of the Experiment

1) Analysis of FindBugs warnings for Eclipse version 4.4: Static analysis of Eclipse version 4.4 generated warnings in the categories of Correctness (652), Bad Practice (547), Experimental (1), Multithreaded correctness (390), Security (3), and Dodgy code (55) under the rank range of Troubling (Rank 1-14), as depicted in Figure 5. We focused on the Scariest warnings (Rank 1-4), considering them to be real problems requiring attention. There were 82 Scariest warnings in total, comprising 81 warnings in the Correctness category and one in the Multithreaded correctness category.

Additional investigation found that warnings in the Correctness category included a range of coding errors such as comparisons of incompatible types, null pointer dereferences and reading of uninitialized variables.

Fig. 5. Number of warnings in each category in Eclipse 4.4 (Correctness, Bad Practice, Experimental, Multithreaded correctness, Security, Dodgy code)

2) Connection between FindBugs warnings and user-reported bugs: Execution of the query described in Section IV-D resulted in a dataset of 2575 bugs, which included 347 enhancements. We excluded the enhancements from our analysis. Out of the remaining bugs, we have analysed 1185 bug reports so far, 90 of which included a documented stack-trace.

We used the method described in Section IV-E to compare the stack-trace in these 90 bug reports with the warnings generated by FindBugs (Section IV-F1). We found that six of the user-reported bugs could be associated with FindBugs warnings, as presented in Table I. The data presented in the table includes the Bugzilla Bug ID, a description of the bug, and a description of the warning generated by FindBugs.

V. DISCUSSION

The model proposed in this paper considers that, in order to improve the adoption of university research outcomes, researchers need to demonstrate their effectiveness in a real-world scenario and think about the value of their research outcomes beyond the lab boundaries. Our main purpose in conducting the experiment described was to test the applicability of our proposed model by analysing the value of a research tool in industrial practice. Specifically, we evaluated the performance of the FindBugs static analysis tool by analysing its capability in generating warnings relating to real-world bugs reported by users of the Eclipse IDE.


TABLE I
USER-FILED BUGS IN ECLIPSE WITH ASSOCIATED WARNING IN FINDBUGS

Bug 436138: NPE when select SWT.MOZILLA style in control example.
FindBugs warning: Method call passes null for nonnull parameter. This method call passes a null value for a nonnull method parameter. Either the parameter is annotated as a parameter that should always be nonnull, or analysis has shown that it will always be dereferenced. Bug kind and pattern: NP - NP_NULL_PARAM_DEREF.

Bug 414508: NPE trying to launch Eclipse App.
FindBugs warning: Load of known null value. The variable referenced at this point is known to be null due to an earlier check against null. Although this is valid, it might be a mistake (perhaps you intended to refer to a different variable, or perhaps the earlier check to see if the variable is null should have been a check to see if it was nonnull). Bug kind and pattern: NP - NP_LOAD_OF_KNOWN_NULL_VALUE.

Bug 433526: browser.getText() is throwing exception after Internet Explorer 11 install.
FindBugs warning: Possible null pointer dereference. There is a branch of statement that, if executed, guarantees that a null value will be dereferenced, which would generate a NullPointerException when the code is executed. Of course, the problem might be that the branch or statement is infeasible and that the null pointer exception can't ever be executed; deciding that is beyond the ability of FindBugs. Bug kind and pattern: NP - NP_NULL_ON_SOME_PATH.

Bug 459025: Can't right-click items in manifest editor's extensions tab on OSX.
FindBugs warning: Non-virtual method call passes null for nonnull parameter. A possibly-null value is passed to a nonnull method parameter. Either the parameter is annotated as a parameter that should always be nonnull, or analysis has shown that it will always be dereferenced. Bug kind and pattern: NP - NP_NULL_PARAM_DEREF_NONVIRTUAL.

Bug 427421: NumberFormatException in periodic Workspace Save Job.
FindBugs warning: Boxing/unboxing to parse a primitive. A boxed primitive is created from a String, just to extract the unboxed primitive value. It is more efficient to just call the static parseXXX method. Bug kind and pattern: Bx - DM_BOXED_PRIMITIVE_FOR_PARSING.

Bug 428890: Search view only shows default page (NPE in PageBookView.showPageRec).
FindBugs warning: Possible null pointer dereference. There is a branch of statement that, if executed, guarantees that a null value will be dereferenced, which would generate a NullPointerException when the code is executed. Of course, the problem might be that the branch or statement is infeasible and that the null pointer exception can't ever be executed; deciding that is beyond the ability of FindBugs. Bug kind and pattern: NP - NP_NULL_ON_SOME_PATH.

Bug 426485: [EditorMgmt][Split editor] Each split causes editors to be leaked.
FindBugs warning: Possible null pointer dereference. There is a branch of statement that, if executed, guarantees that a null value will be dereferenced, which would generate a NullPointerException when the code is executed. Of course, the problem might be that the branch or statement is infeasible and that the null pointer exception can't ever be executed; deciding that is beyond the ability of FindBugs. Bug kind and pattern: NP - NP_NULL_ON_SOME_PATH.


Our results indicate that FindBugs is capable of identifying bugs that later manifest as bugs reported by users. Since, as indicated in the survey results, real-world evidence would influence the decision to adopt a tool, FindBugs needs to increase the proportion of such warnings to make the evidence convincing. Currently, FindBugs has no mechanism for singling out, among the many warnings that may be false positives, the bugs that will actually manifest. Our research suggests a direction for improving FindBugs: identifying the important bugs in the warning base by analysing historical data and user requirements. FindBugs results could be improved by introducing a new 'user-impact' category to classify warnings that are likely to have an impact in the client environment and hence need immediate attention. For this, sufficient information from industrial practice needs to be applied when testing the research tool.
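To make the proposed classification concrete, the sketch below tags a warning as 'user-impact' when its class was traced to at least one user-filed bug by the matching procedure of Section IV-E. The category name, data shapes and helper names are hypothetical; this illustrates the idea and is not an existing FindBugs feature.

from collections import defaultdict

def tag_user_impact(warnings, matched_classes):
    """warnings: (classname, bug_type, category, line) tuples from the FindBugs report.
    matched_classes: set of class names traced to user-filed Bugzilla reports."""
    tagged = defaultdict(list)
    for classname, bug_type, category, line in warnings:
        # Promote a warning to the hypothetical USER_IMPACT category when a
        # user-filed bug was traced back to the same class.
        label = "USER_IMPACT" if classname in matched_classes else category
        tagged[label].append((classname, bug_type, line))
    return tagged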

The model appears simple, but it is challenging for researchers to identify a scenario and to approach users and/or data in order to demonstrate the effectiveness of their tools or methods. User-filed data may not always be available. Moreover, once researchers have the data to demonstrate effectiveness, they need a mechanism to propagate it to industry. The concern about the relevance of research also suggests that researchers need to think about the relevance of the problem they are trying to address. These factors indicate that universities and industry need to start collaborating at an early stage and consider co-developing whenever possible and feasible. This can provide a good start towards improving the relevance and adoption of research outcomes in general, and bridging the gap between industry and academia.

A. Limitations and Threats to Validity

Our experiments were limited by the small number of Eclipse bugs reported with a documented stack-trace. It is important to note that the ability to analyse only 90 of the 1185 bugs considered reflects a limitation of our approach and does not reflect a limitation of FindBugs. There are some limitations to the validity of our experiments:

1) FindBugs does not always point to the exact line number referred to in the stack trace. It is possible that the source of the error is different from the warnings provided by FindBugs.

2) While it is likely that the FindBugs warnings listed in Table I are the actual cause of the listed real-world bugs reported by Eclipse users, we cannot be certain of this.

3) As there is a lack of literature detailing the use of static analysis tools like FindBugs by the Eclipse development team, a comparison study was not feasible.


4) This test highlights the relevance of static analysis tools, though their effectiveness in real-world projects, particularly large-scale projects, cannot be confirmed as the sample size of our survey is small. However, considering the sample sizes of 20 and 18 used in related studies [16], [17], we decided to proceed with our sample size of 27 participants.

5) The conclusions might not generalise, as the results are specific to the static analysis context. The proposed model needs to be validated with tools used in other phases of software development.

VI. RELATED WORK

Adoption of software engineering research outcomes in industry practice has been a concern [1], [18]. There have been ongoing efforts to improve the adoption of research outcomes since the 1970s. However, these efforts have mainly focused on approaches to increase university-industry collaboration for improving adoption through technology transfer. They include policy changes leading to the establishment of TTOs in universities and the proposal of models for effective technology transfer [8], [19]. However, TTOs generally focus on building collaborative relationships between researchers and industry [20] rather than on the readiness of research outcomes for adoption in industry. In this paper we demonstrated how some simple experiments can analyse the effectiveness of software engineering research tools in practice. Specifically, we analysed the effectiveness of a static analysis tool.

Various experiments have been conducted to demonstrate the effectiveness of the FindBugs static analysis tool by showing that it is able to detect the specific problems it has been designed to detect [15], [21], [22]. Our experiment, however, was conducted in an unconstrained environment involving a real-world scenario that has impacted clients. Ruthruff et al. [23] involved developers in determining which reports were useful and which were not. This information was used to filter FindBugs warnings so that only those that developers found useful were reported. We instead retrace user-filed bugs to the warnings generated by the static analysis tool. This can pave the way for an intelligent mechanism to prioritise bugs based on user impact.

Al Bessey et al. [24] identify several factors that impacted the industry adoption of their static analysis tool Coverity [25]. These include trust between the researchers and industry users, the number of false positives, and the capability of the tool to provide warnings relating to problems which have had a significant impact on its users. Our work also confirms that a tool's capability is important. It also identifies licensing, IDE integration and ease of use as significant factors.

Johnson et al. [16] investigated the reasons behind the low adoption rate of static analysis tools despite their proven benefits in finding bugs. Their investigations confirmed that a large number of false positives is a major factor in low adoption rates. We note that their findings were based on a survey of developers who had all used static analysis tools. This meant that the authors were not able to comment on whether low adoption rates were due to a lack of awareness of static analysis tools among developers.

None of the work described above analyses the effectiveness of FindBugs in identifying problems that manifest themselves as real-world bugs reported by users. The experiments described in this paper analyse the connection between warnings generated by the FindBugs static analysis tool and field defects reported by Eclipse users on Bugzilla, bringing the client into the picture. The experiments test the applicability of our proposed model in the static analysis context.

VII. CONCLUSION AND FUTURE WORK

We have proposed a model that contributes towards improving the adoption of research tools by industry by demonstrating the effectiveness of a tool in a real-world scenario. We have presented a mechanism that uses a research tool as a medium for building tangible evidence. A survey of software developers supports our hypothesis that such tangible evidence of a tool's effectiveness can have a positive influence on real-world decisions to adopt static analysis tools.

A further experiment was conducted to test the applicability of the model in the static analysis context. In this experiment, by establishing a connection between user-reported bugs and warnings generated by the FindBugs static analysis tool, we demonstrated the ability of static analysis tools to eliminate some defects before software is deployed. However, the evidence needs to be stronger, in terms of the number of such connections, in order to be more convincing and to improve the industrial adoption of the tool.

Future research will present a more detailed analysis of the complete list of bugs found in Section IV-F2, which will provide precise data about the effectiveness of the tool according to our approach. Our approach also presents a scenario in which industry and university researchers can work together to create more useful tools. We plan to discuss these results with the FindBugs development team to explore the possibility of strengthening the evidence and devising a new 'user-impact' classification to indicate the warnings that would manifest in the client environment.

Finally, we would like to adapt this approach to explore the effectiveness of research tools involved in other phases of the software development life-cycle.

REFERENCES

[1] D. Rombach and F. Seelisch, “Balancing agility and formalism in software engineering,” B. Meyer, J. R. Nawrocki, and B. Walter, Eds. Berlin, Heidelberg: Springer-Verlag, 2008, ch. Formalisms in Software Engineering: Myths Versus Empirical Facts, pp. 13–25.

[2] A. Endres and H. D. Rombach, A handbook of software and systems engineering: empirical observations, laws and theories. Pearson Education, 2003.

[3] M. Ivarsson and T. Gorschek, “A method for evaluating rigor and industrial relevance of technology evaluations,” Empirical Software Engineering, vol. 16, no. 3, pp. 365–395, 2011.

[4] University of Maryland, “FindBugs,” viewed May 2015, http://findbugs.sourceforge.net, 2015.

[5] The Eclipse Foundation, “Eclipse,” viewed May 2015, http://www.eclipse.org, 2015.


[6] R. Grimaldi, M. Kenney, D. S. Siegel, and M. Wright, “30 years after Bayh–Dole: Reassessing academic entrepreneurship,” Research Policy, vol. 40, no. 8, pp. 1045–1057, 2011.

[7] W. H. Schacht, “Patent ownership and federal research and development (R&D): A discussion on the Bayh-Dole act and the Stevenson-Wydler act.” Congressional Research Service, Library of Congress, 2000.

[8] D. S. Siegel, D. A. Waldman, L. E. Atwater, and A. N. Link, “Commercial knowledge transfers from universities to firms: improving the effectiveness of university–industry collaboration,” The Journal of High Technology Management Research, vol. 14, no. 1, pp. 111–133, 2003.

[9] J. G. Thursby, R. Jensen, and M. C. Thursby, “Objectives, characteristics and outcomes of university licensing: A survey of major US universities,” The Journal of Technology Transfer, vol. 26, no. 1-2, pp. 59–72, 2001.

[10] D. S. Siegel, D. A. Waldman, L. E. Atwater, and A. N. Link, “Toward a model of the effective transfer of scientific knowledge from academicians to practitioners: qualitative evidence from the commercialization of university technologies,” Journal of Engineering and Technology Management, vol. 21, no. 1, pp. 115–142, 2004.

[11] R. L. Glass, “The relationship between theory and practice in software engineering,” Communications of the ACM, vol. 39, no. 11, pp. 11–13, 1996.

[12] The Australian National University, “ANU Polling Online,” viewed July 2015, https://anubis.anu.edu.au/apollo/, 2015.

[13] Creative Commons License, “Bugzilla,” viewed May 2015, https://www.bugzilla.org, 2015.

[14] A. K. Tripathi and A. Gupta, “A controlled experiment to evaluate the effectiveness and the efficiency of four static program analysis tools for Java programs,” in Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering. ACM, 2014, p. 23.

[15] D. Hovemeyer and W. Pugh, “Finding bugs is easy,” ACM SIGPLAN Notices, vol. 39, no. 12, pp. 92–106, 2004.

[16] B. Johnson, Y. Song, E. Murphy-Hill, and R. Bowdidge, “Why don’t software developers use static analysis tools to find bugs?” in Software Engineering (ICSE), 2013 35th International Conference on. IEEE, 2013, pp. 672–681.

[17] L. Layman, L. Williams, and R. S. Amant, “Toward reducing fault fix time: Understanding developer behavior for the design of automated fault detection tools,” in Empirical Software Engineering and Measurement, 2007. ESEM 2007. First International Symposium on. IEEE, 2007, pp. 176–185.

[18] S. Beecham, P. O’Leary, I. Richardson, S. Baker, and J. Noll, “Who are we doing global software engineering research for?” in 2013 IEEE 8th International Conference on Global Software Engineering. IEEE, 2013, pp. 41–50.

[19] S. L. Pfleeger, “Understanding and improving technology transfer in software engineering,” Journal of Systems and Software, vol. 47, no. 2, pp. 111–124, 1999.

[20] D. S. Siegel, R. Veugelers, and M. Wright, “Technology transfer offices and commercialization of university intellectual property: performance and policy implications,” Oxford Review of Economic Policy, vol. 23, no. 4, pp. 640–660, 2007.

[21] N. Ayewah, W. Pugh, J. D. Morgenthaler, J. Penix, and Y. Zhou, “Using FindBugs on production software,” in Companion to the 22nd ACM SIGPLAN conference on Object-oriented programming systems and applications companion. ACM, 2007, pp. 805–806.

[22] ——, “Evaluating static analysis defect warnings on production software,” in Proceedings of the 7th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering. ACM, 2007, pp. 1–8.

[23] J. R. Ruthruff, J. Penix, J. D. Morgenthaler, S. Elbaum, and G. Rothermel, “Predicting accurate and actionable static analysis warnings: an experimental approach,” in Proceedings of the 30th international conference on Software engineering. ACM, 2008, pp. 341–350.

[24] A. Bessey, K. Block, B. Chelf, A. Chou, B. Fulton, S. Hallem, C. Henri-Gros, A. Kamsky, S. McPeak, and D. Engler, “A few billion lines of code later: using static analysis to find bugs in the real world,” Communications of the ACM, vol. 53, no. 2, pp. 66–75, 2010.

[25] Synopsys Inc., “Coverity,” viewed May 2015, http://www.coverity.com, 2015.
