Code Coverage and Test Suite Effectiveness: Empirical Study with Real Bugs in Large Systems

Pavneet Singh Kochhar, Ferdian Thung, David Lo
Singapore Management University
{kochharps.2012,ferdiant.2013,davidlo}@smu.edu.sg
International Conference on Software Analysis, Evolution, and Reengineering (SANER’15)

Upload: pavneet-singh-kochhar

Post on 09-Feb-2017


TRANSCRIPT

Page 1: Code Coverage and Test Suite Effectiveness: Empirical Study with Real Bugs in Large Systems

Code Coverage and Test Suite Effectiveness: Empirical Study with Real Bugs in Large Systems

Pavneet Singh Kochhar, Ferdian Thung, David Lo

Singapore Management University

{kochharps.2012,ferdiant.2013,davidlo}@smu.edu.sg

International Conference on Software Analysis, Evolution, and Reengineering (SANER’15)

Page 2: Code Coverage and Test Suite Effectiveness: Empirical Study with Real Bugs in Large Systems

Software Testing, Why Bother?


Functionality -- Requirements

Bugs -- Software reliability

Costs -- Late bugs cost more

Page 3: Code Coverage and Test Suite Effectiveness: Empirical Study with Real Bugs in Large Systems

Software Testing, Why Bother?

• Horgan and Mathur [1]
  – Adequate testing is critical for developing reliable software
• Tassey [2]
  – Inadequate testing costs the US economy 59 billion dollars annually

[1] J. R. Horgan and A. P. Mathur, “Software testing and reliability,” McGraw-Hill, Inc., 1996.
[2] G. Tassey, “The economic impacts of inadequate infrastructure for software testing,” National Institute of Standards and Technology, 2002.

Page 4: Code Coverage and Test Suite Effectiveness: Empirical Study with Real Bugs in Large Systems

Previous Studies

• Gopinath et al. [1]
  – Analyze hundreds of open-source projects to measure the quality of test suites
  – Projects used are small, i.e., 10 LOC to 10,000 LOC
• Inozemtseva and Holmes [2]
  – Analyze the relationship between test suite size, coverage, and effectiveness
  – Five large software systems

Both studies use mutants, i.e., artificially injected bugs.

[1] R. Gopinath, C. Jensen, and A. Groce, “Code coverage for suite evaluation by developers,” ICSE, 2014.
[2] L. Inozemtseva and R. Holmes, “Coverage is not strongly correlated with test suite effectiveness,” ICSE, 2014.

Page 5: Code Coverage and Test Suite Effectiveness: Empirical Study with Real Bugs in Large Systems

Code Coverage

• Percentage of the code executed by test cases
• Used as a proxy for adequacy of testing
• Types:
  – Statement Coverage
  – Branch Coverage
• We measure coverage using Cobertura*

*http://cobertura.github.io/cobertura/
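As a rough illustration of the definition above, coverage can be computed as the share of statements (or branches) that at least one test executed. The sketch below is not how Cobertura works internally; the function name and the counts are hypothetical and chosen only to make the arithmetic concrete.

```python
def coverage_percent(covered: int, total: int) -> float:
    """Return the percentage of items (statements or branches) that were executed."""
    if total == 0:
        # Nothing to cover; report 0.0 rather than dividing by zero.
        return 0.0
    return 100.0 * covered / total


# Hypothetical counts for a single class under test (not taken from the study).
statement_coverage = coverage_percent(covered=45, total=60)
branch_coverage = coverage_percent(covered=6, total=16)

print(f"statement coverage: {statement_coverage:.1f}%")
print(f"branch coverage:    {branch_coverage:.1f}%")
```

Branch coverage is usually the lower of the two numbers, because a statement counts as covered after one execution while each branch outcome has to be exercised separately.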

Page 6: Code Coverage and Test Suite Effectiveness: Empirical Study with Real Bugs in Large Systems

Study Goals

To understand the correlation between test suite size, coverage, and effectiveness.

Is code coverage effective in killing real bugs?

Page 7: Code Coverage and Test Suite Effectiveness: Empirical Study with Real Bugs in Large Systems

Outline

• Motivation and Goals
• Overall Process
• Dataset
• Empirical Results
• Conclusion and Future Work

Page 8: Code Coverage and Test Suite Effectiveness: Empirical Study with Real Bugs in Large Systems

Overall Process


Page 9: Code Coverage and Test Suite Effectiveness: Empirical Study with Real Bugs in Large Systems

Outline

• Motivation and Goals
• Overall Process
• Dataset
• Empirical Results
• Conclusion and Future Work

Page 10: Code Coverage and Test Suite Effectiveness: Empirical Study with Real Bugs in Large Systems

Dataset

Project | Lines of Code | Number of Bugs*
HTTPClient | 122,288 | 67
Rhino | 116,065 | 92

Project | HTTPClient | Rhino
Description | Java library for client side HTTP services | JavaScript Engine
Developed by | Apache | Mozilla
Build Tool | Maven | Ant
Issue Tracking | JIRA | Bugzilla

* K. Herzig, S. Just, and A. Zeller, “It’s not a bug, it’s a feature: How misclassification impacts bug prediction,” ICSE, 2013.

Page 11: Code Coverage and Test Suite Effectiveness: Empirical Study with Real Bugs in Large Systems

Test Suite Size & Coverage

Used the Randoop tool to generate JUnit tests for 5 minutes

Test suite size, by % of original test suite size:

Project | 0.2 | 0.5 | 1 | 5 | 10 | 100
HTTPClient | 7.43 | 15.62 | 39.13 | 197.82 | 396.17 | 3967.00
Rhino | 7.64 | 16.01 | 40.10 | 202.52 | 405.46 | 4059.28

Coverage (%), by % of original test suite size:

Project | Coverage | 0.2 | 0.5 | 1 | 5 | 10 | 100
HTTPClient | Line | 7.5 | 11.0 | 17.2 | 28.0 | 31.8 | 37.4
HTTPClient | Branch | 2.8 | 4.4 | 7.6 | 14.4 | 17.2 | 22.5
Rhino | Line | 6.4 | 8.7 | 11.6 | 17.0 | 19.4 | 27.1
Rhino | Branch | 3.0 | 4.2 | 5.8 | 9.0 | 10.5 | 16.5
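One plausible way to obtain suites at different fractions of the original size is to draw random subsets from the pool of generated tests. The slide does not say exactly how the suites were sized or sampled, so the sketch below is an assumption; the pool of 4,000 test names and the helper `sample_suite` are hypothetical.

```python
import random


def sample_suite(test_pool, percent, rng):
    """Pick a random subset holding roughly `percent`% of the tests in the pool."""
    size = max(1, round(len(test_pool) * percent / 100))
    # rng.sample draws without replacement, so no test appears twice.
    return rng.sample(test_pool, size)


# Hypothetical pool standing in for the Randoop-generated tests.
pool = [f"test{i}" for i in range(4000)]
rng = random.Random(0)  # fixed seed so the sketch is reproducible

for percent in (0.2, 0.5, 1, 5, 10, 100):
    suite = sample_suite(pool, percent, rng)
    print(f"{percent:>5}% of the pool -> {len(suite)} tests")
```

Repeating the draw several times per percentage and averaging would yield non-integer sizes like those in the table, although that is only a guess about how the reported values arose.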

Page 12: Code Coverage and Test Suite Effectiveness: Empirical Study with Real Bugs in Large Systems

Test Suite Effectiveness

A test suite kills a bug if it runs successfully on the non-buggy version (i.e., all test cases pass) and fails on the buggy version (i.e., at least one test case fails).
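The kill criterion above can be written as a small predicate. This is a minimal sketch: the dictionaries mapping test names to pass/fail outcomes are a hypothetical representation, not the format used in the study.

```python
def kills_bug(results_on_fixed, results_on_buggy):
    """Return True if the suite passes entirely on the non-buggy version
    and has at least one failing test on the buggy version.

    Both arguments map a test name to True (passed) or False (failed).
    """
    all_pass_on_fixed = all(results_on_fixed.values())
    some_fail_on_buggy = not all(results_on_buggy.values())
    return all_pass_on_fixed and some_fail_on_buggy


# Hypothetical outcomes for a three-test suite.
fixed = {"test_a": True, "test_b": True, "test_c": True}
buggy = {"test_a": True, "test_b": False, "test_c": True}

print(kills_bug(fixed, buggy))  # the suite detects the bug
print(kills_bug(fixed, fixed))  # no failing test, so no kill
```

Requiring a clean run on the non-buggy version rules out tests that fail for reasons unrelated to the bug.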

Page 13: Code Coverage and Test Suite Effectiveness: Empirical Study with Real Bugs in Large Systems

Point Biserial Correlation

• Measures the correlation between two variables when one of them is naturally dichotomous, i.e., it takes only the value 0 or 1.
• Interpretation following Pett [1]:

Value Range | Correlation
r_pb² ≥ 0.81 | Very strong
0.49 ≤ r_pb² < 0.81 | Strong
0.25 ≤ r_pb² < 0.49 | Moderate
0.09 ≤ r_pb² < 0.25 | Weak
0.00 ≤ r_pb² < 0.09 | Very weak

[1] M. A. Pett, “Nonparametric Statistics for Health Care Research: Statistics for Small Samples and Unusual Distributions,” Sage Publications, Inc., 1997.
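For reference, the point-biserial coefficient can be computed with the textbook formula r_pb = (M1 − M0) / s · sqrt(n1 · n0 / n²), where M1 and M0 are the means of the continuous variable in the two groups and s is its population standard deviation. The sketch below uses only the Python standard library; the sample data are hypothetical (1 = bug killed, 0 = not killed, paired with a coverage value), and the slides do not say which tool the authors used for this computation.

```python
import math
from statistics import mean, pstdev


def point_biserial(binary, continuous):
    """Point-biserial correlation between a 0/1 variable and a continuous one."""
    group1 = [x for b, x in zip(binary, continuous) if b == 1]
    group0 = [x for b, x in zip(binary, continuous) if b == 0]
    n1, n0, n = len(group1), len(group0), len(continuous)
    if n1 == 0 or n0 == 0:
        raise ValueError("both outcomes (0 and 1) must occur at least once")
    spread = pstdev(continuous)  # population standard deviation
    if spread == 0:
        raise ValueError("the continuous variable must not be constant")
    return (mean(group1) - mean(group0)) / spread * math.sqrt(n1 * n0 / n**2)


# Hypothetical data: whether a suite killed the bug, and its coverage value.
killed = [0, 0, 1, 1]
coverage = [1.0, 2.0, 3.0, 4.0]

r_pb = point_biserial(killed, coverage)
print(f"r_pb = {r_pb:.4f}, r_pb squared = {r_pb ** 2:.2f}")
```

The result is numerically the same as the Pearson correlation between the two variables; squaring it gives the value that is compared against the ranges in the table.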

Page 14: Code Coverage and Test Suite Effectiveness: Empirical Study with Real Bugs in Large Systems

Outline

• Motivation and Goals
• Overall Process
• Dataset
• Empirical Results
• Conclusion and Future Work

Page 15: Code Coverage and Test Suite Effectiveness: Empirical Study with Real Bugs in Large Systems

Research Questions

RQ1: Is there a correlation between a test suite’s size and its effectiveness?

RQ2: Is there a correlation between a test suite’s coverage and its effectiveness?

Page 16: Code Coverage and Test Suite Effectiveness: Empirical Study with Real Bugs in Large Systems

Research Questions

RQ1: Size vs Effectiveness

Page 17: Code Coverage and Test Suite Effectiveness: Empirical Study with Real Bugs in Large Systems

RQ1: Size vs Effectiveness

Test suite size is weakly to strongly correlated with test suite effectiveness.

Point Biserial Correlation

Metric | HTTPClient | Rhino
r_pb² | 0.49 | 0.14
p-value | * | *

* Statistically Significant

Page 18: Code Coverage and Test Suite Effectiveness: Empirical Study with Real Bugs in Large Systems

Research Questions

RQ2: Coverage vs Effectiveness

Page 19: Code Coverage and Test Suite Effectiveness: Empirical Study with Real Bugs in Large Systems

RQ2: Coverage vs Effectiveness

Code coverage of a test suite is moderately to strongly correlated with its effectiveness.

Point Biserial Correlation

Metric | Statement: HTTPClient | Statement: Rhino | Branch: HTTPClient | Branch: Rhino
r_pb² | 0.33 | 0.59 | 0.36 | 0.55
p-value | * | * | * | *

* Statistically Significant
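The verbal labels used in the RQ1 and RQ2 findings follow from applying the value ranges on the Point Biserial Correlation slide to the reported r_pb² values. A small sketch of that lookup, with the thresholds and values copied from the slides and the function name chosen here for illustration:

```python
def strength(r_pb_squared):
    """Map an r_pb squared value to the label from the value-range table."""
    if r_pb_squared >= 0.81:
        return "Very strong"
    if r_pb_squared >= 0.49:
        return "Strong"
    if r_pb_squared >= 0.25:
        return "Moderate"
    if r_pb_squared >= 0.09:
        return "Weak"
    return "Very weak"


# r_pb squared values as reported on the RQ1 and RQ2 slides.
reported = {
    ("size", "HTTPClient"): 0.49,
    ("size", "Rhino"): 0.14,
    ("statement coverage", "HTTPClient"): 0.33,
    ("statement coverage", "Rhino"): 0.59,
    ("branch coverage", "HTTPClient"): 0.36,
    ("branch coverage", "Rhino"): 0.55,
}

for (measure, project), value in reported.items():
    print(f"{measure:18} {project:10} {value:.2f} -> {strength(value)}")
```

Size comes out as Strong for HTTPClient and Weak for Rhino, while both coverage measures are Moderate for HTTPClient and Strong for Rhino, which matches the "weakly to strongly" and "moderately to strongly" wording.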

Page 20: Code Coverage and Test Suite Effectiveness: Empirical Study with Real Bugs in Large Systems

Conclusion & Future Work

Using real bugs, we find that
• Test suite size is weakly to strongly correlated with test suite effectiveness.
• Code coverage is moderately to strongly correlated with the effectiveness of a test suite.

Future Work:
• Expand the study to include more projects
  – Address threats to external validity
• Use human generated test cases

Page 21: Code Coverage and Test Suite Effectiveness: Empirical Study with Real Bugs in Large Systems

Thank you!

Questions? Comments? Advice?

{kochharps.2012,ferdiant.2013,davidlo}@smu.edu.sg

Page 22: Code Coverage and Test Suite Effectiveness: Empirical Study with Real Bugs in Large Systems

Threats to Validity

• Internal validity:
  – We link bug reports to commits using bug ids
  – We use Randoop for 5 minutes
• External validity:
  – Only analyze 2 large software systems
• Construct validity:
  – We use point biserial correlation
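Linking bug reports to commits by bug id typically means scanning commit messages for issue identifiers. The sketch below illustrates the idea under stated assumptions: the two regular expressions (a JIRA-style key such as HTTPCLIENT-1234 and a Bugzilla-style "bug 123456" mention) and the sample messages are hypothetical and are not the authors' actual linking rules.

```python
import re

# Assumed identifier formats; real projects may use other conventions.
JIRA_KEY = re.compile(r"\b(HTTPCLIENT-\d+)\b")
BUGZILLA_ID = re.compile(r"\bbug\s*#?(\d+)\b", re.IGNORECASE)


def linked_bug_ids(commit_message):
    """Return the set of bug identifiers mentioned in a commit message."""
    ids = set(JIRA_KEY.findall(commit_message))
    ids.update(BUGZILLA_ID.findall(commit_message))
    return ids


# Hypothetical commit messages.
messages = [
    "HTTPCLIENT-1234: fix redirect handling",
    "Fix for Bug 654321 - crash in parser",
    "update docs",
]

for message in messages:
    print(f"{message!r} -> {sorted(linked_bug_ids(message))}")
```

Commits that fix a bug without naming its id are missed by this kind of matching, which is one reason the linking step appears under internal validity.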

Page 23: Code Coverage and Test Suite Effectiveness: Empirical Study with Real Bugs in Large Systems

Related Work

• Empirical study on testing and coverage
  – Gligoric et al. show that branch coverage is the best measure for test suite quality [1]
  – Namin et al. show that test suite size and coverage are correlated with test suite effectiveness [2]
  – Gopinath et al. investigate the correlation between coverage and a test suite’s effectiveness in killing mutants [3]

[1] M. Gligoric, A. Groce, C. Zhang, R. Sharma, M. A. Alipour, and D. Marinov, “Comparing non-adequate test suites using coverage criteria,” ISSTA, 2013.
[2] A. S. Namin and J. H. Andrews, “The influence of size and coverage on test suite effectiveness,” ISSTA, 2009.
[3] R. Gopinath, C. Jensen, and A. Groce, “Code coverage for suite evaluation by developers,” ICSE, 2014.