
Page 1: Potential Biases in Bug Localization: Do They Matter?

Pavneet Singh Kochhar, Yuan Tian, David Lo
Singapore Management University
{kochharps.2012, yuan.tian.2012, davidlo}@smu.edu.sg

Page 2: Issue Tracking

• Projects use issue tracking systems like JIRA

• Well-known projects receive a large number of issue reports

• The large number of bug reports can overwhelm developers

• Mozilla developer - “Everyday, almost 300 bugs appear that need triaging. This is far too much for only the Mozilla programmers to handle” *

What have researchers proposed to overcome this issue?

* J. Anvik, L. Hiew, and G. C. Murphy, “Coping with an open bug repository,” in ETX, pp. 35–39, 2005

2/25

Page 3: Bug Localization

GOAL: Given thousands of source code files, find the buggy ones

3/25

Page 4: How Bug Localization Works

• Uses fixed/closed bug reports

• Uses standard information retrieval (IR) techniques such as the vector space model (VSM)

• Computes similarity between bug reports & source code

• Returns a ranked list of potentially buggy source code files

• The returned list is compared with the actual buggy files to compute accuracy
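The pipeline above can be sketched as follows. This is a minimal illustration (pure-Python TF-IDF with cosine similarity), not the exact implementation evaluated in the paper; all function names are my own:

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and split on non-alphanumeric characters
    # (a simplification of real IR preprocessing).
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def tfidf_vectors(docs):
    # docs: list of token lists; returns one {term: tf-idf weight} dict per doc.
    n = len(docs)
    df = Counter()
    for d in docs:
        df.update(set(d))
    return [{t: tf * math.log(n / df[t]) for t, tf in Counter(d).items()}
            for d in docs]

def cosine(a, b):
    # Cosine similarity between two sparse vectors.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def localize(bug_report, files):
    # files: {filename: source text}; returns filenames ranked by
    # similarity to the bug report (most suspicious first).
    names = list(files)
    vecs = tfidf_vectors([tokenize(bug_report)] +
                         [tokenize(files[n]) for n in names])
    query = vecs[0]
    scores = {n: cosine(query, v) for n, v in zip(names, vecs[1:])}
    return sorted(names, key=lambda n: -scores[n])
```

A real tool would add source-code-aware tokenization (camelCase splitting, stop-word removal); the ranking idea is the same.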

4/25

Page 5: Issues in Bug Localization

HOWEVER

What if bug localization results are biased?

• A past study* shows that up to 80% of bug reports can be localized by inspecting only 5 source code files

• These results are promising

* R. K. Saha, M. Lease, S. Khurshid, and D. E. Perry, “Improving bug localization using structured information retrieval,” ASE 2013

5/25

Page 6: Our Study

Potential Biases in Bug Localization

1. Wrongly Classified Reports (Herzig et al.*: 1/3 of reports marked as bugs are not bugs)

2. Already Localized Reports

3. Incorrect Ground Truth Files (Kawrykow et al.+: many changes are non-essential)

* K. Herzig, S. Just, and A. Zeller, “It’s not a Bug, it’s a Feature: How Misclassification Impacts Bug Prediction,” ICSE 2013
+ D. Kawrykow and M. P. Robillard, “Non-essential changes in version histories,” ICSE 2011

6/25


Page 8: Dataset

Projects     Organization  Tracker  Number of Issue Reports
HTTPClient   Apache        JIRA      746
Jackrabbit   Apache        JIRA     2402
Lucene-Java  Apache        JIRA     2443

Total = 5591 Issue Reports *

* K. Herzig, S. Just, and A. Zeller, “It’s not a Bug, it’s a Feature: How Misclassification Impacts Bug Prediction,” ICSE 2013

8/25

Page 9: Evaluation Metric

Average Precision (AP) – the average of the precision values computed at the rank of each buggy file in a returned list

Mean Average Precision (MAP) – the mean of the average precisions over all ranked lists (one per bug report)
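As a concrete illustration, the metric can be computed like this (a sketch under the usual IR definitions; the function names are my own):

```python
def average_precision(ranked, relevant):
    # ranked: ordered list of files returned for one bug report.
    # relevant: set of the actual buggy files (the ground truth).
    hits, precisions = 0, []
    for i, f in enumerate(ranked, start=1):
        if f in relevant:
            hits += 1
            precisions.append(hits / i)  # precision at each buggy-file rank
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(results):
    # results: list of (ranked list, relevant set) pairs, one per bug report.
    aps = [average_precision(r, rel) for r, rel in results]
    return sum(aps) / len(aps)
```

For example, if the buggy files {a, c} are returned at ranks 1 and 3, AP = (1/1 + 2/3) / 2.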

9/25

Page 10: BIAS 1 – Report Misclassification

Mean Average Precision (MAP) Scores

Projects     Reported  Actual  Difference  Cohen's d
HTTPClient   0.429     0.419   -2.33%      0.13
Jackrabbit   0.302     0.339   12.25%*     0.06
Lucene-Java  0.301     0.322    6.98%      0.04

Differences between MAP scores range from -2.33% to 12.25%
* Statistically significant difference (Mann-Whitney-Wilcoxon test)
Effect sizes are trivial (d < 0.2)

10/25

Page 11: BIAS 1 – Report Misclassification

Mean Average Precision (MAP) Scores

Actual to Reported    HC     JB     LJ     Overall
None                  0.429  0.302  0.301  0.312
RFE to BUG            0.427  0.303  0.304  0.313
DOCUMENTATION to BUG  0.430  0.304  0.305  0.315
IMPROVEMENT to BUG    0.416  0.299  0.295  0.307
REFACTORING to BUG    0.428  0.301  0.301  0.311
BACKPORT to BUG       0.430  0.303  0.300  0.313
CLEANUP to BUG        0.429  0.303  0.303  0.314
SPEC to BUG           0.435  0.302  0.301  0.312
TASK to BUG           0.432  0.302  0.301  0.312
TEST to BUG           0.429  0.328  0.313  0.334
BUILD_SYSTEM to BUG   0.429  0.306  0.303  0.315
DESIGN_DEFECT to BUG  0.424  0.301  0.301  0.311
OTHERS to BUG         0.439  0.303  0.301  0.313

* HC – HTTPClient, JB – Jackrabbit, LJ – Lucene-Java

11/25

Page 12: BIAS 1 – Report Misclassification

Results:
• Significantly impacts bug localization results for 1 out of 3 projects
• However, effect sizes are negligible, i.e., < 0.2

12/25

Page 13: BIAS 2 – Localized Bug Reports

Categories

Category   Description
Fully      All the buggy files are mentioned in the bug report
Partially  Some of the buggy files are mentioned in the bug report
Not        The bug report does not mention any buggy files

Fully Localized Report (Example)

Summary      DecompressingEntity not calling close on InputStream retrieved by getContent
Description  The method DecompressingEntity.writeTo(OutputStream outstream) does not close the InputStream retrieved by getContent().
Buggy Files  DecompressingEntity.java

13/25

Page 14: BIAS 2 – Localized Bug Reports

Manually Identifying Localized Reports

1. Start from 5591 issue reports
2. Keep the 1191 reports that are actual bug reports (Herzig et al.*)
3. Randomly select 350 of them
4. Inspect the files changed by the fix and the report's summary & description
5. Classify the bug reports into the three categories

* K. Herzig, S. Just, and A. Zeller, “It’s not a Bug, it’s a Feature: How Misclassification Impacts Bug Prediction,” ICSE 2013

14/25

Page 15: BIAS 2 – Localized Bug Reports

Automatically Identifying Localized Reports

Based on the manual investigation, build an algorithm to automatically classify bug reports:
• Input – summary/description of a bug report & the files changed to fix the bug
• Output – the bug report classified into one of the 3 categories
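One plausible way to sketch such a classifier. This is an assumption on my part: the rule below checks whether the changed files are named in the report text, which may differ from the paper's actual algorithm; `classify_report` is a hypothetical helper:

```python
def classify_report(summary, description, changed_files):
    # changed_files: paths of the files changed by the fix.
    # A file counts as "mentioned" if its base name (without extension)
    # appears in the report's summary or description.
    text = (summary + " " + description).lower()
    mentioned = [f for f in changed_files
                 if f.rsplit("/", 1)[-1].rsplit(".", 1)[0].lower() in text]
    if changed_files and len(mentioned) == len(changed_files):
        return "fully"      # all buggy files are named in the report
    if mentioned:
        return "partially"  # only some buggy files are named
    return "not"            # no buggy files are named
```

For the fully localized example on page 13, the single changed file DecompressingEntity.java is named in the summary, so the report would be classified as fully localized.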

15/25

Page 16: BIAS 2 – Localized Bug Reports

Number/Proportion of Localized Reports

Project      Category   Number  Proportion
HTTPClient   Fully        36     3.02%
             Partially    28     2.35%
             Not          35     2.93%
Jackrabbit   Fully       299    25.10%
             Partially   132    11.08%
             Not         402    33.75%
Lucene-Java  Fully        63     5.28%
             Partially    87     7.30%
             Not         109     9.15%

Overall, 33.41% of the reports are fully localized
More than 50% are fully or partially localized

16/25

Page 17: BIAS 2 – Localized Bug Reports

Mean Average Precision (MAP) Scores

Projects     Fully  Partially  Not
HTTPClient   0.615  0.349      0.250
Jackrabbit   0.560  0.373      0.187
Lucene-Java  0.527  0.338      0.197

Difference between Fully & Not: HTTPClient – 84.39%, Jackrabbit – 99.86%, Lucene-Java – 91.16%

17/25

Page 18: BIAS 2 – Localized Bug Reports

Comparison – Fully vs. Partially vs. Not

             Fully-Partially       Partially-Not         Fully-Not
Projects     p-value  d   Effect   p-value  d   Effect   p-value  d   Effect
HTTPClient   *        0.94  L      *        0.53  M      *        1.27  L
Jackrabbit   *        0.56  M      *        0.55  M      *        1.14  L
Lucene-Java  *        0.53  M      *        0.41  S      *        1.04  L

* Significant differences (p-value < 0.05)
Effect sizes between Fully & Not are LARGE

18/25

Page 19: BIAS 2 – Localized Bug Reports

Best & Worst Bug Reports

Project             Fully  Partially  Not  p-value
HTTPClient   Upper   16      5         4   0.0041*
             Lower    6      4        15
Jackrabbit   Upper   35      9         6   2.807e-13*
             Lower    7      1        42
Lucene-Java  Upper   22     18        10   8.724e-05*
             Lower    5     18        27

* Significant differences (p-value < 0.05)

19/25

Page 20: BIAS 2 – Localized Bug Reports

Results:
• More than 50% of the bugs are either fully or partially localized
• MAP scores for fully & partially localized reports are much higher than for not localized ones
• Effect sizes between fully & not localized are LARGE

20/25

Page 21: BIAS 3 – Non-Buggy Files

Manual Investigation

1. Randomly select 100 not-localized bug reports
2. Collect the files changed to fix these bugs
3. Diff the original & modified version of each file
4. Mark files with only cosmetic changes, refactorings, etc. as non-buggy
5. Obtain clean GROUND TRUTH files
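A rough sketch of the filtering idea, under my own simplification (not the paper's actual procedure): treat a changed file as non-buggy when its before/after versions differ only in whitespace or pure `//` comment lines:

```python
def is_essential_change(before, after):
    # Normalize each line: drop all whitespace and blank out pure
    # '//' comment lines, then compare the two versions.
    def normalize(src):
        lines = []
        for line in src.splitlines():
            s = "".join(line.split())
            lines.append("" if s.startswith("//") else s)
        return [l for l in lines if l]  # ignore blank/comment-only lines
    return normalize(before) != normalize(after)
```

Real non-essential-change detection (as in Kawrykow et al.) also handles renames, block comments, and trivial refactorings; this only captures the simplest cases.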

21/25

Page 22: BIAS 3 – Non-Buggy Files

Example

22/25

Page 23: BIAS 3 – Non-Buggy Files

Mean Average Precision (MAP) Scores

Projects     Dirty  Clean  Difference  Cohen's d
HTTPClient   0.207  0.171  0.036       0.08
Jackrabbit   0.115  0.115  0.000       0.08
Lucene-Java  0.271  0.239  0.032       0.17

Differences are not significant
Effect sizes are trivial (d < 0.2)

23/25

Page 24: BIAS 3 – Non-Buggy Files

Results:
• 28.11% of the files in the ground truth are non-buggy
• Differences between MAP scores are not significant
• Effect sizes are negligible, i.e., < 0.2

24/25

Page 25: Conclusion

BIAS 1 – Wrongly classified issue reports: NOT statistically significant, NO substantial impact

BIAS 2 – Localized bug reports: statistically significant, substantial impact

BIAS 3 – Non-buggy files: NOT statistically significant, NO substantial impact

25/25

Page 26: Thank You!

Email: [email protected]

Page 27: Other Evaluation Metrics

HIT@N – percentage of bug reports with at least one buggy file in the top N ranked results

Mean Reciprocal Rank (MRR) – the reciprocal rank is the inverse of the rank of the first buggy file; MRR is the average of the reciprocal ranks over all bug reports
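The two metrics can be sketched as follows (function names are my own; `results` pairs each report's ranked list with its set of buggy files):

```python
def hit_at_n(results, n):
    # Fraction of bug reports with at least one buggy file in the top n.
    hits = sum(1 for ranked, relevant in results
               if any(f in relevant for f in ranked[:n]))
    return hits / len(results)

def mean_reciprocal_rank(results):
    # Average, over all reports, of 1 / rank of the first buggy file
    # (0 if no buggy file is returned at all).
    rrs = []
    for ranked, relevant in results:
        rr = 0.0
        for i, f in enumerate(ranked, start=1):
            if f in relevant:
                rr = 1.0 / i  # reciprocal rank of the first buggy file
                break
        rrs.append(rr)
    return sum(rrs) / len(rrs)
```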

Page 28: BIAS 1 – Report Misclassification

Page 29: BIAS 2 – Localized Bug Reports

Page 30: BIAS 3 – Non-Buggy Files

Page 31: BIAS 1, BIAS 2 & BIAS 3

Mean Reciprocal Rank (MRR) Scores

Page 32: Appendix (Statistical Analysis)

• Mann-Whitney-Wilcoxon (MWW) test: given a significance level α = 0.05, if the p-value < α, then the test rejects the null hypothesis
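For illustration, a self-contained version of the test (normal approximation with average ranks for ties; a real analysis would use a statistics library such as R or SciPy):

```python
import math

def mann_whitney_u(xs, ys):
    # Rank all values together (average ranks for tied values), compute
    # the U statistic for xs, and a two-sided p-value via the normal
    # approximation (tie correction to the variance is omitted).
    combined = sorted((v, i) for i, v in enumerate(xs + ys))
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg = (i + j) / 2 + 1            # average rank for the tied block
        for k in range(i, j + 1):
            ranks[combined[k][1]] = avg
        i = j + 1
    n1, n2 = len(xs), len(ys)
    r1 = sum(ranks[:n1])                 # rank sum of the first sample
    u = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return u, p
```

With α = 0.05, the null hypothesis is rejected when the returned p-value is below 0.05.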

Page 33: Appendix (BIAS-2 Results)

Actual to Reported    HC     JB     LJ     Overall
None                  0.429  0.302  0.301  0.312
RFE to BUG            0.427  0.303  0.304  0.313
DOCUMENTATION to BUG  0.430  0.304  0.305  0.315
IMPROVEMENT to BUG    0.416  0.299  0.295  0.307
REFACTORING to BUG    0.428  0.301  0.301  0.311
BACKPORT to BUG       0.430  0.303  0.300  0.313
CLEANUP to BUG        0.429  0.303  0.303  0.314
SPEC to BUG           0.435  0.302  0.301  0.312
TASK to BUG           0.432  0.302  0.301  0.312
TEST to BUG           0.429  0.328  0.313  0.334
BUILD_SYSTEM to BUG   0.429  0.306  0.303  0.315
DESIGN_DEFECT to BUG  0.424  0.301  0.301  0.311
OTHERS to BUG         0.439  0.303  0.301  0.313